skbold.core package

The core subpackage contains skbold’s most important data-structure: the Mvp. This class forms the basis of the ‘multivoxel-patterns’ (i.e. mvp) that are used throughout the package. Subclasses of Mvp (MvpWithin and MvpBetween) are also defined in this core module.

The MvpWithin object is a data-structure that contains a set of multivoxel fMRI patterns of single trials for a single subject, hence the ‘within’ part (i.e. within-subjects). Currently, it has a single public method, create(), which loads a set of contrasts from an FSL first-level directory (i.e. a .feat-directory). Importantly, it therefore assumes that the patterns have already been modelled on a single-trial basis using some kind of GLM. These trialwise patterns are then stacked row-wise (one trial per row) to create a 2D samples by features matrix, which is stored in the X attribute of MvpWithin.

The MvpBetween object is a data-structure that contains a set of multivoxel fMRI patterns of single conditions for a set of subjects. It is, so to speak, a ‘between-subjects’ multivoxel pattern, in which subjects are the ‘samples’. In contrast to MvpWithin, the contrasts that are loaded are less restricted in terms of their format; the only requirement is that they are nifti files. Notably, the MvpBetween format allows different kinds of ‘feature-sets’ to be stacked column-wise (side by side) in a single MvpBetween object. For example, for a given set of subjects, it is possible to stack a functional contrast (e.g. a high-load minus low-load contrast) with another functional contrast (e.g. a conflict minus no-conflict contrast) in order to use features from both sets to predict a psychometric or behavioral variable of the corresponding subjects (e.g. intelligence). The MvpBetween format also allows loading (and stacking) VBM, TBSS, resting-state (to extract connectivity measures), and dual-regression data. More information can be found below in the API. A use case can be found on the main page of ReadTheDocs.
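In array terms, the two layouts described above can be illustrated with a minimal NumPy sketch. It is not skbold's implementation; the toy arrays and their sizes are made up for illustration. Trials become rows in the within-subject case, while feature-sets are placed side by side as columns in the between-subject case.

```python
import numpy as np

# Hypothetical toy data: 4 trials, each a flattened pattern of 6 voxels.
trial_patterns = [np.full(6, fill_value=i, dtype=float) for i in range(4)]

# Within-subject case: each trial becomes one row (sample) of X.
X_within = np.vstack(trial_patterns)           # shape (4, 6)

# Between-subject case: 3 subjects, two feature-sets of different size.
set_a = np.random.RandomState(0).randn(3, 5)   # e.g. one functional contrast
set_b = np.random.RandomState(1).randn(3, 2)   # e.g. another contrast
X_between = np.hstack([set_a, set_b])          # shape (3, 7)

# One integer per column records which feature-set the column came from.
featureset_id = np.concatenate([np.zeros(5, dtype=int), np.ones(2, dtype=int)])
```

The per-column integer array plays the role that the featureset_id attribute of MvpBetween is documented to have: it has one entry per feature in X.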

Also, functional-to-standard (i.e. convert2mni) and standard-to-functional (i.e. convert2epi) warp-functions for niftis are defined here, because they have caused circular import errors in the past.

class Mvp(X=None, y=None, mask=None, mask_thres=0)[source]

Bases: object

Mvp (multiVoxel Pattern) class. Creates an object, specialized for storing fMRI data that will be analyzed using machine learning or RSA-like analyses, that stores both the data (X: an array of samples by features, y: numeric labels corresponding to X’s classes/conditions) and the corresponding meta-data (e.g. nifti header, mask info, etc.).

Parameters:
  • X (ndarray) – A 2D numpy-array with rows indicating samples and columns indicating features.
  • y (list or ndarray) – Array/list with labels/targets corresponding to samples in X.
  • mask (str) – Absolute path to nifti-file that will mask (index) the patterns.
  • mask_thres (int or float) – Minimum value for mask (in cases of probabilistic masks).
Variables:
  • mask_shape (tuple) – Shape of mask that patterns will be indexed with.
  • nifti_header (Nifti1Header object) – Nifti-header from corresponding mask.
  • affine (ndarray) – Affine corresponding to nifti-mask.
  • voxel_idx (ndarray) – Array with integer-indices indicating which voxels are used in the patterns relative to whole-brain space. In other words, it allows the patterns to be mapped back to whole-brain space.
  • X (ndarray) – The actual patterns (2D: samples X features)
  • y (list or ndarray) – Array/list with labels/targets corresponding to samples in X.

Notes

This class is mainly meant to serve as a parent-class for MvpWithin and MvpBetween, but it can alternatively be used as an object to store a ‘custom’ multivariate-pattern set with meta-data.
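The relationship between mask, mask_thres and voxel_idx can be sketched with plain NumPy. This is a toy illustration under stated assumptions: the mask values are invented, the comparison is assumed to be "strictly greater than", and the indices are assumed to be flat (C-order) positions in the whole volume.

```python
import numpy as np

# Hypothetical probabilistic mask on a tiny 2 x 3 x 2 grid (values in 0-100).
mask = np.array([[[0, 10], [30, 0], [55, 80]],
                 [[0, 0], [25, 60], [5, 90]]], dtype=float)
mask_thres = 20

# Assumption: voxels strictly above the threshold are kept.
mask_index = (mask > mask_thres).ravel()

# Flat indices of the kept voxels relative to the whole volume.
voxel_idx = np.arange(mask.size)[mask_index]

# Indexing a flattened whole-brain pattern yields one row of X.
whole_brain = np.arange(mask.size, dtype=float)
pattern = whole_brain[mask_index]
```

Storing the flat indices alongside X is what makes it possible to write results back into a whole-brain volume later on.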

update_mask(mask, threshold=0)[source]
write(path=None, name='mvp', backend='joblib')[source]

Writes the Mvp-object to disk.

Parameters:
  • path (str) – Absolute path where the file will be written to.
  • name (str) – Name of to-be-written file.
  • backend (str) – Which format will be used to save the files. Default is ‘joblib’, which conveniently saves the Mvp-object as one file. Alternatively, if the Mvp-object is too large to be saved with joblib, a data-header format can be used, in which the data (X) is saved using Numpy and the meta-data (everything except X) is saved using joblib.
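The idea behind the data-header format can be sketched as follows. This is not the code behind write(): the helper name, the file names, and the ToyMvp class are hypothetical, and pickle stands in for joblib so that the sketch depends only on NumPy and the standard library.

```python
import os
import pickle
import tempfile

import numpy as np


def write_data_header(obj, path, name='mvp'):
    """Save obj.X with NumPy and all remaining attributes separately.

    pickle is used as a stand-in for joblib (an assumption of this sketch).
    """
    data_file = os.path.join(path, name + '_data.npy')
    header_file = os.path.join(path, name + '_header.pkl')
    np.save(data_file, obj.X)
    # Everything except X goes into the header.
    header = {k: v for k, v in vars(obj).items() if k != 'X'}
    with open(header_file, 'wb') as f:
        pickle.dump(header, f)
    return data_file, header_file


class ToyMvp(object):
    """Hypothetical stand-in for an Mvp object."""

    def __init__(self, X, y):
        self.X = X
        self.y = y


toy = ToyMvp(X=np.zeros((3, 4)), y=[0, 1, 0])
out_dir = tempfile.mkdtemp()
data_file, header_file = write_data_header(toy, out_dir)

# Reading back: the array and the header are loaded independently.
X_loaded = np.load(data_file)
with open(header_file, 'rb') as f:
    header_loaded = pickle.load(f)
```

Splitting the large array from the small meta-data means the array is written in NumPy's own binary format, while the header stays cheap to serialize.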
convert2epi(file2transform, reg_dir, out_dir=None, interpolation='trilinear', suffix='epi', overwrite=False)[source]

Transforms a nifti from mni152 (2mm) to EPI (native) format. Assuming that reg_dir is a directory with transformation-files (warps) including standard2example_func warps, this function uses nipype’s fsl interface to flirt a nifti to EPI format.

Parameters:
  • file2transform (str or list) – Absolute path(s) to nifti file(s) that need to be transformed
  • reg_dir (str) – Absolute path to registration directory with warps
  • out_dir (str) – Absolute path to desired out directory. Default is same directory as the to-be transformed file.
  • interpolation (str) – Interpolation used by flirt. Default is ‘trilinear’.
  • suffix (str) – What to suffix the transformed file with (default : ‘epi’)
  • overwrite (bool) – Whether to overwrite existing transformed files
Returns:

out_all – Absolute path(s) to newly transformed file(s).

Return type:

list

convert2mni(file2transform, reg_dir, out_dir=None, interpolation='trilinear', suffix=None, overwrite=False, apply_warp=True)[source]

Transforms a nifti to mni152 (2mm) format. Assuming that reg_dir is a directory with transformation-files (warps) including example_func2standard warps, this function uses nipype’s fsl interface to flirt a nifti to mni format.

Parameters:
  • file2transform (str or list) – Absolute path(s) to nifti file(s) that need to be transformed
  • reg_dir (str) – Absolute path to registration directory with warps
  • out_dir (str) – Absolute path to desired out directory. Default is same directory as the to-be transformed file.
  • interpolation (str) – Interpolation used by flirt. Default is ‘trilinear’.
  • suffix (str) – What to append to name when converted (default : basename file2transform).
  • overwrite (bool) – Whether to overwrite already existing transformed file(s)
  • apply_warp (bool) – Whether to use the non-linear warp transform (if available).
Returns:

out_all – Absolute path(s) to newly transformed file(s).

Return type:

list

class MvpBetween(source, subject_idf='sub0???', remove_zeros=True, X=None, y=None, mask=None, mask_thres=0, subject_list=None)[source]

Bases: skbold.core.mvp.Mvp

Extracts and stores multivoxel pattern information across subjects. The MvpBetween class allows for the extraction and storage of multivoxel (MRI) pattern information across subjects. The MvpBetween class can handle various types of information, including functional contrasts, 3D (subject-specific) and 4D (subjects stacked) VBM and TBSS data, dual-regression data, and functional-connectivity data from resting-state scans (experimental).

Parameters:
  • source (dict) –

    Dictionary with types of data as keys and data-specific dictionaries as values. Keys can be ‘Contrast_*’ (indicating a 3D functional contrast), ‘4D_anat’ (for 4D anatomical - VBM/TBSS - files), ‘VBM’, ‘TBSS’, and ‘dual_reg’ (a subject-specific 4D file with components as fourth dimension).

    The dictionary passed as values must include, for each data-type, a path with wildcards to the corresponding (subject-specific) data-file. Other, optional, key-value pairs per data-type can be assigned, including ‘mask’: ‘path’, to use data-type-specific masks.

    An example:

    >>> source = {}
    >>> path_emo = '~/data/sub0*/*.feat/stats/tstat1.nii.gz'
    >>> source['Contrast_emo'] = {'path': path_emo}
    >>> vbm_mask = '~/vbm_mask.nii.gz'
    >>> path_vbm = '~/data/sub0*/*vbm.nii.gz'
    >>> source['VBM'] = {'path': path_vbm, 'mask': vbm_mask}
    
  • subject_idf (str) – Subject-identifier. This identifier is used to extract subject-names from the globbed directories in the ‘path’ keys in source, so that it is known which pattern belongs to which subject. This way, MvpBetween can check which subjects contain complete data!
  • X (ndarray) – Not necessary to pass to MvpBetween, but needs to be defined as it is needed in the super-constructor.
  • y (ndarray or list) – Labels or targets corresponding to the samples in X.
  • mask (str) – Absolute path to nifti-file that will be used as a common mask. Note: this will only be applied if its shape corresponds to the to-be-indexed data. Otherwise, no mask is applied. Also, this mask is ‘overridden’ if source[data_type] contains a ‘mask’ key, which implies that this particular data-type has a custom mask.
  • mask_thres (int or float) – Minimum value to binarize the mask when it’s probabilistic.
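How a subject identifier such as ‘sub0???’ can be used to match globbed paths to subjects, and to find subjects with complete data, is sketched below using only the standard library. The helper extract_subject and the example paths are hypothetical, and the assumption that the identifier matches a whole path component is ours rather than something the documentation states.

```python
import fnmatch


def extract_subject(path, subject_idf='sub0???'):
    """Return the first path component matching the subject identifier."""
    for part in path.split('/'):
        if fnmatch.fnmatch(part, subject_idf):
            return part
    return None


# Hypothetical globbed paths for two data-types.
paths = {
    'Contrast_emo': ['/data/sub0001/run.feat/stats/tstat1.nii.gz',
                     '/data/sub0002/run.feat/stats/tstat1.nii.gz',
                     '/data/sub0003/run.feat/stats/tstat1.nii.gz'],
    'VBM': ['/data/sub0001/sub0001_vbm.nii.gz',
            '/data/sub0003/sub0003_vbm.nii.gz'],
}

# Subjects present per data-type, then those present in every data-type.
subjects_per_type = [set(extract_subject(p) for p in plist)
                     for plist in paths.values()]
common_subjects = sorted(set.intersection(*subjects_per_type))
```

Here sub0002 lacks VBM data, so only sub0001 and sub0003 remain as subjects with complete data.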
Variables:
  • mask_shape (tuple) – Shape of mask that patterns will be indexed with.
  • nifti_header (list of Nifti1Header objects) – Nifti-headers from original data-types.
  • affine (list of ndarray) – Affines corresponding to nifti-masks of each data-type.
  • X (ndarray) – The actual patterns (2D: samples X features)
  • y (list or ndarray) – Array/list with labels/targets corresponding to samples in X.
  • common_subjects (list) – List of subject-names that have complete data specified in source.
  • featureset_id (ndarray) – Array with integers of size X.shape[1] (i.e. the amount of features in X). Each unique integer, starting at 0, refers to a different feature-set.
  • voxel_idx (ndarray) –

    Array with integers of size X.shape[1]. Per feature-set, these voxel-indices allow the features to be mapped back to whole-brain space. For example, to map the features of the first feature-set (featureset_id 0) of the first sample in X back to MNI152 (2mm) space, do:

    >>> mni_vol = np.zeros((91, 109, 91))
    >>> tmp_idx = mvp.featureset_id == 0
    >>> mni_vol.flat[mvp.voxel_idx[tmp_idx]] = mvp.X[0, tmp_idx]
    
  • data_shape (list of tuples) – Original (whole-brain) shape of the loaded data, per data-type.
  • data_name (list of str) – List of names of data-types.
add_y(file_path, col_name, sep='\t', index_col=0, normalize=False, remove=None, ensure_balanced=False, nan_strategy='remove', **kwargs)[source]

Sets y attribute to an outcome-variable (target).

Parameters:
  • file_path (str) – Absolute path to spreadsheet-like file including the outcome var.
  • col_name (str) – Column name in spreadsheet containing the outcome variable
  • sep (str) – Separator to parse the spreadsheet-like file.
  • index_col (int) – Which column to use as index (should correspond to subject-name).
  • normalize (bool) – Whether to normalize (0 mean, unit std) the outcome variable.
  • remove (int or float or str) – Removes instances in which y == remove from MvpBetween object.
  • ensure_balanced (bool) – Whether to ensure balanced classes (if True, done by undersampling the majority class).
  • nan_strategy (str) – Strategy for dealing with NaNs. Default: ‘remove’. A specific string, int, or float can also be passed to impute that value. For other options, see sklearn.preprocessing.Imputer.
  • **kwargs – Arbitrary keyword arguments passed to pandas read_csv.
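What the normalize and ensure_balanced options amount to can be sketched with NumPy. This is an illustration, not skbold's code: the function names are hypothetical, and random undersampling with a fixed seed is one possible way to balance classes.

```python
import numpy as np


def normalize(y):
    """Scale y to zero mean and unit standard deviation."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()


def undersample_majority(y, seed=0):
    """Return sorted sample indices so that every class is equally frequent."""
    y = np.asarray(y)
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Draw n_min samples without replacement from each class.
    idx = [rng.choice(np.where(y == c)[0], size=n_min, replace=False)
           for c in classes]
    return np.sort(np.concatenate(idx))


y_cont = normalize([100., 110., 120., 130.])
y_class = np.array([0, 0, 0, 0, 1, 1])
balanced_idx = undersample_majority(y_class)
```

The returned indices would be applied to both X and y, so that samples and labels stay aligned after undersampling.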
apply_binarization_params(param_file, ensure_balanced=False)[source]

Applies binarization-parameters to y.

binarize_y(params, save_path=None, ensure_balanced=False)[source]

Binarizes mvp’s y-attribute using a specified method.

Parameters:
  • params (dict) –

    The outcome variable (y) will be binarized along the key-value pairs in the params-argument. Options:

    >>> params = {'type': 'percentile', 'high': 75, 'low': 25}
    >>> params = {'type': 'zscore', 'std': 1}
    >>> params = {'type': 'constant', 'cutoff': 10}
    >>> params = {'type': 'median'}
    
  • save_path (str) – If not None (default), this should be an absolute path referring to where the binarization-params should be saved.
  • ensure_balanced (bool) – Whether to ensure balanced classes (if True, done by undersampling the majority class).
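The four parameter dictionaries listed above could translate into label rules along the following lines. The exact rules used by binarize_y are not spelled out in this documentation, so every branch below is an assumption (for instance, that samples between the low and high percentile are dropped, and that comparisons against a cutoff are strict).

```python
import numpy as np


def binarize(y, params):
    """Return (labels, keep) for a continuous y; the rules are assumptions."""
    y = np.asarray(y, dtype=float)
    kind = params['type']
    if kind == 'percentile':
        high = np.percentile(y, params['high'])
        low = np.percentile(y, params['low'])
        keep = (y >= high) | (y <= low)      # drop the middle of the range
        return (y[keep] >= high).astype(int), keep
    if kind == 'zscore':
        z = (y - y.mean()) / y.std()
        keep = np.abs(z) >= params['std']    # drop values close to the mean
        return (z[keep] > 0).astype(int), keep
    if kind == 'constant':
        keep = np.ones(y.size, dtype=bool)
        return (y > params['cutoff']).astype(int), keep
    if kind == 'median':
        keep = np.ones(y.size, dtype=bool)
        return (y > np.median(y)).astype(int), keep
    raise ValueError('Unknown binarization type: %s' % kind)


y = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9.])
labels, keep = binarize(y, {'type': 'percentile', 'high': 75, 'low': 25})
labels_med, keep_med = binarize(y, {'type': 'median'})
```

The boolean keep array indicates which samples survive binarization, which matters for the ‘percentile’ and ‘zscore’ types where the middle of the distribution is discarded in this sketch.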
create()[source]

Extracts and stores data as specified in source.

Raises: ValueError – If data-type is not one of [‘VBM’, ‘TBSS’, ‘4D_anat*’, ‘dual_reg’, ‘Contrast*’]
run_searchlight(out_dir, name='sl_results', n_folds=10, radius=5, mask=None, estimator=None, **kwargs)[source]

Runs a searchlight on the mvp object.

Parameters:
  • out_dir (str) – Path to which to save the searchlight results
  • name (str) – Name for the searchlight-results-file (nifti)
  • n_folds (int) – The amount of folds in sklearn’s StratifiedKFold.
  • radius (int/list) – Radius for the searchlight. If list, it iterates over radii.
  • mask (str) – Path to mask to apply to mvp. If nothing is listed, it will use the masks applied when the mvp was created.
  • estimator (sklearn estimator or pipeline) – Estimator to use in the classification process.
  • **kwargs – Other keyword arguments for initializing nilearn’s searchlight object (see nilearn.github.io/decoding/searchlight.html).
split(file_path, col_name, target, sep='\t', index_col=0, nan_strategy='train', **kwargs)[source]

Splits an MvpBetween object based on some external index.

Parameters:
  • file_path (str) – Absolute path to spreadsheet-like file including the outcome var.
  • col_name (str) – Column name in spreadsheet containing the outcome variable
  • target (str or int or float) – Target to which the data in col_name needs to be compared to, in order to create an index.
  • sep (str) – Separator to parse the spreadsheet-like file.
  • index_col (int) – Which column to use as index (should correspond to subject-name).
  • nan_strategy (str) – Which value to impute if the labeling is absent. Default: ‘train’.
  • **kwargs – Arbitrary keyword arguments passed to pandas read_csv.
update_sample(idx)[source]

Updates the data matrix and associated attributes.

write_4D(path=None, return_nimg=False)[source]

Writes a 4D nifti (subs = 4th dimension) of X.

Parameters:
  • path (str) – Absolute path to save nifti to.
  • return_nimg (bool) – Whether to actually return the Nifti1-image object.
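The array manipulation behind a 4D volume with subjects as the fourth dimension can be sketched as follows, for a single feature-set. This is an illustration with made-up sizes, assuming voxel_idx holds flat (C-order) indices into the whole-brain volume; actually saving the result as a nifti file would additionally require an affine and a nifti library, which are left out here.

```python
import numpy as np

# Hypothetical toy setup: 3 subjects, a 2 x 2 x 2 volume, 5 voxels in the mask.
data_shape = (2, 2, 2)
voxel_idx = np.array([0, 2, 3, 5, 7])
X = np.arange(15, dtype=float).reshape(3, 5)

n_subjects = X.shape[0]
flat = np.zeros((n_subjects, int(np.prod(data_shape))))
flat[:, voxel_idx] = X    # scatter features back to their voxel positions

# Reshape each subject to 3D, then move the subject axis to the end.
vol_4d = np.moveaxis(flat.reshape((n_subjects,) + data_shape), 0, -1)
```

Voxels outside the mask stay at zero, and vol_4d[..., i] is the whole-brain pattern of subject i.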
class MvpWithin(source, read_labels=True, remove_contrast=[], invert_selection=None, ref_space='epi', statistic='tstat', remove_zeros=True, X=None, y=None, mask=None, mask_threshold=0)[source]

Bases: skbold.core.mvp.Mvp

Extracts and stores subject-specific single-trial multivoxel-patterns. The MvpWithin class allows for the extraction of subject-specific, single-trial, multivoxel fMRI patterns from an FSL feat-directory.

Parameters:
  • source (str) – An absolute path to a subject-specific first-level FEAT directory.
  • read_labels (bool) – Whether to read the labels/targets (i.e. y) from the contrast names defined in the design.con file.
  • remove_contrast (list) – Given that all contrasts (COPEs) are loaded from the FEAT-directory, this argument can be used to remove irrelevant contrasts (e.g. contrasts of nuisance predictors). Entries in remove_contrast do not have to be literal; they may be a substring of the full name of the contrast.
  • invert_selection (bool) – Sometimes, instead of loading all contrasts and excluding some, you might want to load only one or a few contrasts and exclude all others. When invert_selection is set to True, the remove_contrast variable is treated as a list of contrasts to include.
  • ref_space (str) – Indicates in which ‘space’ the patterns will be stored. The default is ‘epi’, indicating that the patterns will be loaded and stored in subject-specific (native) functional space. The other option is ‘mni’, which indicates that MvpWithin will first transform contrasts to MNI152 (2mm) space before it loads them. This option assumes that a ‘reg’ directory is present in the .feat-directory, including warp-files from functional to mni space (i.e. example_func2standard.nii.gz).
  • statistic (str) – Which statistic (beta = (CO)PE, tstat, zstat, etc.) from FEAT directories to use as patterns.
  • remove_zeros (bool) – Whether to remove features (i.e. voxels) which are 0 across all trials (due to, e.g., being located outside the brain).
  • X (ndarray) – Not necessary to pass to MvpWithin, but needs to be defined as it is needed in the super-constructor.
  • y (ndarray or list) – Labels or targets corresponding to the samples in X. This can be used when read_labels is set to False.
  • mask (str) – Absolute path to nifti-file that will be used as mask.
  • mask_threshold (int or float) – Minimum value to binarize the mask when it’s probabilistic.
Variables:
  • mask_shape (tuple) – Shape of mask that patterns will be indexed with.
  • nifti_header (Nifti1Header object) – Nifti-header from corresponding mask.
  • affine (ndarray) – Affine corresponding to nifti-mask.
  • voxel_idx (ndarray) – Array with integer-indices indicating which voxels are used in the patterns relative to whole-brain space. In other words, it allows the patterns to be mapped back to whole-brain space.
  • X (ndarray) – The actual patterns (2D: samples X features)
  • y (list or ndarray) – Array/list with labels/targets corresponding to samples in X.
  • contrast_labels (list) – List of names corresponding to the y-values.
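The interplay of remove_contrast and invert_selection described in the parameters above can be sketched with plain substring matching. The function select_contrasts and the contrast names are hypothetical; the sketch only mirrors the documented behaviour (substring matches, with invert_selection turning the remove-list into an include-list).

```python
def select_contrasts(names, remove_contrast, invert_selection=False):
    """Return indices of contrasts to load, using substring matching."""
    matched = [any(pat in name for pat in remove_contrast) for name in names]
    if invert_selection:
        # remove_contrast now acts as an include-list.
        return [i for i, m in enumerate(matched) if m]
    return [i for i, m in enumerate(matched) if not m]


# Hypothetical contrast names as they might appear in design.con.
names = ['face_trial1', 'face_trial2', 'house_trial1', 'motion_x', 'motion_y']

keep = select_contrasts(names, remove_contrast=['motion'])
only_faces = select_contrasts(names, remove_contrast=['face'],
                              invert_selection=True)
```

With these toy names, the first call drops the two nuisance (motion) contrasts, while the second keeps only the two face trials.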
create()[source]

Extracts (meta-)data from FEAT-directory given appropriate settings during initialization.

Raises:
  • ValueError – If the ‘source’-directory doesn’t exist.
  • ValueError – If the number of loaded contrasts does not equal the number of extracted labels.