skbold.postproc package

The postproc subpackage contains all of skbold’s ‘postprocessing’ tools. Most prominently, it contains the MvpResults objects (both MvpResultsClassification and MvpResultsRegression), which can be used in analyses to keep track of model performance across iterations/folds (in cross-validation). Additionally, they can keep track of feature-scores (e.g. f-values from the univariate feature selection procedure) or model weights (e.g. SVM-coefficients). These coefficients can be kept track of as raw weights [1] or as ‘forward-transformed’ weights [2].

The postproc subpackage additionally contains the function ‘extract_roi_info’, which calculates the number of voxels (and other statistics) per ROI in a single statistical brain map and writes the result to a csv-file.

The cluster_size_threshold function allows you to set voxels to zero which do not belong to a cluster of a given extent/size. This is NOT a statistical procedure (like GRF thresholding), but merely a tool for visualization purposes.

References

[1] Stelzer, J., Buschmann, T., Lohmann, G., Margulies, D.S., Trampel, R., and Turner, R. (2014). Prioritizing spatial accuracy in high-resolution fMRI data using multivariate feature weight mapping. Front. Neurosci., http://dx.doi.org/10.3389/fnins.2014.00066.

[2] Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., and Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87, 96-110.

extract_roi_info(statfile, stat_name=None, roi_type='unilateral', per_cluster=True, cluster_engine='scipy', min_clust_size=20, stat_threshold=None, mask_threshold=20, save_indices=True, verbose=True)

Extracts information per ROI for a given statistics-file. Reads in a thresholded (!) statistics-file (such as a thresholded z- or t-stat from a FSL first-level directory) and calculates for a set of ROIs the number of significant voxels included and its maximum value (+ coordinates). Saves a csv-file in the same directory as the statistics-file. Assumes that the statistics file is in MNI152 2mm space.

Parameters:
  • statfile (str) – Absolute path to statistics-file (nifti) that needs to be evaluated.
  • stat_name (str) – Name for the contrast/stat-file that is being analyzed.
  • roi_type (str) – Whether to use unilateral or bilateral masks (thus far, only Harvard-Oxford atlas masks are supported).
  • per_cluster (bool) – Whether to evaluate the statistics-file as a whole (per_cluster=False) or per cluster separately (per_cluster=True).
  • cluster_engine (str) – Which ‘engine’ to use for clustering; can be ‘scipy’ (default), using scipy.ndimage.measurements.label, or ‘fsl’ (using FSL’s cluster command).
  • min_clust_size (int) – Minimum cluster size (i.e. clusters with fewer voxels than this number are discarded); also, ROIs containing fewer voxels than this will not be listed in the CSV.
  • stat_threshold (int or float) – If the stat-file contains uncorrected data, stat_threshold can be used to set a lower bound.
  • mask_threshold (int) – Threshold for probabilistic masks, such as the Harvard-Oxford masks. The default is chosen to minimize overlap between adjacent masks while still covering most of the brain.
  • save_indices (bool) – Whether to save the indices (coordinates) of peaks of clusters.
  • verbose (bool) – Whether to print some output regarding the parsing process.
Returns:

df – Dataframe corresponding to the written csv-file.

Return type:

Dataframe
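The per-ROI summary described above can be illustrated with a minimal sketch. This is not skbold's implementation: the helper names summarize_rois and rows_to_csv, the toy arrays, and the output column names are all assumptions made for illustration, and the sketch assumes a thresholded map with positive statistic values.

```python
import csv
import io

import numpy as np


def summarize_rois(stat_map, roi_masks, min_clust_size=20):
    """Count surviving (nonzero) voxels and locate the peak value per ROI.

    Hypothetical helper; assumes positive statistic values.
    """
    rows = []
    for name, mask in roi_masks.items():
        inside = np.where(mask, stat_map, 0.0)  # zero out voxels outside the ROI
        n_voxels = int(np.count_nonzero(inside))
        if n_voxels < min_clust_size:
            continue  # ROIs with too few surviving voxels are not listed
        peak = np.unravel_index(np.argmax(inside), inside.shape)
        rows.append({
            "roi": name,
            "n_voxels": n_voxels,
            "max": float(inside[peak]),
            "peak_index": tuple(int(i) for i in peak),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the summary rows as CSV text (hypothetical column names)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["roi", "n_voxels", "max", "peak_index"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy 4x4x4 'thresholded' map: zeros everywhere except a small blob.
stat_map = np.zeros((4, 4, 4))
stat_map[0:2, 0:2, 0:2] = 3.0
stat_map[1, 1, 1] = 5.0

roi_a = np.zeros((4, 4, 4), dtype=bool)
roi_a[0:2] = True   # contains the blob
roi_b = ~roi_a      # contains no surviving voxels

rows = summarize_rois(stat_map, {"roi_a": roi_a, "roi_b": roi_b}, min_clust_size=4)
print(rows_to_csv(rows))
```

In this toy case only roi_a is listed, because roi_b contains fewer surviving voxels than min_clust_size.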

class MvpResultsClassification(mvp, n_iter, feature_scoring='fwm', verbose=False, out_path=None)

Bases: skbold.postproc.mvp_results.MvpResults

MvpResults class specifically for classification analyses.

Parameters:
  • mvp (mvp-object) – Necessary to extract some metadata from.
  • n_iter (int) – Number of folds that will be kept track of.
  • out_path (str) – Path to save results to.
  • feature_scoring (str) – Which method to use to calculate feature-scores with. Can be: 1) ‘coef’: keep track of raw voxel-weights (coefficients); 2) ‘forward’: transform raw voxel-weights to the corresponding forward model (see Haufe et al. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87, 96-110.)
  • verbose (bool) – Whether to print extra output.
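To make the difference between raw and ‘forward-transformed’ weights concrete, here is a minimal sketch of the single-output forward transformation from Haufe et al. (2014), A = Cov(X) w / Var(X w). The function name forward_transform and the toy data are assumptions for illustration; this is not necessarily how skbold computes the ‘forward’ option.

```python
import numpy as np


def forward_transform(X, w):
    """Turn backward-model weights w into a forward-model pattern.

    Implements A = Cov(X) w / Var(X w) for a single model output.
    Hypothetical helper, not part of skbold.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    Xc = X - X.mean(axis=0)            # centre each feature
    s = Xc @ w                         # model output per sample
    cov_xs = Xc.T @ s / (len(X) - 1)   # equals Cov(X) @ w
    return cov_xs / s.var(ddof=1)


# Toy data: feature 0 carries signal plus noise, feature 1 carries only noise.
signal = np.array([1.0, -1.0, 1.0, -1.0])
noise = np.array([1.0, 1.0, -1.0, -1.0])   # orthogonal to the signal
X = np.column_stack([signal + noise, noise])

w = np.array([1.0, -1.0])   # raw weights: subtract feature 1 to cancel the noise
pattern = forward_transform(X, w)
print(pattern)
```

The raw weight on feature 1 is large even though that feature contains no signal; the forward-transformed pattern assigns it a value of zero, which is why the transformed weights are easier to interpret.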
compute_scores()

Computes scores across folds.

update(test_idx, y_pred, pipeline=None)

Updates with information from current fold.

Parameters:
  • test_idx (ndarray) – Indices of current test-trials.
  • y_pred (ndarray) – Predictions of current test-trials.
  • pipeline (scikit-learn Pipeline object) – Pipeline from which relevant scores/coefficients will be extracted.
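The update/compute_scores pattern can be sketched with a small stand-in class. FoldTracker below is hypothetical and only mimics the idea of recording per-fold performance and summarizing it afterwards; it tracks accuracy only and does not reproduce the attributes or output of MvpResultsClassification.

```python
import numpy as np


class FoldTracker:
    """Hypothetical stand-in for an MvpResults-like object (accuracy only)."""

    def __init__(self, y, n_iter):
        self.y = np.asarray(y)            # true labels of all trials
        self.accuracy = np.zeros(n_iter)  # one score per fold
        self.fold = 0

    def update(self, test_idx, y_pred):
        """Store the accuracy of the current fold."""
        y_true = self.y[test_idx]
        self.accuracy[self.fold] = np.mean(y_true == np.asarray(y_pred))
        self.fold += 1

    def compute_scores(self):
        """Summarize performance across all folds."""
        return {
            "mean": float(self.accuracy.mean()),
            "std": float(self.accuracy.std()),
        }


y = [0, 1, 0, 1, 0, 1]
tracker = FoldTracker(y, n_iter=3)

# In a real analysis the predictions would come from a fitted model.
tracker.update(np.array([0, 1]), [0, 1])   # both correct
tracker.update(np.array([2, 3]), [0, 0])   # one correct
tracker.update(np.array([4, 5]), [1, 0])   # none correct

scores = tracker.compute_scores()
print(scores)
```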
class MvpResultsRegression(mvp, n_iter, feature_scoring='', verbose=False, out_path=None)

Bases: skbold.postproc.mvp_results.MvpResults

MvpResults class specifically for regression analyses.

Parameters:
  • mvp (mvp-object) – Necessary to extract some metadata from.
  • n_iter (int) – Number of folds that will be kept track of.
  • out_path (str) – Path to save results to.
  • feature_scoring (str) – Which method to use to calculate feature-scores with. Can be: 1) ‘coef’: keep track of raw voxel-weights (coefficients); 2) ‘forward’: transform raw voxel-weights to the corresponding forward model (see Haufe et al. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87, 96-110.)
  • verbose (bool) – Whether to print extra output.

Warning: Has not been tested with MvpWithin!

compute_scores()

Computes scores across folds.

update(test_idx, y_pred, pipeline=None)

Updates with information from current fold.

Parameters:
  • test_idx (ndarray) – Indices of current test-trials.
  • y_pred (ndarray) – Predictions of current test-trials.
  • pipeline (scikit-learn Pipeline object) – Pipeline from which relevant scores/coefficients will be extracted.
class MvpAverageResults(out_dir, type='classification')

Bases: object

Averages results from MVPA analyses on, for example, different subjects or different ROIs.

Parameters:
  • out_dir (str) – Absolute path to directory where the results will be saved.
compute(mvp_list, identifiers, metric='f1', h0=0.5)
write(path, name='average_results')
cluster_size_threshold(data, thresh=None, min_size=20, save=False)

Removes clusters smaller than a prespecified number in a stat-file.

Parameters:
  • data (numpy-array or str) – 3D Numpy-array with statistic values, or a path (string) to a nifti-file with statistic values.
  • thresh (int, float) – Initial threshold to binarize the image and extract clusters.
  • min_size (int) – Minimum size (i.e. number of voxels) of a cluster. Any cluster with fewer voxels than this number is set to zero (‘removed’).
  • save (bool) – If data is a file-path, this parameter determines whether the cluster-corrected file is saved to disk again.
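The idea behind cluster-size thresholding can be sketched with scipy.ndimage.label. The function threshold_clusters below is a hypothetical illustration, not skbold's code; in particular, binarizing with data > thresh (or data != 0 when no threshold is given) and using scipy's default face-connectivity are assumptions.

```python
import numpy as np
from scipy import ndimage


def threshold_clusters(data, thresh=None, min_size=20):
    """Zero out voxels that belong to clusters smaller than min_size.

    Hypothetical helper; the binarization rule is an assumption.
    """
    data = np.asarray(data, dtype=float)
    mask = data != 0 if thresh is None else data > thresh
    labels, n_clusters = ndimage.label(mask)  # default: face-connected voxels
    # sizes[k] is the number of voxels with label k; label 0 is background.
    sizes = np.bincount(labels.ravel(), minlength=n_clusters + 1)
    keep = sizes >= min_size
    keep[0] = False  # never keep the background
    return np.where(keep[labels], data, 0.0)


# Toy volume with one 8-voxel cluster and one isolated voxel.
volume = np.zeros((5, 5, 5))
volume[0:2, 0:2, 0:2] = 2.5
volume[4, 4, 4] = 9.0

cleaned = threshold_clusters(volume, thresh=1.0, min_size=4)
print(np.count_nonzero(cleaned))
```

The isolated voxel is removed despite having the highest value, which shows that the procedure filters on cluster extent only and makes no statistical claim.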