skbold.postproc package

The postproc subpackage contains all of skbold’s ‘postprocessing’ tools. Most prominently, it contains the MvpResults objects (both MvpResultsClassification and MvpResultsRegression), which can be used in analyses to keep track of model performance across iterations/folds (in cross-validation). They can additionally keep track of feature scores (e.g. F-values from the univariate feature selection procedure) or model weights (e.g. SVM coefficients). These weights can be tracked either as raw weights [1]_ or as ‘forward-transformed’ weights [2]_.

The postproc subpackage additionally contains the function ‘extract_roi_info’, which calculates the number of voxels (and other statistics) per ROI in a single statistical brain map and writes the result to a csv-file.

The cluster_size_threshold function sets to zero all voxels that do not belong to a cluster of at least a given extent/size. This is NOT a statistical procedure (like GRF thresholding), but merely a tool for visualization purposes.

References

[1]	Stelzer, J., Buschmann, T., Lohmann, G., Margulies, D.S., Trampel, R., and Turner, R. (2014). Prioritizing spatial accuracy in high-resolution fMRI data using multivariate feature weight mapping. Front. Neurosci., http://dx.doi.org/10.3389/fnins.2014.00066.

[2]	Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., and Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87, 96-110.

extract_roi_info(statfile, stat_name=None, roi_type='unilateral', per_cluster=True, cluster_engine='scipy', min_clust_size=20, stat_threshold=None, mask_threshold=20, save_indices=True, verbose=True)

Extracts information per ROI for a given statistics-file. Reads in a thresholded (!) statistics-file (such as a thresholded z- or t-stat from an FSL first-level directory) and calculates, for a set of ROIs, the number of significant voxels included and the maximum value (plus its coordinates). Saves a csv-file in the same directory as the statistics-file. Assumes that the statistics-file is in MNI152 2mm space.

Parameters:

statfile (str) – Absolute path to the statistics-file (nifti) that needs to be evaluated.
stat_name (str) – Name for the contrast/stat-file that is being analyzed.
roi_type (str) – Whether to use unilateral or bilateral masks (thus far, only Harvard-Oxford atlas masks are supported).
per_cluster (bool) – Whether to evaluate the statistics-file as a whole (per_cluster=False) or per cluster separately (per_cluster=True).
cluster_engine (str) – Which ‘engine’ to use for clustering; can be ‘scipy’ (default), using scipy.ndimage.measurements.label, or ‘fsl’ (using FSL’s cluster command).
min_clust_size (int) – Minimum cluster size, i.e. clusters with fewer voxels than this number are discarded; ROIs containing fewer voxels than this are also not listed in the csv-file.
stat_threshold (int or float) – If the stat-file contains uncorrected data, stat_threshold can be used to set a lower bound.
mask_threshold (int) – Threshold for probabilistic masks, such as the Harvard-Oxford masks. The default is chosen to minimize overlap between adjacent masks while still covering most of the brain.
save_indices (bool) – Whether to save the indices (coordinates) of peaks of clusters.
verbose (bool) – Whether to print some output regarding the parsing process.

Returns:	df – Dataframe corresponding to the written csv-file.
Return type:	Dataframe
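
The per-ROI summary described above can be illustrated with a minimal sketch. This is not skbold's implementation: the helper name roi_summary, the dictionary of probabilistic masks, and the toy arrays are assumptions made for illustration, and the clustering step (per_cluster, cluster_engine) is left out.

```python
import numpy as np


def roi_summary(stat_map, roi_masks, mask_threshold=20, min_clust_size=20):
    """Hypothetical helper: count surviving voxels and find the peak per ROI."""
    rows = []
    for name, prob_mask in roi_masks.items():
        # Binarize the probabilistic mask at the chosen threshold.
        in_roi = prob_mask > mask_threshold
        # A thresholded map is zero wherever a voxel did not survive.
        hits = in_roi & (stat_map != 0)
        n_voxels = int(hits.sum())
        if n_voxels < min_clust_size:
            continue  # too few voxels: leave this ROI out of the table
        # Peak: largest value among the surviving voxels in this ROI.
        masked = np.where(hits, stat_map, -np.inf)
        peak = np.unravel_index(np.argmax(masked), stat_map.shape)
        rows.append({
            "roi": name,
            "n_voxels": n_voxels,
            "max": float(stat_map[peak]),
            "peak": tuple(int(i) for i in peak),
        })
    return rows


# Toy data: one 27-voxel blob with a single peak voxel.
stat_map = np.zeros((10, 10, 10))
stat_map[2:5, 2:5, 2:5] = 3.0
stat_map[3, 3, 3] = 5.0

# Two made-up probabilistic masks (values in percent).
roi_masks = {"roi_a": np.zeros((10, 10, 10)), "roi_b": np.zeros((10, 10, 10))}
roi_masks["roi_a"][0:6, 0:6, 0:6] = 50.0
roi_masks["roi_b"][6:, 6:, 6:] = 50.0

rows = roi_summary(stat_map, roi_masks)
```

In this toy case only roi_a overlaps the blob, so roi_b falls below min_clust_size and is dropped from the result.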

class MvpResults(mvp, n_iter, type_model='classification', feature_scoring=None, confmat=False, verbose=False, **metrics)

Bases: object

Class to keep track of model evaluation metrics and feature scores. See the ReadTheDocs homepage for more information on its API and use.

Parameters:

mvp (mvp-object) – Mvp object from which some metadata is extracted.
n_iter (int) – Number of folds that will be kept track of.
type_model (str) – Either ‘classification’ or ‘regression’
feature_scoring (str) – Which method to use to calculate feature scores. Can be: 1) ‘fwm’: feature weight mapping [1]_, which keeps track of raw voxel-weights (coefficients); 2) ‘forward’: transforms raw voxel-weights to the corresponding forward-model [2]_.
confmat (bool) – Whether to keep track of the confusion-matrix across folds (only relevant for type_model=‘classification’).
verbose (bool) – Whether to print extra output.
**metrics (keyword-arguments) – Keyword arguments of the form: name_metric: metric_function; any metric from scikit-learn works (or other metrics, as long as they have two input args, y_true and y_pred).

References

[1]	Stelzer, J., Buschmann, T., Lohmann, G., Margulies, D.S., Trampel, R., and Turner, R. (2014). Prioritizing spatial accuracy in high-resolution fMRI data using multivariate feature weight mapping. Front. Neurosci., http://dx.doi.org/10.3389/fnins.2014.00066.

[2]	Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., and Bießmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87, 96-110.
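
As a rough illustration of the ‘forward’ option, the sketch below applies the transformation described in [2]_ to a single weight vector: the activation pattern is the feature covariance times the weights, divided by the variance of the model output. The function name forward_transform and the toy data are assumptions; how skbold computes this internally is not shown here.

```python
import numpy as np


def forward_transform(X, w):
    """Sketch: map backward-model weights w to a forward-model pattern."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    s = X @ w                          # latent signal estimated by the model
    cov_x = np.cov(X, rowvar=False)    # feature covariance, shape (K, K)
    return cov_x @ w / np.var(s, ddof=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # 200 trials, 5 features (toy data)
w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])     # model that reads out feature 0 only
pattern = forward_transform(X, w)
```

Raw weights (the ‘fwm’ option) would simply be w itself; the forward pattern additionally reflects how each feature covaries with the model output.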

compute_scores(multiclass='ovr', maps_to_tstat=True): Computes scores across folds.

load_model(path, param=None)

Load model or pipeline from disk.

Parameters:

path (str) – Absolute path to model.
param (str) – Which, if any, specific param needs to be loaded.

save_model(model, out_path)

Method to serialize model(s) to disk.

Parameters:

model (pipeline or scikit-learn object) – Model to be saved.
out_path (str) – Where to save the model to.

update(test_idx, y_pred, pipeline=None)

Updates with information from current fold.

Parameters:

test_idx (ndarray) – Indices of current test-trials.
y_pred (ndarray) – Predictions of current test-trials.
pipeline (scikit-learn Pipeline object) – Pipeline from which relevant scores/coefficients will be extracted.
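
To make the interplay between the **metrics keyword arguments and per-fold updates concrete, here is a small stand-in written in plain Python. FoldTracker and its mean_scores method are hypothetical and are not part of skbold; the sketch only shows the idea of calling every metric(y_true, y_pred) once per fold and storing the result.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels; any (y_true, y_pred) metric would do."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class FoldTracker:
    """Hypothetical stand-in that stores one score per metric per fold."""

    def __init__(self, y, n_iter, **metrics):
        self.y = list(y)
        self.n_iter = n_iter
        self.metrics = metrics
        self.scores = {name: [] for name in metrics}

    def update(self, test_idx, y_pred):
        # Look up the true labels of the current test trials.
        y_true = [self.y[i] for i in test_idx]
        for name, func in self.metrics.items():
            self.scores[name].append(func(y_true, list(y_pred)))

    def mean_scores(self):
        return {name: sum(v) / len(v) for name, v in self.scores.items()}


y = [0, 1, 0, 1, 0, 1]
tracker = FoldTracker(y, n_iter=2, accuracy=accuracy)
tracker.update(test_idx=[0, 1, 2], y_pred=[0, 1, 1])   # fold 1: 2 of 3 correct
tracker.update(test_idx=[3, 4, 5], y_pred=[1, 0, 1])   # fold 2: 3 of 3 correct
means = tracker.mean_scores()
```

With a real MvpResults object the same pattern applies inside the cross-validation loop: fit on the training trials, predict the test trials, then call update with the test indices and predictions.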

write(out_path, confmat=True, to_tstat=True, multiclass='ovr')

Writes results to disk.

Parameters:

out_path (str) – Where to save the results to.
feature_viz (bool) – Whether to write out (and optionally return) feature-visualization information.
confmat (bool) – Whether to write out (and optionally return) the confusion-matrix (across folds).
to_tstat (bool) – Whether to convert averaged feature-scores to t-stats (by dividing them by sqrt(score.std(axis=0))).

class MvpAverageResults(mvp_results_list, identifiers=None)

Bases: object

Averages results from MVPA analyses on, for example, different subjects or different ROIs.

Parameters:

mvp_results_list (list) – List with MvpResults objects (after updating across folds).
identifiers (list of str) – List of identifiers (e.g. subject-name) that correspond to the different MvpResults objects.

compute_statistics(metric='accuracy', h0=0.5)

Computes statistics across MvpResults objects.

Parameters:

metric (str) – Which metric should be used in the MvpResults dataframes.
h0 (float) – The null-hypothesis in terms of model performance (e.g. accuracy equals 1 / n_classes).
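
One plausible way to test per-subject scores against h0 is a one-sample t-test, sketched below with the standard library only. Whether compute_statistics uses exactly this test is an assumption; the function name one_sample_t and the accuracies are made up for illustration.

```python
import math
import statistics


def one_sample_t(scores, h0=0.5):
    """Sketch: t-statistic of the mean score against the chance level h0."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)              # sample standard deviation
    t_value = (mean - h0) / (sd / math.sqrt(n))
    return t_value, n - 1                      # t-statistic, degrees of freedom


# Made-up mean accuracies of five subjects in a two-class problem.
accuracies = [0.55, 0.60, 0.65, 0.70, 0.50]
t_value, dof = one_sample_t(accuracies, h0=0.5)
```

For a problem with more than two classes, h0 would be set to 1 / n_classes, as the parameter description notes.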

write(path, name='average_results')

cluster_size_threshold(data, thresh=None, min_size=20, save=False)

Removes clusters smaller than a prespecified number in a stat-file.

Parameters:

data (numpy-array or str) – 3D numpy array with statistic values, or a path to a nifti-file with statistic values.
thresh (int, float) – Initial threshold to binarize the image and extract clusters.
min_size (int) – Minimum size (i.e. number of voxels) of a cluster. Any cluster with fewer voxels than this number is set to zero (‘removed’).
save (bool) – If data is a file-path, this parameter determines whether the cluster-corrected file is saved to disk again.
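
The procedure described by these parameters can be sketched with scipy.ndimage.label on an in-memory array. The helper name threshold_clusters is an assumption, the default face-connectivity of ndimage.label is used, and reading or saving nifti-files is left out; skbold's own function may differ in these details.

```python
import numpy as np
from scipy import ndimage


def threshold_clusters(data, thresh=None, min_size=20):
    """Sketch: zero out voxels that are not in a cluster of >= min_size voxels."""
    data = np.asarray(data, dtype=float)
    # Binarize: either all nonzero voxels or all voxels above the threshold.
    binary = (data != 0) if thresh is None else (data > thresh)
    # Label connected components (face-connected by default).
    labels, n_clusters = ndimage.label(binary)
    sizes = np.bincount(labels.ravel())   # sizes[0] counts the background
    keep = sizes >= min_size
    keep[0] = False                       # never keep the background label
    return np.where(keep[labels], data, 0.0)


data = np.zeros((10, 10, 10))
data[0:3, 0:3, 0:3] = 4.0     # 27-voxel cluster, large enough to survive
data[8, 8, 8] = 4.0           # isolated voxel, removed
result = threshold_clusters(data, thresh=2.0, min_size=20)
```

Note that voxels at or below thresh are also set to zero in this sketch, because they never receive a cluster label.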

class PrevalenceInference(obs, perms, P2=100000, gamma0=0.5, alpha=0.05)

Bases: object

Class that performs prevalence inference, based on the paper by Allefeld, Görgen, & Haynes (2016), NeuroImage.

Parameters:

obs (numpy ndarray) – A 2D array of shape [N (subjects) x K (voxels)], or a single-column array of shape [N, 1].
perms (numpy ndarray) – A 3D array of shape [N (subjects) x K (voxels) x P1 (first level permutations)], or a 2D array of shape [N x P1]
P2 (int) – Number of second level permutations to run
gamma0 (float) – Prevalence threshold of the null hypothesis to test (gamma < gamma0).
alpha (float) – Significance level for hypothesis testing

Examples

>>> from skbold.postproc import PrevalenceInference
>>> import numpy as np
>>> N, K, P1 = 20, (40, 40, 38), 15
>>> obs = np.random.normal(loc=0.55, scale=0.05, size=(N, np.prod(K)))
>>> perms = np.random.normal(loc=0.5, scale=0.05, size=(N, np.prod(K), P1))
>>> pvi = PrevalenceInference(obs=obs, perms=perms, P2=100000, gamma0=0.5,
...                           alpha=0.05)
>>> pvi.run()
Running with parameters:
    N = 20
    K = 60800
    P1 = 15
    P2 = 100000
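
For intuition, the core of the procedure can be sketched for a single voxel, without the correction for multiple comparisons across voxels. This is a simplified reading of the minimum-statistic approach in Allefeld et al. (2016), not skbold's implementation; the function name prevalence_pvalues, the (1 + count) / (P2 + 1) p-value convention, and the toy data are assumptions.

```python
import numpy as np


def prevalence_pvalues(obs, perms, P2=1000, gamma0=0.5, seed=0):
    """Single-voxel sketch: obs has shape (N,), perms has shape (N, P1)."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    perms = np.asarray(perms, dtype=float)
    N, P1 = perms.shape
    # Test statistic: the minimum score across subjects.
    m_obs = obs.min()
    # Second-level permutations: pick one first-level permutation per subject.
    picks = rng.integers(0, P1, size=(P2, N))
    m_perm = perms[np.arange(N), picks].min(axis=1)     # shape (P2,)
    # p-value for the global null (no effect in any subject).
    p_global = (1 + np.sum(m_perm >= m_obs)) / (P2 + 1)
    # p-value for the prevalence null with threshold gamma0.
    p_prev = ((1 - gamma0) * p_global ** (1 / N) + gamma0) ** N
    return float(p_global), float(p_prev)


rng = np.random.default_rng(1)
N, P1 = 10, 15
obs_1d = rng.normal(loc=0.7, scale=0.02, size=N)
perms_2d = rng.normal(loc=0.5, scale=0.05, size=(N, P1))
p_global, p_prev = prevalence_pvalues(obs_1d, perms_2d, P2=1000, gamma0=0.5)
```

The prevalence p-value is never smaller than the global-null p-value, so rejecting the prevalence null is the stricter of the two conclusions.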

__init__(obs, perms, P2=100000, gamma0=0.5, alpha=0.05): Initializes PrevalenceInference object.

run(): Runs actual prevalence inference algorithm.

write(path)

Writes results from Prevalence Inference procedure to disk.

Parameters:	path (str) – Where to write the results to disk