skbold.postproc package
The postproc subpackage contains all of skbold’s ‘postprocessing’ tools. Most prominently, it contains the MvpResults objects (both MvpResultsClassification and MvpResultsRegression), which can be used in analyses to keep track of model performance across iterations/folds (in cross-validation). Additionally, it allows for keeping track of feature scores (e.g. F-values from the univariate feature selection procedure) or model weights (e.g. SVM coefficients). These coefficients can be tracked as raw weights [1]_ or as ‘forward-transformed’ weights [2]_.
The postproc subpackage additionally contains the function ‘extract_roi_info’, which calculates the number of voxels (and other statistics) per ROI in a single statistical brain map and writes the result to a CSV file.
The cluster_size_threshold function allows you to set voxels to zero which do not belong to a cluster of a given extent/size. This is NOT a statistical procedure (like GRF thresholding), but merely a tool for visualization purposes.
References
[1] Stelzer, J., Buschmann, T., Lohmann, G., Margulies, D.S., Trampel, R., and Turner, R. (2014). Prioritizing spatial accuracy in high-resolution fMRI data using multivariate feature weight mapping. Front. Neurosci., http://dx.doi.org/10.3389/fnins.2014.00066.
[2] Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., and Biessmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87, 96-110.

extract_roi_info
(statfile, stat_name=None, roi_type='unilateral', per_cluster=True, cluster_engine='scipy', min_clust_size=20, stat_threshold=None, mask_threshold=20, save_indices=True, verbose=True)
Extracts information per ROI for a given statistics file. Reads in a thresholded (!) statistics file (such as a thresholded z- or t-stat image from an FSL first-level directory) and calculates, for a set of ROIs, the number of significant voxels each ROI contains and its maximum value (plus coordinates). Saves a CSV file in the same directory as the statistics file. Assumes that the statistics file is in MNI152 2mm space.
Parameters:  statfile (str) – Absolute path to the statistics file (nifti) that needs to be evaluated.
 stat_name (str) – Name for the contrast/statfile that is being analyzed.
 roi_type (str) – Whether to use unilateral or bilateral masks (thus far, only Harvard-Oxford atlas masks are supported).
 per_cluster (bool) – Whether to evaluate the statistics file as a whole (per_cluster=False) or per cluster separately (per_cluster=True).
 cluster_engine (str) – Which ‘engine’ to use for clustering; can be ‘scipy’ (default), using scipy.ndimage.measurements.label, or ‘fsl’ (using FSL’s cluster command).
 min_clust_size (int) – Minimum cluster size (i.e. clusters with fewer voxels than this number are discarded; also, ROIs containing fewer voxels than this will not be listed in the CSV).
 stat_threshold (int or float) – If the statfile contains uncorrected data, stat_threshold can be used to set a lower bound.
 mask_threshold (int or float) – Threshold for probabilistic masks, such as the Harvard-Oxford masks. The default of 20 is chosen as this minimizes overlap between adjacent masks while still covering most of the brain.
 save_indices (bool) – Whether to save the indices (coordinates) of peaks of clusters.
 verbose (bool) – Whether to print some output regarding the parsing process.
Returns: df – DataFrame corresponding to the written CSV file.
Return type: DataFrame
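To make the per-ROI summary concrete, the following is a minimal sketch of the kind of computation described above (voxel count, maximum value, and peak coordinates per ROI). It is not skbold's implementation: the function `summarize_rois`, the toy masks, and the choice of a strict `>` comparison against `mask_threshold` are all assumptions made for illustration, and it only handles positive statistic values.

```python
import numpy as np


def summarize_rois(stat_map, roi_masks, mask_threshold=20, min_clust_size=20):
    """Count nonzero voxels and locate the peak per ROI (illustrative only)."""
    rows = []
    for name, prob_mask in roi_masks.items():
        in_roi = prob_mask > mask_threshold        # binarize the probabilistic mask
        values = np.where(in_roi, stat_map, 0.0)   # statistic values inside the ROI
        n_voxels = int(np.count_nonzero(values))
        if n_voxels < min_clust_size:
            continue                               # too few voxels: ROI is not listed
        peak = np.unravel_index(np.argmax(values), values.shape)
        rows.append({
            "roi": name,
            "n_voxels": n_voxels,
            "max": float(values.max()),
            "peak_ijk": tuple(int(i) for i in peak),
        })
    return rows


# Toy thresholded map: one 3x3x3 blob with a single peak voxel.
stat_map = np.zeros((10, 10, 10))
stat_map[2:5, 2:5, 2:5] = 2.5
stat_map[3, 3, 3] = 4.0

# Two hypothetical probabilistic masks (values in percent).
roi_a = np.zeros((10, 10, 10))
roi_a[0:6, 0:6, 0:6] = 50      # covers the blob
roi_b = np.zeros((10, 10, 10))
roi_b[6:, 6:, 6:] = 50         # contains no suprathreshold voxels

rows = summarize_rois(stat_map, {"roi_a": roi_a, "roi_b": roi_b})
for row in rows:
    print(row)
```

Only `roi_a` is reported here, because `roi_b` contains fewer than `min_clust_size` nonzero voxels.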

class MvpResults
(mvp, n_iter, type_model='classification', feature_scoring=None, confmat=False, verbose=False, **metrics)
Bases: object
Class to keep track of model evaluation metrics and feature scores. See the ReadTheDocs homepage for more information on its API and use.
Parameters:  mvp (mvp-object) – Necessary to extract some metadata from.
 n_iter (int) – Number of folds that will be kept track of.
 type_model (str) – Either ‘classification’ or ‘regression’.
 feature_scoring (str) – Which method to use to calculate feature scores. Can be: 1) ‘fwm’: feature weight mapping [1]_, which keeps track of raw voxel weights (coefficients); 2) ‘forward’: transforms raw voxel weights to the corresponding forward model [2]_.
 confmat (bool) – Whether to keep track of the confusion matrix across folds (only relevant for type_model=’classification’).
 verbose (bool) – Whether to print extra output.
 **metrics (keyword arguments) – Keyword arguments of the form name_metric=metric_function; any metric from scikit-learn works (or other metrics, as long as they take two input arguments, y_true and y_pred).
References
[1] Stelzer, J., Buschmann, T., Lohmann, G., Margulies, D.S., Trampel, R., and Turner, R. (2014). Prioritizing spatial accuracy in high-resolution fMRI data using multivariate feature weight mapping. Front. Neurosci., http://dx.doi.org/10.3389/fnins.2014.00066.
[2] Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.-D., Blankertz, B., and Biessmann, F. (2014). On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage, 87, 96-110.
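As a rough illustration of what the ‘forward’ option refers to, the sketch below applies the single-model transformation from Haufe et al. (2014) to raw linear weights: the activation pattern is the data covariance times the weight vector, scaled by the variance of the model output. The random data and the weight vector `w` are placeholders (assumptions), and this is not necessarily how skbold computes it internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))   # samples x voxels (toy data)
w = rng.normal(size=n_features)                # stands in for fitted linear coefficients

s = X @ w                                      # model output per sample
cov_X = np.cov(X, rowvar=False)                # (n_features, n_features) covariance
pattern = cov_X @ w / np.var(s, ddof=1)        # forward-model activation pattern

print(pattern)
```

Each entry of `pattern` equals the covariance between one feature and the model output, divided by the output variance, which is why it can be read as "how strongly does this voxel express the decoded signal" rather than as a raw filter weight.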
load_model
(path, param=None)
Load model or pipeline from disk.
Parameters:  path (str) – Absolute path to model.
 param (str) – Which, if any, specific param needs to be loaded.

save_model
(model, out_path)
Method to serialize model(s) to disk.
Parameters: model (pipeline or scikit-learn object) – Model to be saved.

update
(test_idx, y_pred, pipeline=None)
Updates with information from current fold.
Parameters:  test_idx (ndarray) – Indices of the current test trials.
 y_pred (ndarray) – Predictions for the current test trials.
 pipeline (scikit-learn Pipeline object) – Pipeline from which relevant scores/coefficients will be extracted.
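The per-fold bookkeeping described here can be pictured with a small stand-in class. The sketch below is hypothetical: `FoldTracker`, `accuracy`, and the dummy "model" are not part of skbold, and the real `update` method also extracts feature scores from the pipeline, which is omitted. It only shows the pattern of passing metrics as keyword arguments and calling `update` once per fold with the test indices and predictions.

```python
import numpy as np


class FoldTracker:
    """Hypothetical stand-in for a results object; not skbold's implementation."""

    def __init__(self, y, n_iter, **metrics):
        self.y = np.asarray(y)
        self.metrics = metrics                      # name -> func(y_true, y_pred)
        self.scores = {name: np.full(n_iter, np.nan) for name in metrics}
        self.fold = 0

    def update(self, test_idx, y_pred):
        """Store every metric for the current fold, then advance the fold counter."""
        y_true = self.y[test_idx]
        for name, func in self.metrics.items():
            self.scores[name][self.fold] = func(y_true, y_pred)
        self.fold += 1


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
folds = np.array_split(np.arange(len(y)), 4)        # 4 folds of 2 trials each
tracker = FoldTracker(y, n_iter=len(folds), acc=accuracy)

for test_idx in folds:
    y_pred = np.zeros(len(test_idx), dtype=int)     # dummy model: always predicts 0
    tracker.update(test_idx, y_pred)

print(tracker.scores["acc"], tracker.scores["acc"].mean())
```

Any callable with the signature `(y_true, y_pred)` could be passed in place of `accuracy`, which mirrors the `**metrics` convention documented for MvpResults.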

write
(out_path, confmat=True, to_tstat=True, multiclass='ovr')
Writes results to disk.
Parameters:  out_path (str) – Where to save the results to.
 feature_viz (bool) – Whether to write out (and optionally return) feature visualization information.
 confmat (bool) – Whether to write out (and optionally return) the confusion matrix (across folds).
 to_tstat (bool) – Whether to convert averaged feature scores to t-stats (by dividing them by sqrt(score.std(axis=0))).

class MvpAverageResults
(mvp_results_list, identifiers=None)
Bases: object
Averages results from MVPA analyses on, for example, different subjects or different ROIs.
Parameters:  mvp_results_list (list) – List with MvpResults objects (after updating across folds)
 identifiers (list of str) – List of identifiers (e.g. subject names) that correspond to the different MvpResults objects.

cluster_size_threshold
(data, thresh=None, min_size=20, save=False)
Removes clusters smaller than a prespecified number of voxels from a statfile.
Parameters:  data (numpy array or str) – 3D numpy array with statistic values, or a path to a nifti file with statistic values.
 thresh (int, float) – Initial threshold to binarize the image and extract clusters.
 min_size (int) – Minimum size (i.e. number of voxels) of a cluster. Any cluster with fewer voxels than this is set to zero (‘removed’).
 save (bool) – If data is a file path, this parameter determines whether the cluster-corrected file is saved to disk again.
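The thresholding steps above (binarize, label connected clusters, zero out small ones) can be sketched with scipy.ndimage.label. This is an illustrative version under stated assumptions, not skbold's code: the helper `remove_small_clusters` is hypothetical, it works on in-memory arrays only, and it uses scipy's default face connectivity.

```python
import numpy as np
from scipy import ndimage


def remove_small_clusters(data, thresh=0.0, min_size=20):
    """Zero out voxels that belong to clusters smaller than min_size (sketch)."""
    binary = data > thresh                    # initial threshold to binarize the image
    labels, _ = ndimage.label(binary)         # label connected clusters (0 = background)
    sizes = np.bincount(labels.ravel())       # voxel count per label
    keep = sizes >= min_size                  # which labels are large enough
    keep[0] = False                           # never keep the background
    return np.where(keep[labels], data, 0.0)


# Toy map: one 27-voxel cluster and one isolated voxel.
data = np.zeros((10, 10, 10))
data[1:4, 1:4, 1:4] = 3.0
data[7, 7, 7] = 5.0

out = remove_small_clusters(data, thresh=2.3, min_size=20)
print(np.count_nonzero(out))
```

The isolated voxel is removed even though its value is the highest in the map, which illustrates why this is a visualization aid rather than a statistical correction.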

class PrevalenceInference
(obs, perms, P2=100000, gamma0=0.5, alpha=0.05)
Bases: object
Class that performs prevalence inference based on the paper by Allefeld, Görgen, & Haynes (2016), NeuroImage.
Parameters:  obs (numpy ndarray) – A 2D array of shape [N (subjects) x K (voxels)], or a 1D array of shape [N, 1].
 perms (numpy ndarray) – A 3D array of shape [N (subjects) x K (voxels) x P1 (first level permutations)], or a 2D array of shape [N x P1]
 P2 (int) – Number of second level permutations to run
 gamma0 (float) – What prevalence inference null (gamma < gamma0) to test
 alpha (float) – Significance level for hypothesis testing
Examples
>>> from skbold.postproc import PrevalenceInference
>>> import numpy as np
>>> N, K, P1 = 20, (40, 40, 38), 15
>>> obs = np.random.normal(loc=0.55, scale=0.05, size=(N, np.prod(K)))
>>> perms = np.random.normal(loc=0.5, scale=0.05, size=(N, np.prod(K), P1))
>>> pvi = PrevalenceInference(obs=obs, perms=perms, P2=100000, gamma0=0.5, alpha=0.05)
>>> pvi.run()
Running with parameters:
N = 20
K = 60800
P1 = 15
P2 = 100000