skbold.preproc package

class LabelFactorizer(grouping)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Transforms labels according to a given factorial grouping.
Factorizes/encodes labels based on part of the string label. For example, the label vector ['A_1', 'A_2', 'B_1', 'B_2'] can be grouped by letter (A/B) or by number (1/2).
Parameters: grouping (list of str) – List of identifiers for the condition names, as strings.
Variables: new_labels (list) – List with the new labels.
transform(y, X=None)
Transforms a label vector given a grouping.
Parameters:  y (list/ndarray of str) – List or ndarray of strings indicating the label names
 X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
Returns:  y_new (ndarray) – Array with the transformed labels
 X_new (ndarray) – Array with the transformed data of shape = [n_samples, n_features], given the new factorial grouping/design.
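
The grouping idea described above can be illustrated with a small standalone sketch. This is not skbold's implementation: the helper name `factorize_labels`, the substring-matching rule, and the returned boolean mask are all assumptions made for illustration.

```python
import numpy as np


def factorize_labels(y, grouping):
    """Hypothetical helper: relabel each entry of y with the first
    identifier from `grouping` that occurs in it as a substring."""
    y = np.asarray(y, dtype=str)
    new_labels = np.full(y.shape, "", dtype=object)
    matched = np.zeros(y.shape, dtype=bool)
    for identifier in grouping:
        # np.char.find returns -1 where the substring is absent
        hit = np.char.find(y, identifier) >= 0
        hit &= ~matched  # keep only the first matching identifier
        new_labels[hit] = identifier
        matched |= hit
    # Labels that match no identifier are dropped (an assumption)
    return new_labels[matched], matched


y = ["A_1", "A_2", "B_1", "B_2"]
by_letter, mask = factorize_labels(y, ["A", "B"])
by_number, _ = factorize_labels(y, ["1", "2"])
print(by_letter.tolist())  # ['A', 'A', 'B', 'B']
print(by_number.tolist())  # ['1', '2', '1', '2']
```

In this sketch, the mask could also be used to select the matching rows of X, so that labels and data stay aligned.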


class MajorityUndersampler(verbose=False)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Undersamples the majority class(es) by selecting random samples.
Parameters: verbose (bool) – Whether to print the number of samples after downsampling.
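
A minimal sketch of random majority undersampling, written with plain NumPy rather than skbold's class. The function name `undersample_majority`, the `seed` argument, and the choice to shrink every class to the size of the smallest one are assumptions; the documentation above does not state the target size.

```python
import numpy as np


def undersample_majority(X, y, seed=0, verbose=False):
    """Hypothetical helper: randomly keep as many samples per class
    as the smallest class has."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        # Draw n_min samples of this class without replacement
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.sort(np.concatenate(keep))
    if verbose:
        print(f"Kept {keep.size} of {y.size} samples")
    return X[keep], y[keep]


X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 7 + [1] * 3)
X_bal, y_bal = undersample_majority(X, y)
```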

class LabelBinarizer(params)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

class ConfoundRegressor(confound, X, cross_validate=True, precise=False, stack_intercept=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Fits a confound onto each feature in X and returns their residuals.

__init__(confound, X, cross_validate=True, precise=False, stack_intercept=True)
Regresses out a variable (confound) from each feature in X.
Parameters:  confound (numpy array) – Array of shape (n_samples, n_confounds) to regress out of each feature; may have multiple columns for multiple confounds.
 X (numpy array) – Array of shape (n_samples, n_features), from which the confound will be regressed. This is used to determine how the confound models should be cross-validated (which is necessary for use in scikit-learn Pipelines).
 cross_validate (bool) – Whether to cross-validate the confound parameters (y~confound) estimated from the train set to the test set (cross_validate=True), or to fit the confound regressor separately on the test set (cross_validate=False). Setting this parameter to True is equivalent to “foldwise confound regression” (FwCR) as described in our paper (https://www.biorxiv.org/content/early/2018/03/28/290684). Setting this parameter to False, however, is NOT equivalent to “whole dataset confound regression” (WDCR): it does not apply confound regression to the full dataset, but simply refits the confound model on the test set. We recommend setting this parameter to True.
 precise (bool) – Transformer objects in scikit-learn only allow passing the data (X) and, optionally, the target (y) to the fit and transform methods. However, we need to index the confound accordingly as well. To do so, we compare the X given during initialization (self.X) with the X passed to fit/transform. This lets us infer which samples are passed to the methods and index the confound accordingly. When precise is set to True, the arrays are compared feature-wise, which is accurate but relatively slow. When precise is set to False, the index is inferred from the sum of all the features, which is less accurate but much faster. For dense data, this should work just fine. To aid accuracy, we also remove the features that are constant (0) across samples.
 stack_intercept (bool) – Whether to stack an intercept onto the confound (default is True)
Variables: weights (numpy array) – Array with weights for the confound(s).
