skbold.preproc package

class LabelFactorizer(grouping)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Transforms labels according to a given factorial grouping.
Factorizes (encodes) labels based on part of the string label. For example, the label vector ['A_1', 'A_2', 'B_1', 'B_2'] can be grouped by letter (A/B) or by number (1/2).
Parameters: grouping (list of str) – List of identifiers (as strings) for the condition names to group by.
Variables: new_labels (list) – List with the new labels.
transform(y, X=None)
Transforms a label vector given a grouping.
Parameters:  y (list/ndarray of str) – List or ndarray of strings indicating the label names.
 X (ndarray) – Numeric (float) array of shape [n_samples, n_features].
Returns:  y_new (ndarray) – Array with the transformed labels.
 X_new (ndarray) – Array with the transformed data of shape [n_samples, n_features], given the new factorial grouping/design.
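A minimal sketch of the grouping idea, using plain NumPy rather than skbold itself. It assumes (this is not confirmed by the documentation above) that each label is mapped to the index of the first identifier in grouping that occurs in it as a substring, and that samples matching no identifier are dropped from both y and X. The function factorize_labels is a hypothetical helper, not part of skbold.

```python
import numpy as np


def factorize_labels(y, grouping, X=None):
    """Hypothetical sketch: encode each string label as the index of the
    first grouping identifier it contains; drop labels matching none."""
    y = np.asarray(y)
    codes = np.full(len(y), -1)  # -1 marks "not matched (yet)"
    for i, ident in enumerate(grouping):
        # Substring match, restricted to samples not claimed by an earlier identifier
        hit = np.array([ident in label for label in y]) & (codes == -1)
        codes[hit] = i
    keep = codes >= 0
    y_new = codes[keep]
    if X is None:
        return y_new
    # Keep only the rows of X whose labels belong to one of the groups
    return y_new, np.asarray(X)[keep]


labels = ['A_1', 'A_2', 'B_1', 'B_2']
print(factorize_labels(labels, ['A', 'B']))  # [0 0 1 1]  (grouped by letter)
print(factorize_labels(labels, ['1', '2']))  # [0 1 0 1]  (grouped by number)
```

Plain substring matching is fragile (an identifier '1' would also match a label such as 'A_12'), so more specific identifiers such as '_1' may be safer in practice.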


class MajorityUndersampler(verbose=False)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Undersamples the majority class(es) by selecting random samples.
Parameters: verbose (bool) – Whether to print the number of samples after downsampling.
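The following is a minimal NumPy sketch of random majority-class undersampling. It assumes that every class is reduced to the size of the smallest class by drawing samples without replacement; the exact target size used by MajorityUndersampler is not stated above. The function undersample_majority and its seed parameter are hypothetical.

```python
import numpy as np


def undersample_majority(X, y, seed=None, verbose=False):
    """Hypothetical sketch: randomly drop samples from every class larger
    than the smallest class, so that all classes end up equally sized."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        # Draw without replacement; the minority class keeps all of its samples
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.sort(np.concatenate(keep))  # restore the original sample order
    if verbose:
        print(f"Downsampled from {len(y)} to {len(keep)} samples")
    return X[keep], y[keep]


y = np.array([0, 0, 0, 0, 1, 1])
X = np.arange(12).reshape(6, 2)
X_res, y_res = undersample_majority(X, y, seed=0, verbose=True)
```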

class LabelBinarizer(params)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

class ConfoundRegressor(confound, X, cross_validate=True, stack_intercept=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Regresses each feature in X on the confound and returns the residuals.
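As an illustration of the underlying operation, here is a minimal NumPy sketch of regressing a confound out of every feature with ordinary least squares, optionally stacking an intercept column onto the confound. The function regress_out is hypothetical and is not ConfoundRegressor's actual implementation.

```python
import numpy as np


def regress_out(confound, X, stack_intercept=True):
    """Hypothetical sketch: OLS-fit every column of X on the confound and
    return the residuals together with the fitted weights."""
    C = np.asarray(confound, dtype=float)
    if C.ndim == 1:
        C = C[:, np.newaxis]  # treat a 1D confound as a single column
    if stack_intercept:
        C = np.column_stack([np.ones(C.shape[0]), C])
    # lstsq solves for all features at once: weights has shape (n_regressors, n_features)
    weights, *_ = np.linalg.lstsq(C, X, rcond=None)
    residuals = X - C @ weights
    return residuals, weights


rng = np.random.default_rng(0)
c = rng.normal(size=50)
X = rng.normal(size=(50, 3)) + c[:, np.newaxis]  # every feature partly driven by c
resid, w = regress_out(c, X)
```

With the intercept stacked, the residuals of each feature have zero mean and are orthogonal to the confound.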

__init__(confound, X, cross_validate=True, stack_intercept=True)
Regresses out a variable (confound) from each feature in X.
Parameters:  confound (numpy array) – Array of shape (n_samples, n_confounds) to regress out of each feature; it may have multiple columns for multiple confounds.
 X (numpy array) – Array of shape (n_samples, n_features) from which the confound will be regressed. It is used to determine how the confound models should be cross-validated (which is necessary for use in scikit-learn Pipelines).
 cross_validate (bool) – Whether to apply the confound parameters (y ~ confound) estimated on the train set to the test set (cross_validate=True) or to fit the confound regressor separately on the test set (cross_validate=False); we recommend setting this to True to get an unbiased estimate.
 stack_intercept (bool) – Whether to stack an intercept onto the confound (default: True).
Variables: weights (numpy array) – Array with weights for the confound(s).
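To make the two cross_validate modes concrete, here is a minimal NumPy sketch. It assumes the train and test confounds are passed in explicitly, whereas ConfoundRegressor infers the train/test split from the X given at construction. The functions _design and residualize_train_test are hypothetical.

```python
import numpy as np


def _design(confound, stack_intercept):
    """Build the confound design matrix, optionally with an intercept column."""
    C = np.asarray(confound, dtype=float)
    if C.ndim == 1:
        C = C[:, np.newaxis]
    if stack_intercept:
        C = np.column_stack([np.ones(len(C)), C])
    return C


def residualize_train_test(c_train, X_train, c_test, X_test,
                           cross_validate=True, stack_intercept=True):
    """Hypothetical sketch of the two cross_validate modes."""
    C_tr = _design(c_train, stack_intercept)
    C_te = _design(c_test, stack_intercept)
    w_train, *_ = np.linalg.lstsq(C_tr, X_train, rcond=None)
    if cross_validate:
        w_test = w_train  # reuse the weights estimated on the train set
    else:
        w_test, *_ = np.linalg.lstsq(C_te, X_test, rcond=None)  # refit on test set
    return X_train - C_tr @ w_train, X_test - C_te @ w_test


c_tr = np.array([0.0, 1.0, 2.0, 3.0])
X_tr = (2 * c_tr + 1)[:, np.newaxis]  # train: feature = 2 * confound + 1
c_te = np.array([1.0, 2.0, 3.0])
X_te = (5 * c_te + 1)[:, np.newaxis]  # test: a different confound effect
_, te_cv = residualize_train_test(c_tr, X_tr, c_te, X_te, cross_validate=True)
_, te_refit = residualize_train_test(c_tr, X_tr, c_te, X_te, cross_validate=False)
```

With cross_validate=True the test residuals keep the part of the confound effect that the train-set model did not capture (here 3 * confound). Refitting on the test set removes it entirely, but doing so uses information from the test data itself.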
