skbold.preproc package

class LabelFactorizer(grouping)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Transforms labels according to a given factorial grouping.

Factorizes/encodes labels based on part of the string label. For example, the label-vector ['A_1', 'A_2', 'B_1', 'B_2'] can be grouped based on letter (A/B) or number (1/2).

Parameters: grouping (List of str) – List with identifiers for the condition names, as strings.
Variables: new_labels (list) – List with the new labels.
fit(y=None, X=None)

Does nothing, but included to be used in sklearn’s Pipeline.

get_new_labels()

Returns new labels based on factorization.

transform(y, X=None)

Transforms label-vector given a grouping.

Parameters:
  • y (List/ndarray of str) – List or ndarray with strings indicating label-names
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
Returns:

  • y_new (ndarray) – Array with the transformed y-labels
  • X_new (ndarray) – Array with the transformed data of shape = [n_samples, n_features], given the new factorial grouping/design.
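A minimal usage sketch based on the description above (the exact form of the grouping identifiers is an assumption; here they are taken to match the letter part of the labels):

    import numpy as np
    from skbold.preproc import LabelFactorizer

    y = np.array(['A_1', 'A_2', 'B_1', 'B_2'])   # 2x2 design: letter x number
    X = np.random.randn(4, 10)                   # dummy data: 4 samples, 10 features

    lf = LabelFactorizer(grouping=['A', 'B'])    # group by the letter part (assumed format)
    lf.fit(y, X)                                 # no-op, kept for Pipeline compatibility
    y_new, X_new = lf.transform(y, X)            # e.g. y_new -> ['A', 'A', 'B', 'B']
    print(lf.get_new_labels())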

class MajorityUndersampler(verbose=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Undersamples the majority class(es) by randomly selecting a subset of their samples to keep.

Parameters: verbose (bool) – Whether to print the number of samples after downsampling.
__init__(verbose=False)

Initializes MajorityUndersampler object.

fit(X=None, y=None)

Does nothing, but included for scikit-learn pipelines.

transform(X, y)

Downsamples majority-class(es).

Parameters:
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
  • y (ndarray) – Array of shape = [n_samples] with the class labels
Returns: X – Downsampled array of shape = [n_samples_new, n_features].
Return type: ndarray
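A minimal usage sketch (only the downsampled X is documented as the return value above, so the result is captured in a single variable):

    import numpy as np
    from skbold.preproc import MajorityUndersampler

    X = np.random.randn(30, 5)
    y = np.repeat(['a', 'b'], [20, 10])       # imbalanced: 20 'a' vs. 10 'b'

    mus = MajorityUndersampler(verbose=True)
    mus.fit(X, y)                             # no-op, kept for pipeline compatibility
    out = mus.transform(X, y)                 # downsampled data (10 'a' + 10 'b' samples remain)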
class LabelBinarizer(params)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(params)

Initializes LabelBinarizer object.

fit(X=None, y=None)

Does nothing, but included for scikit-learn pipelines.

transform(X, y)

Binarizes the y-attribute (labels).

Parameters:
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
  • y (ndarray) – Array of shape = [n_samples] with the class labels to binarize
Returns: X – Transformed array of shape = [n_samples, n_features].
Return type: ndarray
class ConfoundRegressor(confound, X, cross_validate=True, stack_intercept=True)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Fits a confound onto each feature in X and returns their residuals.

__init__(confound, X, cross_validate=True, stack_intercept=True)

Regresses out a variable (confound) from each feature in X.

Parameters:
  • confound (numpy array) – Array of shape (n_samples, n_confounds) to regress out of each feature; may have multiple columns for multiple confounds.
  • X (numpy array) – Array of shape (n_samples, n_features) from which the confound will be regressed. This is used to determine how the confound-models should be cross-validated (which is necessary to use it in scikit-learn Pipelines).
  • cross_validate (bool) – Whether to apply the confound-parameters (y ~ confound) estimated on the train-set to the test-set (cross_validate=True), or to fit the confound regressor separately on the test-set (cross_validate=False); we recommend setting this to True to get an unbiased estimate.
  • stack_intercept (bool) – Whether to stack an intercept onto the confound (default is True)
Variables: weights (numpy array) – Array with the weights for the confound(s).
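As an illustration of what these weights represent (a plain NumPy sketch, not skbold code): with stack_intercept=True, an intercept column is stacked onto the confound and a least-squares fit from confound to feature is computed for every feature; the residuals of that fit are what transform() returns.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))            # (n_samples, n_features)
    confound = rng.normal(size=(100, 1))     # (n_samples, n_confounds)

    C = np.column_stack([np.ones(len(confound)), confound])   # stack intercept
    weights, *_ = np.linalg.lstsq(C, X, rcond=None)            # shape (2, n_features)
    X_residuals = X - C @ weights                               # confound regressed out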

fit(X, y=None)

Fits the confound-regressor to X.

Parameters:
  • X (numpy array) – An array of shape (n_samples, n_features), which should correspond to your train-set only!
  • y (None) – Included for compatibility; does nothing.
transform(X)

Regresses out confound from X.

Parameters: X (numpy array) – An array of shape (n_samples, n_features) from which the confound is regressed (with cross_validate=True this may be the train-set or, using the parameters estimated during fit(), the test-set).
Returns: X_new – ndarray with the confound-regressed features
Return type: ndarray
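A minimal usage sketch, assuming the class is imported from skbold.preproc and that the X passed at construction is used to look up the confound values belonging to the samples later passed to fit() and transform(), as the description of the X parameter suggests:

    import numpy as np
    from skbold.preproc import ConfoundRegressor

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 10))           # all samples (train + test)
    confound = rng.normal(size=(100, 1))     # e.g. age, one value per sample

    train, test = np.arange(80), np.arange(80, 100)

    cr = ConfoundRegressor(confound=confound, X=X, cross_validate=True)
    cr.fit(X[train])                          # estimate confound-parameters on the train-set only
    X_train_clean = cr.transform(X[train])    # residualized train-set
    X_test_clean = cr.transform(X[test])      # train-set parameters applied to the test-set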