skbold.preproc package

class LabelFactorizer(grouping)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Transforms labels according to a given factorial grouping.

Factorizes/encodes labels based on part of the string label. For example, the label-vector ['A_1', 'A_2', 'B_1', 'B_2'] can be grouped based on letter (A/B) or number (1/2).

Parameters: grouping (List of str) – List with identifiers for the condition names, as strings.
Variables: new_labels (list) – List with the new labels.
fit(y=None, X=None)

Does nothing, but included to be used in sklearn’s Pipeline.

get_new_labels()

Returns new labels based on factorization.

transform(y, X=None)

Transforms label-vector given a grouping.

Parameters:
  • y (List/ndarray of str) – List or ndarray with strings indicating label-names
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
Returns:

  • y_new (ndarray) – Array with the transformed y-labels
  • X_new (ndarray) – Array with the transformed data of shape = [n_samples, n_features], given the new factorial grouping/design.
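A minimal usage sketch based on the description above (the exact form of the grouping identifiers is an assumption; here they are taken to match the letter part of the labels):

    import numpy as np
    from skbold.preproc import LabelFactorizer

    y = np.array(['A_1', 'A_2', 'B_1', 'B_2'])   # 2x2 design: letter x number
    X = np.random.randn(4, 10)                   # dummy data: 4 samples, 10 features

    lf = LabelFactorizer(grouping=['A', 'B'])    # group by the letter part (assumed format)
    lf.fit(y, X)                                 # no-op, kept for Pipeline compatibility
    y_new, X_new = lf.transform(y, X)            # e.g. y_new -> ['A', 'A', 'B', 'B']
    print(lf.get_new_labels())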

class MajorityUndersampler(verbose=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Undersamples the majority class(es) by randomly selecting a subset of their samples to keep.

Parameters: verbose (bool) – Whether to print the number of samples after downsampling.
__init__(verbose=False)

Initializes MajorityUndersampler object.

fit(X=None, y=None)

Does nothing, but included for scikit-learn pipelines.

transform(X, y)

Downsamples majority-class(es).

Parameters:
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
  • y (ndarray) – Array of shape = [n_samples] with the class labels
Returns: X – Downsampled array of shape = [n_samples_new, n_features].
Return type: ndarray
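A minimal usage sketch (only the downsampled X is documented as the return value above, so the result is captured in a single variable):

    import numpy as np
    from skbold.preproc import MajorityUndersampler

    X = np.random.randn(30, 5)
    y = np.repeat(['a', 'b'], [20, 10])       # imbalanced: 20 'a' vs. 10 'b'

    mus = MajorityUndersampler(verbose=True)
    mus.fit(X, y)                             # no-op, kept for pipeline compatibility
    out = mus.transform(X, y)                 # downsampled data (10 'a' + 10 'b' samples remain)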
class LabelBinarizer(params)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

__init__(params)

Initializes LabelBinarizer object.

fit(X=None, y=None)

Does nothing, but included for scikit-learn pipelines.

transform(X, y)

Binarizes the y-attribute (labels).

Parameters:
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
  • y (ndarray) – Array of shape = [n_samples] with the class labels to binarize
Returns: X – Transformed array of shape = [n_samples, n_features].
Return type: ndarray
class ConfoundRegressor(confound, X, cross_validate=True, stack_intercept=True)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Fits a confound onto each feature in X and returns their residuals.

__init__(confound, X, cross_validate=True, stack_intercept=True)

Regresses out a variable (confound) from each feature in X.

Parameters:
  • confound (numpy array) – Array of shape (n_samples, n_confounds) to regress out of each feature; may have multiple columns for multiple confounds.
  • X (numpy array) – Array of shape (n_samples, n_features) from which the confound will be regressed. This is used to determine how the confound-models should be cross-validated (which is necessary to use it in scikit-learn Pipelines).
  • cross_validate (bool) – Whether to apply the confound-parameters (y ~ confound) estimated on the train-set to the test-set (cross_validate=True), or to fit the confound regressor separately on the test-set (cross_validate=False); we recommend setting this to True to get an unbiased estimate.
  • stack_intercept (bool) – Whether to stack an intercept onto the confound (default is True)
Variables: weights (numpy array) – Array with the weights for the confound(s).
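As an illustration of what these weights represent (a plain NumPy sketch, not skbold code): with stack_intercept=True, an intercept column is stacked onto the confound and a least-squares fit from confound to feature is computed for every feature; the residuals of that fit are what transform() returns.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))            # (n_samples, n_features)
    confound = rng.normal(size=(100, 1))     # (n_samples, n_confounds)

    C = np.column_stack([np.ones(len(confound)), confound])   # stack intercept
    weights, *_ = np.linalg.lstsq(C, X, rcond=None)            # shape (2, n_features)
    X_residuals = X - C @ weights                               # confound regressed out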

fit(X, y=None)

Fits the confound-regressor to X.

Parameters:
  • X (numpy array) – An array of shape (n_samples, n_features), which should correspond to your train-set only!
  • y (None) – Included for compatibility; does nothing.
transform(X)

Regresses out confound from X.

Parameters: X (numpy array) – An array of shape (n_samples, n_features) from which the confound is regressed (with cross_validate=True this may be the train-set or, using the parameters estimated during fit(), the test-set).
Returns: X_new – ndarray with the confound-regressed features
Return type: ndarray
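A minimal usage sketch, assuming the class is imported from skbold.preproc and that the X passed at construction is used to look up the confound values belonging to the samples later passed to fit() and transform(), as the description of the X parameter suggests:

    import numpy as np
    from skbold.preproc import ConfoundRegressor

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 10))           # all samples (train + test)
    confound = rng.normal(size=(100, 1))     # e.g. age, one value per sample

    train, test = np.arange(80), np.arange(80, 100)

    cr = ConfoundRegressor(confound=confound, X=X, cross_validate=True)
    cr.fit(X[train])                          # estimate confound-parameters on the train-set only
    X_train_clean = cr.transform(X[train])    # residualized train-set
    X_test_clean = cr.transform(X[test])      # train-set parameters applied to the test-set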