skbold.preproc package
class LabelFactorizer(grouping)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Transforms labels according to a given factorial grouping.
Factorizes/encodes labels based on part of the string label. For example, the label-vector [‘A_1’, ‘A_2’, ‘B_1’, ‘B_2’] can be grouped based on letter (A/B) or number (1/2).
Parameters: grouping (List of str) – List of identifiers (parts of the condition names, as strings) to group the labels by.
Variables: new_labels (list) – List with the new labels.
transform(y, X=None)
Transforms a label-vector given a grouping.
Parameters: - y (List/ndarray of str) – List or ndarray of strings indicating label names
- X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
Returns: - y_new (ndarray) – array with transformed y-labels
- X_new (ndarray) – array with transformed data of shape = [n_samples, n_features] given new factorial grouping/design.
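The grouping described above can be illustrated with a small standalone function. This is a minimal sketch, not skbold's implementation: `factorize_labels` is a hypothetical name, and it assumes that a label belongs to the first `grouping` identifier that occurs as a substring of it, that groups are encoded as integer indices into `grouping`, and that samples matching no identifier are dropped from both `y` and `X`.

```python
import numpy as np


def factorize_labels(y, grouping, X=None):
    """Hypothetical sketch of factorial label grouping (not skbold's code)."""
    y = np.asarray(y)
    # -1 marks labels that match none of the grouping identifiers
    codes = np.full(len(y), -1)
    for i, label in enumerate(y):
        for j, identifier in enumerate(grouping):
            if identifier in label:
                codes[i] = j  # encode the label by the index of its group
                break
    keep = codes >= 0
    y_new = codes[keep]
    # Assumption: samples without a matching group are removed from X as well
    X_new = None if X is None else np.asarray(X)[keep]
    return y_new, X_new


labels = ['A_1', 'A_2', 'B_1', 'B_2']
y_letter, _ = factorize_labels(labels, ['A', 'B'])  # grouped by letter: [0, 0, 1, 1]
y_number, _ = factorize_labels(labels, ['1', '2'])  # grouped by number: [0, 1, 0, 1]
```

Plain substring matching is a simplification: an identifier such as '1' would also match 'A_10', so real label schemes may need a stricter rule (for example, splitting each label on '_').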
class MajorityUndersampler(verbose=False)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Undersamples the majority-class(es) by selecting random samples.
Parameters: verbose (bool) – Whether to print the number of samples after downsampling.
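Random undersampling of the majority class(es) can be sketched as follows, assuming every class is reduced to the size of the smallest class. `undersample_majority` and its `random_state` argument are hypothetical and not part of skbold's API; the indices are sorted so the surviving samples keep their original order.

```python
import numpy as np


def undersample_majority(X, y, random_state=None, verbose=False):
    """Hypothetical sketch: randomly undersample every class to the minority-class size."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(y == cls)
        # Draw n_min sample indices of this class without replacement
        keep.append(rng.choice(idx, size=n_min, replace=False))
    keep = np.sort(np.concatenate(keep))
    if verbose:
        print(f"Downsampled from {len(y)} to {len(keep)} samples")
    return X[keep], y[keep]
```

Reducing every class to the minority count yields a perfectly balanced set at the cost of discarding data from the larger classes.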
class LabelBinarizer(params)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
class ConfoundRegressor(confound, X, cross_validate=True, stack_intercept=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Fits a confound onto each feature in X and returns their residuals.
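Regressing a confound out of every feature amounts to one ordinary least-squares fit per feature, which NumPy can solve for all columns at once with `np.linalg.lstsq`. The sketch below is an assumption about the underlying computation, not skbold's code; `regress_out` is a hypothetical helper, and its optional intercept column mirrors the `stack_intercept` option.

```python
import numpy as np


def regress_out(confound, X, stack_intercept=True):
    """Hypothetical sketch: residualize each column of X with respect to the confound via OLS."""
    confound = np.asarray(confound, dtype=float)
    X = np.asarray(X, dtype=float)
    if confound.ndim == 1:
        confound = confound[:, np.newaxis]  # single confound -> (n_samples, 1)
    if stack_intercept:
        # Prepend a column of ones so each feature also gets an intercept
        confound = np.column_stack([np.ones(len(confound)), confound])
    # One least-squares solve for all features: weights is (n_regressors, n_features)
    weights, *_ = np.linalg.lstsq(confound, X, rcond=None)
    residuals = X - confound @ weights
    return residuals, weights
```

The residuals are orthogonal to every confound column (and, with the intercept, have zero mean), so no linear trace of the confound remains in the features.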
__init__(confound, X, cross_validate=True, stack_intercept=True)
Regresses out a variable (confound) from each feature in X.
Parameters: - confound (numpy array) – Array of shape (n_samples, n_confounds) to regress out of each feature; may have multiple columns for multiple confounds.
- X (numpy array) – Array of shape (n_samples, n_features) from which the confound will be regressed. It is also used to determine how the confound models should be cross-validated (which is necessary to use the class in scikit-learn Pipelines).
- cross_validate (bool) – Whether to apply the confound parameters (y ~ confound) estimated on the train set to the test set (cross_validate=True), or to fit the confound regressor separately on the test set (cross_validate=False). We recommend setting this to True to get an unbiased estimate.
- stack_intercept (bool) – Whether to stack an intercept column onto the confound (default is True).
Variables: weights (numpy array) – Array with weights for the confound(s).
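The `cross_validate` behaviour and the role of `X` at construction time can be sketched as a small class with `fit`/`transform` methods. Everything here is an assumption for illustration: `ConfoundRegressorSketch` is a hypothetical name, and it assumes the confound rows belonging to the data passed to `fit` or `transform` are found by matching those rows against the full `X` given to the constructor (so rows of `X` must be unique). A version meant for scikit-learn Pipelines would additionally subclass `BaseEstimator` and `TransformerMixin`.

```python
import numpy as np


class ConfoundRegressorSketch:
    """Hypothetical sketch of train-to-test confound regression (not skbold's code)."""

    def __init__(self, confound, X, cross_validate=True, stack_intercept=True):
        confound = np.asarray(confound, dtype=float)
        if confound.ndim == 1:
            confound = confound[:, np.newaxis]
        if stack_intercept:
            confound = np.column_stack([np.ones(len(confound)), confound])
        self.confound = confound              # confound for ALL samples
        self.X_full = np.asarray(X, dtype=float)  # full data, used to locate rows
        self.cross_validate = cross_validate
        self.weights = None

    def _row_indices(self, X):
        # Find where each row of X sits in the full X (assumes unique rows)
        idx = []
        for row in np.asarray(X, dtype=float):
            matches = np.flatnonzero(np.all(self.X_full == row, axis=1))
            if matches.size == 0:
                raise ValueError("row not found in the X given at construction")
            idx.append(matches[0])
        return np.array(idx)

    @staticmethod
    def _fit_weights(C, X):
        weights, *_ = np.linalg.lstsq(C, X, rcond=None)
        return weights

    def fit(self, X, y=None):
        # Estimate confound weights on the training rows only
        C = self.confound[self._row_indices(X)]
        self.weights = self._fit_weights(C, np.asarray(X, dtype=float))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        C = self.confound[self._row_indices(X)]
        # cross_validate=True reuses the training weights; False refits on this data
        weights = self.weights if self.cross_validate else self._fit_weights(C, X)
        return X - C @ weights
```

With cross_validate=True, the test residuals use weights estimated only on the training rows, so no information from the test set leaks into the confound model.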