skbold.feature_selection package

The transformer subpackage provides several scikit-learn-style transformers that perform feature selection and/or extraction on multivoxel fMRI patterns. Most are designed specifically for fMRI data and therefore often need an Mvp object at initialization to extract the necessary metadata. All comply with the scikit-learn API, exposing fit() and transform() methods.

class GenericUnivariateSelect(score_func=<function f_classif>, mode='percentile', param=1e-05)[source]

Bases: sklearn.feature_selection.univariate_selection._BaseFilter

Univariate feature selector with configurable strategy.

Updated version from scikit-learn: http://scikit-learn.org/.

Parameters:
  • score_func (callable) – Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues). For modes ‘percentile’ or ‘k_best’ it can return a single array of scores.
  • mode ({'percentile', 'k_best', 'fpr', 'fdr', 'fwe', 'cutoff'}) – Feature selection mode.
  • param (float or int depending on the feature selection mode) – Parameter of the corresponding mode.
Variables:
  • scores (array-like, shape=(n_features,)) – Scores of features.
  • pvalues (array-like, shape=(n_features,)) – p-values of feature scores, None if score_func returned scores only.
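
As a rough illustration of how mode and param interact, the sketch below uses the upstream scikit-learn class of the same name rather than skbold's updated copy, on the assumption that the two behave alike for the 'k_best' mode. The synthetic data and variable names are made up for the example; note that scikit-learn exposes the fitted attributes as scores_ and pvalues_.

```python
import numpy as np
from sklearn.feature_selection import GenericUnivariateSelect, f_classif

# Synthetic stand-in for an fMRI pattern matrix: 40 samples x 100 "voxels".
rng = np.random.RandomState(0)
X = rng.randn(40, 100)
y = np.repeat([0, 1], 20)      # two balanced classes
X[y == 1, :5] += 2.0           # make the first five features informative

# mode='k_best' interprets param as the number of features to keep.
selector = GenericUnivariateSelect(score_func=f_classif, mode='k_best', param=5)
X_sel = selector.fit_transform(X, y)

print(X_sel.shape)                 # samples x selected features
print(selector.scores_.shape)      # one score per original feature
print(selector.pvalues_.shape)     # one p-value per original feature
```

With mode='percentile', the same param would instead be read as the percentage of features to keep.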
class SelectAboveCutoff(cutoff, score_func=<function f_classif>)[source]

Bases: sklearn.feature_selection.univariate_selection._BaseFilter

Filter: Select features with a score above some cutoff.

Parameters:
  • cutoff (int/float) – Cutoff for feature-scores to be selected.
  • score_func (callable) – Function that takes a 2D array X (samples x features) and returns a score reflecting a univariate difference (higher is better).
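
The idea behind a cutoff filter can be sketched in plain NumPy. This is not skbold's implementation: select_above_cutoff and mean_abs_difference are hypothetical names, and the sketch assumes the score function receives both X and y and that "above" means strictly greater than the cutoff.

```python
import numpy as np

def mean_abs_difference(X, y):
    """Hypothetical score function: absolute difference between the two
    class means, computed separately for each feature (higher is better)."""
    classes = np.unique(y)
    mean_a = X[y == classes[0]].mean(axis=0)
    mean_b = X[y == classes[1]].mean(axis=0)
    return np.abs(mean_a - mean_b)

def select_above_cutoff(X, y, cutoff, score_func):
    """Keep only the columns of X whose score exceeds the cutoff."""
    scores = score_func(X, y)
    mask = scores > cutoff
    return X[:, mask], mask

X = np.array([[0.0, 1.0, 5.0],
              [0.2, 1.1, 5.5],
              [3.0, 1.0, 5.2],
              [3.2, 0.9, 5.1]])
y = np.array([0, 0, 1, 1])

# Only the first feature differs strongly between the classes.
X_sel, mask = select_above_cutoff(X, y, cutoff=1.0,
                                  score_func=mean_abs_difference)
print(mask)
print(X_sel.shape)
```

Unlike a k-best or percentile filter, the number of retained features is not fixed in advance; it depends entirely on how many scores clear the threshold.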
fisher_criterion_score(X, y, norm='l1', balance=False)[source]

Calculates fisher score.

See [1] for more info.

References

[1] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2001.

Parameters:
  • X ({array-like, sparse matrix} shape = (n_samples, n_features)) – The feature matrix; each column (feature) is scored separately.
  • y (array of shape(n_samples)) – The target vector (class label of each sample).
  • norm (str) – Whether to use the l1-norm or l2-norm.
Returns:

scores_ – Fisher criterion scores for each feature.

Return type:

array, shape=(n_features,)
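
For orientation, the textbook two-class Fisher criterion divides the squared difference between class means by the sum of the class variances, per feature. The sketch below implements only that textbook form; fisher_score_two_class is a hypothetical name, and how skbold's norm and balance options alter the computation is not covered here.

```python
import numpy as np

def fisher_score_two_class(X, y):
    """Textbook two-class Fisher criterion per feature:
    (mean_a - mean_b)^2 / (var_a + var_b).
    A sketch only; it ignores the norm and balance options."""
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = X[y == classes[0]]
    b = X[y == classes[1]]
    numerator = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    denominator = a.var(axis=0) + b.var(axis=0)   # population variances
    return numerator / denominator

X = np.array([[1.0, 5.0],
              [3.0, 6.0],
              [7.0, 5.0],
              [9.0, 6.0]])
y = np.array([0, 0, 1, 1])

# Feature 0 separates the classes (score 18.0); feature 1 does not (score 0.0).
scores = fisher_score_two_class(X, y)
print(scores)
```

A feature with zero variance in both classes would produce a division by zero in this sketch, so real data may need a small constant added to the denominator.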

class IncrementalFeatureCombiner(scores, cutoff)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Indexes a set of features with a number of (sorted) features.

Parameters:
  • scores (ndarray) – Array of shape = n_features, or [n_features, n_class] in case of soft/hard voting in, e.g., a roi_stacking_classifier (see classifiers.roi_stacking_classifier).
  • cutoff (int or float) – If int, it refers to the absolute number of features included, sorted from high to low (w.r.t. scores). If float, it selects a proportion of features.
fit(X, y=None)[source]

Fits IncrementalFeatureCombiner transformer.

Parameters:
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
transform(X, y=None)[source]

Transforms a pattern (X) given the indices calculated during fit().

Parameters:
  • X (ndarray) – Numeric (float) array of shape = [n_samples, n_features]
Returns:

X – Transformed array of shape = [n_samples, n_features] given the indices calculated during fit().

Return type:

ndarray
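
The int-versus-float behaviour of cutoff can be illustrated with a small stand-in class. TopScoreSelector is hypothetical and is not skbold's IncrementalFeatureCombiner: it handles only one-dimensional scores, rounds a fractional cutoff up, and keeps the selected columns in their original order, all of which are assumptions made for the sketch.

```python
import numpy as np

class TopScoreSelector:
    """Hypothetical stand-in that keeps the highest-scoring features."""

    def __init__(self, scores, cutoff):
        self.scores = np.asarray(scores)
        self.cutoff = cutoff

    def fit(self, X, y=None):
        n_features = self.scores.shape[0]
        if isinstance(self.cutoff, float):
            # Float cutoff: proportion of features (rounded up, an assumption).
            n_keep = int(np.ceil(self.cutoff * n_features))
        else:
            # Int cutoff: absolute number of features.
            n_keep = int(self.cutoff)
        order = np.argsort(self.scores)[::-1]     # indices, high score to low
        self.idx_ = np.sort(order[:n_keep])       # keep original column order
        return self

    def transform(self, X, y=None):
        return X[:, self.idx_]

scores = [0.1, 0.9, 0.4, 0.7]
X = np.arange(12).reshape(3, 4)

by_count = TopScoreSelector(scores, cutoff=2).fit(X).transform(X)
by_fraction = TopScoreSelector(scores, cutoff=0.5).fit(X).transform(X)
print(by_count)
print(by_fraction)
```

Here both calls keep the columns with scores 0.9 and 0.7, since two features and half of four features amount to the same selection.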