EstimatorConfig

class fseval.config.EstimatorConfig(
    name: str=MISSING,
    estimator: Any=None,
    load_cache: Optional[CacheUsage]=CacheUsage.allow,
    save_cache: Optional[CacheUsage]=CacheUsage.allow,
    _estimator_type: str=MISSING,
    multioutput: bool=False,
    multioutput_only: bool=False,
    requires_positive_X: bool=False,
    estimates_feature_importances: bool=False,
    estimates_feature_support: bool=False,
    estimates_feature_ranking: bool=False,
    estimates_target: bool=False,
)

Configures an estimator: a Feature Ranker, Feature Selector or a validation estimator.

For Feature Rankers/Selectors, set at least one of estimates_feature_importances, estimates_feature_support or estimates_feature_ranking to True. For a validation estimator, set estimates_target to True.
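The rule above can be sketched as a small consistency check. This is a minimal illustration only: check_estimator_flags and the role argument are hypothetical names, not part of fseval, and the configs are plain dictionaries standing in for parsed YAML files.

```python
from typing import Any, Dict, List

# The three flags a Feature Ranker/Selector can set, as listed in the signature.
RANKER_FLAGS = (
    "estimates_feature_importances",
    "estimates_feature_support",
    "estimates_feature_ranking",
)


def check_estimator_flags(config: Dict[str, Any], role: str) -> List[str]:
    """Hypothetical helper (not an fseval API): list flag problems for a config."""
    problems: List[str] = []
    if role == "ranker":
        # A ranker/selector should declare at least one kind of output.
        if not any(config.get(flag, False) for flag in RANKER_FLAGS):
            problems.append("ranker sets none of: " + ", ".join(RANKER_FLAGS))
    elif role == "validator":
        # A validation estimator should declare that it predicts the target.
        if not config.get("estimates_target", False):
            problems.append("validator does not set estimates_target")
    else:
        raise ValueError(f"unknown role: {role!r}")
    return problems


relieff = {"name": "ReliefF", "estimates_feature_importances": True}
knn = {"name": "k-NN", "estimates_target": True}
incomplete = {"name": "Unflagged"}

relieff_problems = check_estimator_flags(relieff, "ranker")
knn_problems = check_estimator_flags(knn, "validator")
incomplete_problems = check_estimator_flags(incomplete, "ranker")
```

Flags that are omitted fall back to False here, mirroring the defaults in the class signature.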

Attributes:

name : str
    Human-friendly name of this estimator.
estimator : Any
    The estimator. Must be a dictionary with a key _target_, pointing to the class that is to be instantiated. All other properties in the dictionary are passed to the estimator constructor. See the Examples below.
load_cache : Optional[CacheUsage]
    How to handle loading a cached version of the estimator from a pickle file, e.g. to ignore the cache or to force using it. To be used in combination with PipelineConfig.storage. See CacheUsage.
save_cache : Optional[CacheUsage]
    How to handle saving the fitted estimator as a pickle file, so as to facilitate caching. To be used in combination with PipelineConfig.storage. See CacheUsage.
_estimator_type : str
    Either 'classifier', 'regressor' or 'clusterer'. See the sklearn documentation.
multioutput : bool
    Whether this estimator supports multioutput datasets.
multioutput_only : bool
    Whether this estimator supports only multioutput datasets.
requires_positive_X : bool
    Whether the estimator fails if X contains negative values.
estimates_feature_importances : bool
    Whether the estimator estimates feature importances. For example, in the case of 2 features, the estimator can set self.feature_importances_ = [0.9, 0.1], implying that the estimator found the first feature the most useful. Alternatively, the coef_ attribute can be read and interpreted as a feature importance vector.
estimates_feature_support : bool
    Whether the estimator estimates feature support. A feature support vector indicates, yes or no, which features to include in a feature subset. In other words, it must be a boolean vector. It is to be set on the estimator's support_ attribute. Estimating the feature support vector is synonymous with performing feature selection, e.g. self.support_ = [True, False] means including only the first feature in the feature subset.
estimates_feature_ranking : bool
    Whether the estimator ranks the features in a specific order. This is similar to feature importance, but the values are not exact importance quantities, i.e. they need not be proportional to each other. An estimator can set the ranking using the ranking_ attribute, e.g. self.ranking_ = [1, 0] indicates that the first feature ranks the highest.
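To make the three output attributes concrete, here is a minimal sketch of an estimator that sets all of them. ToyRanker, its threshold parameter and its covariance-based scoring rule are assumptions made purely for illustration; they are not part of fseval or of any library. Plain lists are used to keep the sketch dependency-free, whereas a real estimator would typically use NumPy arrays.

```python
from typing import List, Sequence


class ToyRanker:
    """Hypothetical estimator illustrating feature_importances_, support_ and ranking_."""

    def __init__(self, threshold: float = 0.5) -> None:
        # Assumed cut-off: features with importance >= threshold are selected.
        self.threshold = threshold

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "ToyRanker":
        n_samples = len(X)
        n_features = len(X[0])
        y_mean = sum(y) / n_samples

        # Assumed scoring rule: absolute covariance between each feature and y.
        scores: List[float] = []
        for j in range(n_features):
            column = [row[j] for row in X]
            col_mean = sum(column) / n_samples
            cov = sum((x - col_mean) * (t - y_mean) for x, t in zip(column, y)) / n_samples
            scores.append(abs(cov))

        # Importances: scores normalised to sum to 1 (guarding against all-zero scores).
        total = sum(scores) or 1.0
        self.feature_importances_ = [s / total for s in scores]

        # Support: boolean vector, True for features kept in the subset.
        self.support_ = [imp >= self.threshold for imp in self.feature_importances_]

        # Ranking: higher value means higher rank, matching the [1, 0] example above.
        order = sorted(range(n_features), key=lambda j: self.feature_importances_[j])
        self.ranking_ = [0] * n_features
        for position, j in enumerate(order):
            self.ranking_[j] = position
        return self


X = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
y = [0, 1, 0, 1]
ranker = ToyRanker().fit(X, y)
```

An estimator like this one would set all three estimates_feature_* flags to true in its config, since it exposes all three attributes after fitting.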

Examples

Ranker ReliefF

Example config for ReliefF feature selection using skrebate:

conf/ranker/relieff.yaml
name: ReliefF
estimator:
  _target_: skrebate.ReliefF
_estimator_type: classifier
estimates_feature_importances: true

Then use with ranker=relieff on the command line.

Ranker Boruta

Example config for Boruta using boruta_py:

conf/ranker/boruta.yaml
name: Boruta
estimator:
  _target_: boruta.boruta_py.BorutaPy
  estimator:
    _target_: sklearn.ensemble.RandomForestClassifier
  n_estimators: auto
_estimator_type: classifier
multioutput: false
estimates_feature_importances: false
estimates_feature_support: true
estimates_feature_ranking: true

Then use with ranker=boruta on the command line.
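The Boruta config nests one _target_ dictionary inside another. The sketch below shows, under stated assumptions, how such a nested dictionary could be turned into objects: inner dictionaries are resolved first, and the remaining keys are passed to the constructor as keyword arguments. instantiate_sketch is a hypothetical helper written for this illustration; in practice a Hydra-style instantiate utility would normally handle this, and it does considerably more. Standard-library classes stand in for BorutaPy and RandomForestClassifier so that the example runs without extra dependencies.

```python
import importlib
from typing import Any


def instantiate_sketch(node: Any) -> Any:
    """Hypothetical helper: recursively build objects from _target_ dictionaries."""
    if not isinstance(node, dict):
        return node  # plain values (strings, numbers, ...) are passed through

    # Resolve nested dictionaries first, so inner objects exist before the outer one.
    kwargs = {key: instantiate_sketch(value) for key, value in node.items() if key != "_target_"}
    if "_target_" not in node:
        return kwargs  # an ordinary dictionary without a class to instantiate

    # Split "package.module.ClassName" into the module path and the class name.
    module_path, _, class_name = node["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**kwargs)


# Same shape as the Boruta config, with stand-in standard-library targets.
config = {
    "_target_": "types.SimpleNamespace",  # stand-in for the outer estimator
    "estimator": {
        "_target_": "fractions.Fraction",  # stand-in for the inner estimator
        "numerator": 3,
        "denominator": 4,
    },
    "n_estimators": "auto",
}

outer = instantiate_sketch(config)
```

Here outer.estimator is the already-constructed inner object, and n_estimators is passed unchanged to the outer constructor.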

Validator k-NN

Example config for a validation estimator, e.g. k-NN:

conf/validator/knn.yaml
name: k-NN
estimator:
  _target_: sklearn.neighbors.KNeighborsClassifier
_estimator_type: classifier
multioutput: false
estimates_target: true

Then use with validator=knn on the command line.

More examples

See more example definitions of rankers and validators in the repository.