EstimatorConfig
class fseval.config.EstimatorConfig(
name: str=MISSING,
estimator: Any=None,
load_cache: Optional[CacheUsage]=CacheUsage.allow,
save_cache: Optional[CacheUsage]=CacheUsage.allow,
_estimator_type: str=MISSING,
multioutput: bool=False,
multioutput_only: bool=False,
requires_positive_X: bool=False,
estimates_feature_importances: bool=False,
estimates_feature_support: bool=False,
estimates_feature_ranking: bool=False,
estimates_target: bool=False,
)
Configures an estimator: a Feature Ranker, a Feature Selector, or a validation estimator. For Feature Rankers/Selectors, set at least one of estimates_feature_importances, estimates_feature_support or estimates_feature_ranking to True. For a validation estimator, set estimates_target to True.
Attributes:
name : str | Human-friendly name of this estimator. |
estimator : Any | The estimator. Must be a dictionary with a key _target_ , pointing to the class that is to be instantiated. All other properties in the dictionary will be passed to the estimator constructor. e.g |
load_cache : Optional[CacheUsage] | How to handle loading a cached version of the estimator from a pickle file, e.g. to ignore the cache or to force using it. To be used in combination with PipelineConfig.storage. See CacheUsage. |
save_cache : Optional[CacheUsage] | How to handle saving the fit estimator as a pickle file, to facilitate caching. To be used in combination with PipelineConfig.storage. See CacheUsage. |
_estimator_type : str | Either 'classifier', 'regressor' or 'clusterer'. See the sklearn. |
multioutput : bool | Whether this estimator supports multioutput datasets. |
multioutput_only : bool | Whether this estimator supports only multioutput datasets. |
requires_positive_X : bool | Whether the estimator fails if X contains negative values. |
estimates_feature_importances : bool | Whether the estimator estimates feature importances. For example, in the case of 2 features, the estimator can set self.feature_importances_ = [0.9, 0.1], implying the estimator found the first feature to be the most useful. Alternatively, the coef_ attribute can also be read and interpreted as a feature importance vector. |
estimates_feature_support : bool | Whether the estimator estimates feature support. A feature support vector indicates, for each feature, whether to include it in a feature subset. In other words, it must be a boolean vector. It is to be set on the estimator's support_ attribute. Estimating the feature support vector is synonymous with performing feature selection. E.g. self.support_ = [True, False], meaning that only the first feature is included in the feature subset. |
estimates_feature_ranking : bool | Whether the estimator ranks the features in a specific order. This is similar to feature importance, but does not estimate exact importance quantities, i.e. quantities that are proportional to each other. An estimator can set the ranking using the ranking_ attribute, e.g. self.ranking_ = [1, 0], to indicate that the first feature ranks the highest. |
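As a rough illustration of the feature_importances_ and support_ attributes described above, here is a minimal sketch of a toy ranker. The class VarianceRanker and its n_features_to_select parameter are hypothetical and not part of fseval. It uses only the standard library, and fseval may require more of an estimator (for instance scikit-learn compatibility) than is shown here.

```python
from statistics import pvariance


class VarianceRanker:
    """Hypothetical toy ranker: scores each feature by its variance."""

    def __init__(self, n_features_to_select=1):
        self.n_features_to_select = n_features_to_select

    def fit(self, X, y=None):
        # X is a list of rows; zip(*X) iterates over the feature columns.
        variances = [pvariance(column) for column in zip(*X)]
        total = sum(variances) or 1.0  # avoid dividing by zero for constant data

        # Feature importances: one non-negative score per feature.
        self.feature_importances_ = [v / total for v in variances]

        # Feature support: boolean mask that keeps the top-k features.
        order = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
        selected = set(order[: self.n_features_to_select])
        self.support_ = [i in selected for i in range(len(variances))]
        return self


# Two features: the first varies widely, the second barely at all.
X = [[0.0, 1.0], [10.0, 1.1], [20.0, 0.9]]
ranker = VarianceRanker(n_features_to_select=1).fit(X)
print(ranker.feature_importances_)
print(ranker.support_)
```

An estimator like this sets both attributes, so its config would presumably set both estimates_feature_importances and estimates_feature_support to True.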
Examples
Ranker ReliefF
Example config for ReliefF feature selection using skrebate:
YAML (conf/ranker/relieff.yaml):
name: ReliefF
estimator:
  _target_: skrebate.ReliefF
_estimator_type: classifier
estimates_feature_importances: true
Structured Config:
from hydra.core.config_store import ConfigStore
from fseval.config import EstimatorConfig

cs = ConfigStore.instance()

relieff = EstimatorConfig(
    name="ReliefF",
    estimator=dict(
        _target_="skrebate.ReliefF"
    ),
    _estimator_type="classifier",
    estimates_feature_importances=True,
)

cs.store(group="ranker", name="relieff", node=relieff)
Then use with ranker=relieff on the command line.
Ranker Boruta
Example config for Boruta using boruta_py:
YAML (conf/ranker/boruta.yaml):
name: Boruta
estimator:
  _target_: boruta.boruta_py.BorutaPy
  estimator:
    _target_: sklearn.ensemble.RandomForestClassifier
  n_estimators: auto
_estimator_type: classifier
multioutput: false
estimates_feature_importances: false
estimates_feature_support: true
estimates_feature_ranking: true
Structured Config:
from hydra.core.config_store import ConfigStore
from fseval.config import EstimatorConfig

cs = ConfigStore.instance()

boruta = EstimatorConfig(
    name="Boruta",
    estimator=dict(
        _target_="boruta.boruta_py.BorutaPy",
        estimator=dict(
            _target_="sklearn.ensemble.RandomForestClassifier"
        ),
        n_estimators="auto"
    ),
    _estimator_type="classifier",
    multioutput=False,
    estimates_feature_importances=False,
    estimates_feature_support=True,
    estimates_feature_ranking=True,
)

cs.store(group="ranker", name="boruta", node=boruta)
Then use with ranker=boruta on the command line.
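The Boruta config nests one _target_ dictionary inside another, so the inner estimator has to be built before it can be passed to the outer constructor. The sketch below shows one way such a nested dictionary could be resolved. It is a simplified standard-library illustration, not Hydra's or fseval's actual implementation; the helper name instantiate is hypothetical, and types.SimpleNamespace stands in for the real estimator classes so that the example runs without extra dependencies.

```python
from importlib import import_module


def instantiate(config):
    """Recursively build an object from a dict that has a `_target_` key."""
    if not isinstance(config, dict):
        return config  # plain values are passed through unchanged

    # Build nested dictionaries first, so inner objects exist
    # before they are handed to the outer constructor.
    kwargs = {k: instantiate(v) for k, v in config.items() if k != "_target_"}
    if "_target_" not in config:
        return kwargs

    # Split "package.module.ClassName" into module path and class name.
    module_path, _, class_name = config["_target_"].rpartition(".")
    cls = getattr(import_module(module_path), class_name)
    return cls(**kwargs)


# Same nesting shape as the Boruta config, with stand-in targets.
config = {
    "_target_": "types.SimpleNamespace",
    "estimator": {"_target_": "types.SimpleNamespace", "n_jobs": 2},
    "n_estimators": "auto",
}
outer = instantiate(config)
print(outer.n_estimators, outer.estimator.n_jobs)
```

Every key other than _target_ becomes a keyword argument of the constructor, which matches the description of the estimator attribute above.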
Validator k-NN
Example config for a validation estimator, e.g. k-NN:
YAML (conf/validator/knn.yaml):
name: k-NN
estimator:
  _target_: sklearn.neighbors.KNeighborsClassifier
_estimator_type: classifier
multioutput: false
estimates_target: true
Structured Config:
from hydra.core.config_store import ConfigStore
from fseval.config import EstimatorConfig

cs = ConfigStore.instance()

knn = EstimatorConfig(
    name="k-NN",
    estimator=dict(
        _target_="sklearn.neighbors.KNeighborsClassifier",
    ),
    _estimator_type="classifier",
    multioutput=False,
    estimates_target=True,
)

# Store under the "validator" group so that validator=knn resolves to this node.
cs.store(group="validator", name="knn", node=knn)
Then use with validator=knn on the command line.
More examples
See more example definitions of rankers and validators in the repository.