
PipelineConfig

class fseval.config.PipelineConfig(
    dataset: DatasetConfig = MISSING,
    cv: CrossValidatorConfig = MISSING,
    resample: ResampleConfig = MISSING,
    ranker: EstimatorConfig = MISSING,
    validator: EstimatorConfig = MISSING,
    storage: StorageConfig = MISSING,
    callbacks: Dict[str, Any] = field(default_factory=lambda: {}),
    metrics: Dict[str, Any] = field(default_factory=lambda: {}),
    n_bootstraps: int = 1,
    n_jobs: Optional[int] = 1,
    all_features_to_select: str = "range(1, min(50, p) + 1)",
    defaults: List[Any] = field(
        default_factory=lambda: [
            "_self_",
            {"dataset": MISSING},
            {"cv": "kfold"},
            {"resample": "shuffle"},
            {"ranker": MISSING},
            {"validator": MISSING},
            {"storage": "local"},
            {"callbacks": []},
            {"metrics": ["feature_importances", "ranking_scores", "validation_scores"]},
            {"override hydra/job_logging": "colorlog"},
            {"override hydra/hydra_logging": "colorlog"},
        ]
    )
)

The complete configuration needed to run the fseval pipeline.
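
A config like this is typically consumed through a Hydra entry point. The sketch below is a minimal, hedged example: it assumes the run_pipeline function in fseval.main (as in the project's quick-start) and a config named my_config stored under conf/; verify both against your installed version.

import hydra
from fseval.config import PipelineConfig
from fseval.main import run_pipeline  # assumed entry point; check your fseval version

@hydra.main(config_path="conf", config_name="my_config")
def main(cfg: PipelineConfig) -> None:
    # Hydra composes the PipelineConfig from the defaults list, YAML files
    # and command-line overrides, then hands it to the pipeline runner.
    run_pipeline(cfg)

if __name__ == "__main__":
    main()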

Attributes:

dataset : DatasetConfig
    Determines the dataset to use for this experiment.
cv : CrossValidatorConfig
    The CV method and split to use in this experiment.
resample : ResampleConfig
    Dataset resampling; e.g. with or without replacement.
ranker : EstimatorConfig
    A Feature Ranker or Feature Selector.
validator : EstimatorConfig
    An estimator used to validate the feature subsets.
storage : StorageConfig
    A storage method used to store the fitted estimators.
callbacks : Dict[str, Any]
    Callbacks. Provide hooks for storing the config or results.
metrics : Dict[str, Any]
    Metrics allow custom computation after any pipeline stage.
n_bootstraps : int
    Number of 'bootstraps' to run. A bootstrap means running the pipeline again, but with a resampled (see resample) version of the dataset. This allows estimating stability, for example.
n_jobs : Optional[int]
    Number of CPUs to use for computing the bootstraps. The bootstraps are distributed over the available CPUs.
all_features_to_select : str
    Determines the feature subsets to validate with the validation estimator. The parameter is a string containing an arbitrary Python expression that must evaluate to a List[int] object. Each number in the list is passed to sklearn.feature_selection.SelectFromModel as the max_features parameter; see the evaluation sketch after this list.
      • For example, all_features_to_select="[1, 2]" means two feature subsets are evaluated with the validation estimator: the first containing only the highest ranked feature and the second containing the two highest ranked features.
      • For example, all_features_to_select="range(1, p + 1)" means that all feature subsets are evaluated.
    By default, this parameter is set to all_features_to_select="range(1, min(50, p) + 1)", meaning at most 50 subsets containing the highest ranked features are validated.
defaults : List[Any]
    Default values for the attributes above. See the Hydra docs on the Defaults List.
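
To illustrate how an all_features_to_select expression turns into feature subset sizes, the snippet below evaluates the default expression by hand. This is a hypothetical illustration only: p stands for the number of features in the dataset, and fseval's actual evaluation code may differ.

# Hypothetical illustration of evaluating an `all_features_to_select`
# expression; `p` is assumed to be the dataset's number of features.
p = 20  # e.g. a dataset with 20 features
expr = "range(1, min(50, p) + 1)"  # the default expression
subset_sizes = list(eval(expr, {"p": p}))
print(subset_sizes)  # [1, 2, ..., 20]; each size is passed to SelectFromModel as max_features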

Experiments can be configured in two ways.

  1. Using YAML files stored in a directory
  2. Using Python (Structured Configs)

Examples

conf/my_config.yaml
defaults:
  - base_pipeline_config
  - _self_
  - override dataset: synthetic
  - override validator: knn
  - override /callbacks:
      - to_sql

n_bootstraps: 1

Using the override keyword is required when overriding an existing config group; see the Hydra documentation on the Defaults List for details.
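
The same experiment can also be configured in Python using Structured Configs, by registering a config node with Hydra's ConfigStore. The sketch below is a hedged, minimal example: the PipelineConfig fields are taken from the signature above, but the available config group options (such as dataset: synthetic or validator: knn) depend on what is registered in your setup.

# Sketch of the Structured Config (Python) approach, analogous to the
# YAML example above.
from hydra.core.config_store import ConfigStore
from fseval.config import PipelineConfig

cs = ConfigStore.instance()
cs.store(
    name="my_config",
    node=PipelineConfig(
        n_bootstraps=1,
        all_features_to_select="range(1, p + 1)",  # validate every subset size
    ),
)

Config groups left at MISSING (dataset, ranker, validator) can then be chosen at runtime, e.g. with command-line overrides such as dataset=synthetic validator=knn, assuming those options are registered; this mirrors the override entries in the YAML example above.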