CrossValidatorConfig
class fseval.config.CrossValidatorConfig(
    name: str = MISSING,
    splitter: Any = None,
    fold: int = 0,
)
Provides an interface for defining a Cross Validation (CV) method. Cross Validation is used to improve the reliability of the ranking and validation results. The CV split is applied at the beginning of the pipeline, so both the ranker and validator only get to see the training dataset. The test dataset is used for scoring, i.e. for determining the validation estimator scores.
Attributes:

name : str
    Human-friendly name for this CV method.

splitter : Any
    The cross validation splitter function. Must contain a _target_ attribute which instantiates to an object that has a split method with the signature def split(self, X, y=None, groups=None). See sklearn's BaseCrossValidator and BaseShuffleSplit.

fold : int
    The fold to use in this specific run of the pipeline. For example, you can run python my_benchmark.py --multirun cv=kfold cv.splitter.n_splits=5 cv.fold=range(0,5) to execute a complete 5-fold CV scheme.
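To make the splitter and fold semantics concrete, here is a minimal sketch of how such a config resolves to train/test indices. It uses plain Hydra and scikit-learn and is purely illustrative; it is not fseval's internal pipeline code:

import numpy as np
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Illustrative only: a CV config like the ones below, resolved by hand.
cfg = OmegaConf.create({
    "name": "K-Fold",
    "splitter": {
        "_target_": "sklearn.model_selection.KFold",
        "n_splits": 5,
        "shuffle": True,
        "random_state": 0,
    },
    "fold": 2,
})

X = np.random.rand(20, 4)
splitter = instantiate(cfg.splitter)        # instantiates sklearn's KFold
splits = list(splitter.split(X))            # split(self, X, y=None, groups=None)
train_index, test_index = splits[cfg.fold]  # `fold` selects one split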
Available CV methods
Built-in to fseval are kfold and train_test_split.
K-Fold CV
cv=kfold
conf/cv/kfold.yaml
name: K-Fold
splitter:
  _target_: sklearn.model_selection.KFold
  n_splits: 5
  shuffle: True
  random_state: 0
For example, to use 10-fold CV, set cv.splitter.n_splits=10.
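As a quick sanity check (plain scikit-learn, outside of fseval), that override instantiates the following splitter:

from sklearn.model_selection import KFold

# What cv.splitter.n_splits=10 resolves to under the kfold config above.
splitter = KFold(n_splits=10, shuffle=True, random_state=0)
print(splitter.get_n_splits())  # 10; run a full scheme with cv.fold=range(0,10)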
Train/test split
cv=train_test_split
conf/cv/train_test_split.yaml
name: Train/test split
splitter:
  _target_: sklearn.model_selection.ShuffleSplit
  n_splits: 1
  test_size: 0.25
  random_state: 0
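For reference, running this splitter directly in scikit-learn (an illustration, not part of fseval) shows that n_splits: 1 yields a single random train/test split:

import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(100).reshape(100, 1)
splitter = ShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_index, test_index = next(splitter.split(X))  # the one and only split
print(len(train_index), len(test_index))  # 75 25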
⚙️ Custom CV
For example, we can implement Leave One Out cross validation like so:
YAML:
conf/cv/loocv.yaml
name: Leave One Out
splitter:
  _target_: sklearn.model_selection.LeaveOneOut
Structured Config:
from hydra.core.config_store import ConfigStore

from fseval.config import CrossValidatorConfig

loocv = CrossValidatorConfig(
    name="Leave One Out",
    splitter=dict(
        _target_="sklearn.model_selection.LeaveOneOut",
    ),
)

# Register the config under the `cv` group so it can be selected with cv=loocv.
cs = ConfigStore.instance()
cs.store(name="loocv", node=loocv, group="cv")
This config can then be used by setting cv=loocv on the command line.
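Note that Leave One Out produces one fold per sample, so a complete LOOCV scheme requires cv.fold=range(0, n_samples) in a multirun. The snippet below (plain scikit-learn, shown for illustration) demonstrates the splitter's behavior:

import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(5).reshape(5, 1)
splitter = LeaveOneOut()
print(splitter.get_n_splits(X))  # 5: one fold per sample
for train_index, test_index in splitter.split(X):
    print(train_index, test_index)  # each fold holds out a single test sample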