CrossValidatorConfig

class fseval.config.CrossValidatorConfig(
    name: str = MISSING,
    splitter: Any = None,
    fold: int = 0,
)

Cross-validation is used to improve the reliability of the ranking and validation results.

This class provides an interface for defining a Cross Validation method. The CV split is applied at the beginning of the pipeline, so both the ranker and the validator only get to see the training dataset; the test dataset is held out for scoring, i.e. for determining the validation estimator scores.
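
Conceptually, the pipeline enumerates the splits produced by the splitter and selects the one indicated by fold. A rough sketch of that interaction using scikit-learn's KFold (an illustration of the idea, not fseval's actual internals):

import numpy as np
from sklearn.model_selection import KFold

# Dummy data, for illustration only.
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=100)

splitter = KFold(n_splits=5, shuffle=True, random_state=0)
fold = 0  # corresponds to the 'fold' attribute described below

# Select the fold-th split. The ranker and validator are fit on the
# training subset only; the test subset is held out for scoring.
train_index, test_index = list(splitter.split(X, y))[fold]
X_train, X_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]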

Attributes:

name : str
Human-friendly name for this CV method.

splitter : Any
The cross validation splitter function. Must contain a _target_ attribute which instantiates to an object that has a split method with the signature def split(self, X, y=None, groups=None). See BaseCrossValidator and BaseShuffleSplit, or the custom splitter sketch after this list.

fold : int
The fold to use in this specific run of the pipeline. For example, you can use python my_benchmark.py --multirun cv=kfold cv.splitter.n_splits=5 cv.fold=range(0,5) to run a complete 5-fold CV scheme.
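
The splitter does not have to come from scikit-learn: any object whose class is reachable through _target_ and that exposes the split signature above will do. A minimal sketch of such a custom splitter (the class name and split strategy are hypothetical):

import numpy as np

class FirstHalfSplitter:
    # Hypothetical splitter: train on the first half of the samples and
    # test on the second half. It satisfies the required signature
    # split(self, X, y=None, groups=None).
    def split(self, X, y=None, groups=None):
        indices = np.arange(len(X))
        half = len(indices) // 2
        yield indices[:half], indices[half:]

Pointing _target_ at the import path of such a class makes it instantiable just like the built-in splitters.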

Available CV methods

Built into fseval are kfold and train_test_split.

K-Fold CV

cv=kfold

conf/cv/kfold.yaml
name: K-Fold
splitter:
  _target_: sklearn.model_selection.KFold
  n_splits: 5
  shuffle: True
  random_state: 0

For example, to use 10-fold CV, set cv.splitter.n_splits=10.
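
Combined with the fold sweep shown earlier, a complete 10-fold CV run then looks like:

python my_benchmark.py --multirun cv=kfold cv.splitter.n_splits=10 cv.fold=range(0,10)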

Train/test split

cv=train_test_split

conf/cv/train_test_split.yaml
name: Train/test split
splitter:
  _target_: sklearn.model_selection.ShuffleSplit
  n_splits: 1
  test_size: 0.25
  random_state: 0

⚙️ Custom CV

For example, we can implement Leave One Out cross validation like so:

conf/cv/loocv.yaml
name: Leave One Out
splitter:
  _target_: sklearn.model_selection.LeaveOneOut

This CV method can then be used by setting cv=loocv on the command line.
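
Any other scikit-learn splitter plugs in the same way. For instance, a stratified variant could be configured as follows (the file name is only an example):

conf/cv/stratified_kfold.yaml
name: Stratified K-Fold
splitter:
  _target_: sklearn.model_selection.StratifiedKFold
  n_splits: 5
  shuffle: True
  random_state: 0

which would then be selected with cv=stratified_kfold.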