
ResampleConfig

class fseval.config.ResampleConfig(
name: str=MISSING,
replace: bool=False,
sample_size: Any=None,
random_state: Optional[int]=None,
stratify: Optional[List]=None,
)

Resampling can be used to take random samples from the dataset, with or without replacement. Resampling is performed after the CV split.
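The ordering "split first, then resample" can be illustrated with a small index-based sketch. It assumes that the training fold produced by the split is what gets resampled; `kfold_indices` and `resample_train` are hypothetical helpers written for this example, not fseval functions.

```python
import random
from typing import List, Optional, Tuple


def kfold_indices(n_samples: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Unshuffled k-fold split of range(n_samples) into (train, test) index lists."""
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


def resample_train(train: List[int], replace: bool, random_state: Optional[int] = None) -> List[int]:
    """Resample only the training indices; the test fold is never touched."""
    rng = random.Random(random_state)
    if replace:
        return rng.choices(train, k=len(train))  # with replacement: duplicates possible
    return rng.sample(train, k=len(train))  # without replacement: a permutation


# Split first, then resample the training part of the first fold.
train, test = kfold_indices(n_samples=10, n_splits=5)[0]
train_resampled = resample_train(train, replace=True, random_state=0)
```

Because resampling happens after the split, `train_resampled` can only contain indices from `train`, so no test row can end up in the training data.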

A typical use case is bootstrapping: running the pipeline multiple times with the same configuration, but each time on a different resampling of the dataset. In this way, we can measure the stability of a feature ranking algorithm.
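To make the stability idea concrete, here is a self-contained sketch that ranks features on several bootstrap resamples and compares the resulting top-k feature sets. The ranker (absolute Pearson correlation with the target) and the stability score (mean pairwise Jaccard similarity of the top-k sets) are hypothetical choices made for illustration; they are not the ranker or metric that fseval uses.

```python
import random
from itertools import combinations
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def rank_features(X, y):
    """Toy ranker: feature indices sorted by |corr(feature, y)|, best first."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def bootstrap_stability(X, y, n_bootstraps=10, top_k=2, random_state=0):
    """Mean pairwise Jaccard similarity of the top-k features across bootstraps."""
    rng = random.Random(random_state)
    n = len(X)
    top_sets = []
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        ranking = rank_features([X[i] for i in idx], [y[i] for i in idx])
        top_sets.append(set(ranking[:top_k]))
    return mean(len(a & b) / len(a | b) for a, b in combinations(top_sets, 2))


# Synthetic data: features 0 and 1 drive y, features 2 and 3 are pure noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * row[0] + 2 * row[1] + data_rng.gauss(0, 0.1) for row in X]
stability = bootstrap_stability(X, y, n_bootstraps=10, top_k=2, random_state=0)
```

A score close to 1.0 means the ranker selects the same top features on every resample; an unstable ranker would produce lower values.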

Attributes:

name : str
    Human-friendly name of the resampling method.
replace : bool
    Whether to resample with replacement.
sample_size : Any
    Can be one of two types: either a float in [0.0, 1.0], selecting that fraction of the dataset, or an int in [1, n_samples], selecting that exact number of samples.
random_state : Optional[int]
    Optionally, a fixed random state for the resampling process, so that results can be reproduced.
stratify : Optional[List]
    Whether to use stratified resampling. If not None, the data is resampled in a stratified fashion, using this as the class labels. See sklearn.utils.resample for more information.
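The two accepted types of `sample_size` can be summarised in a small helper that turns the setting into an absolute number of samples. This is a sketch under stated assumptions: `resolve_sample_size` is a hypothetical name, treating `None` as "keep all samples" and truncating fractional counts toward zero are choices made here, not documented fseval behaviour.

```python
from typing import Optional, Union


def resolve_sample_size(sample_size: Optional[Union[int, float]], n_samples: int) -> int:
    """Convert a ResampleConfig-style sample_size into an absolute sample count.

    Assumptions (not taken from fseval): None keeps n_samples, a float is a
    fraction of n_samples truncated toward zero, an int is an exact count.
    """
    if sample_size is None:
        return n_samples
    if isinstance(sample_size, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("sample_size must be a float or an int, not a bool")
    if isinstance(sample_size, float):
        if not 0.0 <= sample_size <= 1.0:
            raise ValueError("a float sample_size must lie in [0.0, 1.0]")
        return int(sample_size * n_samples)
    if isinstance(sample_size, int):
        if not 1 <= sample_size <= n_samples:
            raise ValueError("an int sample_size must lie in [1, n_samples]")
        return sample_size
    raise TypeError(f"unsupported sample_size type: {type(sample_size).__name__}")


full = resolve_sample_size(1.00, n_samples=150)  # float: the whole dataset
half = resolve_sample_size(0.50, n_samples=150)  # float: half of the dataset
one = resolve_sample_size(1, n_samples=150)      # int: exactly one sample
```

Note the consequence of the two types: written as `1.00` (a float), the value selects the full dataset, whereas `1` (an int) selects a single sample.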

Available resampling methods

Built-in resampling methods are shuffle and bootstrap.

Bootstrap

Takes random samples with replacement. By default, it draws as many samples as the original dataset contains, using sample_size=1.00.

conf/resample/bootstrap.yaml
name: Bootstrap
replace: true
sample_size: 1.00

On the command line, use with resample=bootstrap.
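What the bootstrap settings above (`replace: true`, `sample_size: 1.00`) amount to can be sketched with the standard library alone. `bootstrap_indices` is a hypothetical helper for illustration, not part of fseval; it interprets `sample_size` as a float fraction, as in `bootstrap.yaml`.

```python
import random
from typing import List, Optional


def bootstrap_indices(n_samples: int, sample_size: float = 1.00,
                      random_state: Optional[int] = None) -> List[int]:
    """Draw row indices with replacement (replace: true).

    sample_size is treated as a fraction of n_samples, as in bootstrap.yaml.
    """
    rng = random.Random(random_state)
    k = int(sample_size * n_samples)
    return [rng.randrange(n_samples) for _ in range(k)]


idx = bootstrap_indices(1000, random_state=0)  # 1000 draws, same size as the dataset
n_unique = len(set(idx))  # typically about 632 distinct rows (1 - 1/e); the rest are repeats
```

Since each row can be drawn more than once, a bootstrap sample has the original size but typically contains only about 63% of the distinct rows, which is what makes repeated runs differ from one another.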

Shuffle

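This resampling method shuffles the dataset samples: all samples are kept (sample_size=1.00) and drawn without replacement, so only their order changes.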

conf/resample/shuffle.yaml
name: Shuffle
replace: false
sample_size: 1.00

On the command line, use with resample=shuffle.
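Sampling every row without replacement is the same as applying a random permutation. The sketch below shows this with the standard library; `shuffle_indices` is a hypothetical helper, and applying one index order to both `X` and `y` is the point being illustrated, not a description of fseval internals.

```python
import random
from typing import List, Optional


def shuffle_indices(n_samples: int, random_state: Optional[int] = None) -> List[int]:
    """Sample all rows without replacement, i.e. a random permutation."""
    rng = random.Random(random_state)
    return rng.sample(range(n_samples), k=n_samples)


X = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0], [0.4, 4.0]]
y = [0, 1, 0, 1]
order = shuffle_indices(len(X), random_state=0)
# Using one index order for both X and y keeps each row paired with its label.
X_shuffled = [X[i] for i in order]
y_shuffled = [y[i] for i in order]
```

Fixing `random_state` makes the permutation reproducible, which matches the purpose of the `random_state` attribute.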

⚙️ Custom resampling

Adding a custom resampling method can be done by implementing the ResampleConfig interface.

For example, define:

conf/resample/custom_bootstrap.yaml
name: Bootstrap with half of samples
replace: true
sample_size: 0.50

This method can then be selected with resample=custom_bootstrap on the command line.

Or, in the config file:

conf/my_config.yaml
defaults:
- base_pipeline_config
- _self_
- override resample: custom_bootstrap
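The keys in `custom_bootstrap.yaml` correspond one-to-one to the `ResampleConfig` attributes. The sketch below mirrors that signature with a plain standard-library dataclass to show the correspondence; `ResampleConfigSketch` is a hypothetical stand-in, not fseval's class. The signature's `name: str=MISSING` (in OmegaConf, `MISSING` marks a mandatory value) is modelled here as a field without a default.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ResampleConfigSketch:
    """Hypothetical mirror of the ResampleConfig signature (not fseval's class)."""

    name: str  # mandatory, like name: str=MISSING
    replace: bool = False
    sample_size: Any = None
    random_state: Optional[int] = None
    stratify: Optional[List] = None


# The contents of conf/resample/custom_bootstrap.yaml, as Python values.
custom_bootstrap = ResampleConfigSketch(
    name="Bootstrap with half of samples",
    replace=True,
    sample_size=0.50,
)

# A misspelled key is rejected by the dataclass constructor, e.g.
# ResampleConfigSketch(name="x", sample_szie=0.5) raises TypeError.
```

Any attribute left out of the YAML file falls back to the default shown in the signature, so `custom_bootstrap.random_state` stays `None` here.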