ResampleConfig
class fseval.config.ResampleConfig(
name: str=MISSING,
replace: bool=False,
sample_size: Any=None,
random_state: Optional[int]=None,
stratify: Optional[List]=None,
)
Resampling can be used to take random samples from the dataset, with- or without replacement. Resampling is performed after the CV split.
A usecase would be to perform bootstrapping. This means, that we can run the pipeline multiple times using the same configuration, but with different resamplings of the dataset. In this way, we can measure the stability of a feature ranking algorithm.
Attributes:
name : str | Human-friendly name of the resampling method. |
replace : bool | Whether to use resampling with replacement, yes or no. |
sample_size : Any | Can be one of two types. Either a float from [0.0 to 1.0], such to select a fraction of the dataset to be sampled. Or, an int from [1 to n_samples] can be used. This is the amount of exact samples to be selected. |
random_state : Optional[int] | Optionally, one might fix a random state to be used in the resampling process. In this way, results can be reproduced. |
stratify : Optional[List] | Whether to use stratified resampling. See sklearn.utils.resample for more information. |
Available resampling methods
Built-in resampling methods are shuffle
and bootstrap
.
Bootstrap
Takes random samples with replacement. By default, uses resamples back to the amount of original dataset samples, using sample_size=1.00
.
name: Bootstrap
replace: true
sample_size: 1.00
On the command line, use with resample=bootstrap
.
Shuffle
This resampling method shuffles the dataset samples.
name: Shuffle
replace: false
sample_size: 1.00
On the command line, use with resample=shuffle
.
⚙️ Custom resampling
Adding a custom resampling method can be done by implementing the ResampleConfig
interface.
For example, define:
- YAML
- Structured Config
name: Bootstrap with half of samples
replace: true
sample_size: 0.50
from hydra.core.config_store import ConfigStore
from fseval.config import ResampleConfig
cs = ConfigStore.instance()
cs.store(
group="resample",
name="custom_bootstrap",
node=ResampleConfig(
name="Bootstrap with half of samples",
replace=True,
sample_size=0.50,
),
)
Which can then be used using resample=custom_bootstrap
on the commandline.
Or, in the config file:
defaults:
- base_pipeline_config
- _self_
- override resample: custom_bootstrap