Getting started
To get started, follow the ⚡️ Quick start guide below 👇🏻
⚡️ Quick start
Let's run our first experiment. The goal will be to compare two feature selectors: ANOVA F-Value and Mutual Info.
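For some intuition on what these two selectors score: both are available as univariate scoring functions in scikit-learn, f_classif for the ANOVA F-value and mutual_info_classif for Mutual Info. A tiny standalone sketch, independent of fseval, that ranks the features of a toy dataset:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

# A toy classification dataset with a handful of informative features.
X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=0)

# ANOVA F-value: univariate linear dependence between each feature and the target.
f_scores, _ = f_classif(X, y)

# Mutual Info: also captures non-linear dependence.
mi_scores = mutual_info_classif(X, y, random_state=0)

print("ANOVA F-value ranking:", np.argsort(-f_scores))
print("Mutual Info ranking:  ", np.argsort(-mi_scores))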
First, install fseval:
pip install fseval
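If you like to keep your installs isolated, a standard Python virtual environment works as usual; nothing here is fseval-specific (shown for macOS/Linux):

python -m venv .venv
source .venv/bin/activate
pip install fseval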
If, for some reason, you would rather not install fseval from the PyPI package index using pip install as shown above, you can also install it directly from its Git source. Execute the following:
git clone https://github.com/dunnkers/fseval.git
cd fseval
pip install -r requirements.txt
pip install .
You should now be able to continue in the same way as before ✓.
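To double-check that the installation succeeded, you can ask pip for the package metadata:

pip show fseval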
Now you can decide whether you want to define your configuration in YAML or in Python. Choose whatever you find most convenient.
- YAML
- Structured Config
Download the example configuration: quick-start-yaml.zip
Then, cd into the example directory. You should now have the following files:
Download the example configuration: quick-start-structured-configs.zip
You should now have the following files:
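Whichever flavor you pick, the downloaded example is driven by a benchmark.py entry point (used in the run command further below), which is a small Hydra application. A rough sketch of what such an entry point can look like; the config name is a placeholder, and the PipelineConfig / run_pipeline imports are assumed from fseval's API rather than copied from the example:

import hydra
from fseval.config import PipelineConfig  # assumed import path
from fseval.main import run_pipeline  # assumed import path

# Hydra composes the configuration (YAML or Structured Config) and passes it in.
@hydra.main(config_path="conf", config_name="my_config")  # placeholder config name
def benchmark(cfg: PipelineConfig) -> None:
    run_pipeline(cfg)

if __name__ == "__main__":
    benchmark()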
We can now decide how to export the results. The results can be uploaded to a live SQL database, but for now let's use a local one; SQLite is a good fit for this.
sql_con=sqlite:////Users/dunnkers/Downloads/results.sqlite # any well-defined database URL
If you define a relative database URL, like sql_con=sqlite:///./results.sqlite, the results are saved right where Hydra stores its individual run files. In other words, multiple .sqlite files end up in the ./multirun subfolders.
To prevent this and store all results in a single .sqlite file, use an absolute path like the one above. Preferably, though, you would use a proper running database - see the recipes for more instructions on this.
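The connection string is a SQLAlchemy-style database URL (which is presumably what the to_sql callback expects, since it writes to a SQL database). Before launching a long benchmark, you can sanity-check the URL by opening it with SQLAlchemy directly; a small sketch, assuming the absolute SQLite URL from above:

from sqlalchemy import create_engine

# The same URL you will pass as +callbacks.to_sql.url=...
url = "sqlite:////Users/dunnkers/Downloads/results.sqlite"

engine = create_engine(url)
with engine.connect() as connection:
    # If this succeeds, the URL is well-formed and the database is reachable.
    print("Connected to:", connection.engine.url)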
We are now ready to run an experiment. In a terminal, cd into the unzipped example directory and run the following:
python benchmark.py --multirun ranker='glob(*)' +callbacks.to_sql.url=$sql_con
And our experiment starts running 👏🏻!
Using --multirun combined with ranker='glob(*)', we instructed fseval to run experiments for all available rankers. The results are now stored in a SQLite database:
$ tree ~/Downloads
/Users/dunnkers/Downloads
└── results.sqlite
0 directories, 1 file
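By the way, glob(*) is standard Hydra sweep syntax. If you later want to re-run only a subset of rankers, Hydra's comma-separated sweep works too; the ranker names below are placeholders, since the actual options depend on the ranker configurations shipped with the example:

python benchmark.py --multirun ranker=anova_f_value,mutual_info +callbacks.to_sql.url=$sql_con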
We can open the data using DB Browser for SQLite. The validation scores are in the validation_scores table.
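If you prefer a terminal over a GUI, the sqlite3 command-line shell (available on most systems) can inspect the same file:

sqlite3 ~/Downloads/results.sqlite ".tables"
sqlite3 ~/Downloads/results.sqlite "SELECT * FROM validation_scores LIMIT 5;"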
In the example above, the graph plots the feature subset size (n_features_to_select) against the classification accuracy (score).
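To reproduce such a plot yourself, the table can also be read straight from the SQLite file with pandas. A minimal sketch; the ranker column used for grouping is an assumption, so check the table's actual schema in DB Browser first:

import sqlite3

import matplotlib.pyplot as plt
import pandas as pd

con = sqlite3.connect("/Users/dunnkers/Downloads/results.sqlite")
scores = pd.read_sql("SELECT * FROM validation_scores", con)

# 'ranker' is an assumed grouping column; adjust to the actual schema.
for ranker, group in scores.groupby("ranker"):
    group = group.sort_values("n_features_to_select")
    plt.plot(group["n_features_to_select"], group["score"], marker="o", label=str(ranker))

plt.xlabel("n_features_to_select")
plt.ylabel("score")
plt.legend()
plt.show()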
For our two feature selectors, ANOVA F-value and Mutual Info, we can now see which one achieves the highest classification accuracy at each feature subset size. Using fseval, we can easily compare many feature selectors or rankers, and at a large scale 🙏🏻.