Motivation

fseval helps you benchmark Feature Selection and Feature Ranking algorithms: any algorithm that ranks features by importance.

It is useful if you are one of the following types of users:

  1. Feature Selection / Feature Ranking algorithm authors. You are the author of a novel Feature Selection algorithm and now have to prove its performance against competing methods, which means running a large-scale benchmark. Many authors, however, spend much time rewriting similar pipelines to benchmark their algorithms. fseval helps you run benchmarks in a structured manner, on supercomputer clusters or in the cloud.
  2. Interpretable AI method authors. You wrote a new Interpretable AI method that finds the most meaningful features by ranking them. The challenge is then to determine how well your method ranked those features. fseval can help with this.
  3. Machine Learning practitioners. You have a dataset and want to find out which features make your models perform best. You can use fseval to compare multiple Feature Selection or Feature Ranking algorithms.

Statement of Need

About benchmarking Feature Selection and Feature Ranking algorithms

Feature Selection (FS) and Feature Ranking (FR) are extensively researched topics within machine learning (Venkatesh et al, 2019, Guyon et al, 2003). FS methods determine subsets of relevant features in a dataset, whereas FR methods rank the features in a dataset relative to each other in terms of their relevance. When a new FS or FR method is developed, a benchmarking scheme is necessary to empirically validate its effectiveness. Often, the benchmark is conducted as follows: features are ranked by importance, and the predictive quality of the feature subsets containing the top-ranked features is then evaluated using a validation estimator. Some studies let the competing FS or FR algorithms pick out a fixed number of top-k features and validate the performance of that feature subset (Roffo et al, 2015, Zhao et al, 2007, Bradley et al, 1998), whilst others evaluate multiple subsets of increasing cardinality containing the highest-ranked features (Wojtas et al, 2022, Bennasar et al, 2015, Gu et al, 2012, Peng et al, 2005, Kira et al, 2005, Almuallim et al, 1991). FS algorithms that only make a binary prediction on which features to keep are always evaluated in the former way.

There is a clear case for performing Feature Selection, as it has been shown to improve classification performance in many tasks, especially those with a large number of features and limited observations. In such applications it is difficult to determine which FS method is most suitable in the general case, so large empirical comparisons of several FS methods and classifiers are routinely performed, for instance on microarray data (Cilia et al, 2019), in medical imaging (Sun et al, 2019, Tohka et al, 2016, Ashok et al, 2016), and in text classification (Liu et al, 2017, Kou et al, 2020). It is therefore valuable to determine empirically which FR or FS method works best, which requires running a benchmark.

fseval is an open-source Python package that helps researchers perform such benchmarks efficiently by eliminating the need to implement benchmarking pipelines from scratch when testing new methods. The pipeline only requires a well-defined configuration file to run; everything else is executed automatically. Because the entire experiment setup is deterministic and captured in a configuration file, the results of any experiment can be reproduced given that file. This is convenient for researchers who want to prove the integrity of their benchmarks.
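
As an illustration, a Hydra-style experiment configuration for such a benchmark might look like the sketch below. The config groups and preset names shown (dataset, ranker, validator, and their values) are illustrative placeholders meant to convey the shape of a composed YAML config, not fseval's exact schema.

```yaml
# Hypothetical experiment config (sketch only): the config groups and
# preset names below are illustrative, not fseval's actual schema.
defaults:
  - dataset: my_synthetic_dataset   # which dataset to benchmark on
  - ranker: my_feature_ranker       # the FS/FR algorithm under test
  - validator: knn                  # estimator used to validate feature subsets

n_bootstraps: 25                    # resample the dataset to assess stability
```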

  • The target audiences are researchers in the domains of Feature Selection and Feature Ranking, as well as businesses looking for the best FR or FS method for their use case.
  • The scope of fseval is limited to tabular datasets and to classification and regression objectives.

Key features 🚀

Most importantly, fseval has the following in store for you.

  • Easily benchmark Feature Ranking algorithms
  • Built on Hydra
  • Support for distributed systems (SLURM through the Submitit launcher, AWS support through the Ray launcher; see the sketch after this list)
  • Reproducible experiments (your entire experiment can be described and reproduced by a single YAML file)
  • Send experiment results directly to a dashboard (integration with Weights and Biases is built-in)
  • Export your data to any SQL database (integration with SQLAlchemy)
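
Because fseval is built on Hydra, distributed execution is configured through Hydra's launcher plugins rather than through fseval-specific code. The sketch below shows, by way of example, how a config could switch to the Submitit SLURM launcher; the parameters are standard hydra-submitit-launcher options, and the values are placeholders.

```yaml
# Sketch: running the benchmark on SLURM via Hydra's Submitit launcher plugin
# (requires the hydra-submitit-launcher package). Values are placeholders.
defaults:
  - override hydra/launcher: submitit_slurm

hydra:
  launcher:
    timeout_min: 60        # maximum runtime per job, in minutes
    cpus_per_task: 4       # CPU cores requested per SLURM task
    partition: general     # SLURM partition to submit to
```

Jobs are then submitted by starting the run in Hydra's multirun mode (the `--multirun` flag), which hands each configured job to the selected launcher.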