Interpretable Models via Pairwise Permutations Algorithm

Troy Maasland, João Pereira, Diogo Bastos, Marcus de Goffau, Max Nieuwdorp, Aeilko H. Zwinderman, Evgeni Levin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

1 Citation (Scopus)

Abstract

One of the most common pitfalls in high-dimensional biological data sets is correlation between features. This can lead statistical and machine learning methods to overvalue or undervalue the correlated predictors, while the truly relevant ones are ignored. In this paper, we define a new method, the pairwise permutation algorithm (PPA), which aims to mitigate the correlation bias in feature importance values. First, we provide a theoretical foundation that builds on previous work on permutation importance. We then apply PPA to a toy data set, where we demonstrate its ability to correct the correlation effect. Finally, we test PPA on a microbiome shotgun data set, showing that it is already able to identify biologically relevant biomarkers.
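The correlation bias described above can be reproduced with ordinary single-feature permutation importance, the baseline that PPA builds on. The sketch below is NOT PPA, whose details are not given in this abstract. It is a minimal NumPy illustration that assumes a ridge-regression model and hypothetical simulated data. The helper names `fit_ridge` and `permutation_importance`, as well as the parameters `lam` and `n_repeats`, are illustrative choices. In the data, `x1` and `x3` are equally relevant to the outcome, and `x2` is a near-copy of `x1`.

```python
import numpy as np


def fit_ridge(X, y, lam=10.0):
    """Closed-form ridge regression without intercept (toy data are zero-mean)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def mse(X, y, w):
    """Mean squared error of the linear predictor X @ w."""
    return float(np.mean((y - X @ w) ** 2))


def permutation_importance(X, y, w, n_repeats=10, seed=0):
    """Mean increase in MSE when one column at a time is randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = mse(X, y, w)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and y; other columns stay intact.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(mse(X_perm, y, w) - baseline)
        importances[j] = np.mean(increases)
    return importances


# Hypothetical toy data (not from the paper): x1 and x3 drive y equally,
# x2 is a near-duplicate of x1 with no effect of its own.
rng = np.random.default_rng(42)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
y = x1 + x3 + 0.5 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

w = fit_ridge(X, y)
imp = permutation_importance(X, y, w)
print("coefficients:", np.round(w, 2))
print("importances: ", np.round(imp, 2))
```

Because the ridge penalty spreads the weight of `x1` roughly evenly over `x1` and `x2`, each of the two correlated columns should receive only about a quarter of the importance assigned to `x3`, even though `x1` is exactly as relevant as `x3`. This is the kind of undervaluation of correlated predictors that the abstract describes.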
Original language: English
Title of host publication: Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2021, Proceedings
Editors: Michael Kamp, Irena Koprinska, Adrien Bibal, Tassadit Bouadi, Benoît Frénay, Luis Galárraga, José Oramas, Linara Adilova, Yamuna Krishnamurthy, Bo Kang, Christine Largeron, Jefrey Lijffijt, Tiphaine Viard, Pascal Welke, Massimiliano Ruocco, Erlend Aune, Claudio Gallicchio, Gregor Schiele, Franz Pernkopf, Michaela Blott, Holger Fröning, Günther Schindler, Riccardo Guidotti, Anna Monreale, Salvatore Rinzivillo, Przemyslaw Biecek, Eirini Ntoutsi, Mykola Pechenizkiy, Bodo Rosenhahn, Christopher Buckley, Daniela Cialfi, Pablo Lanillos, Maxwell Ramstead, Tim Verbelen, Pedro M. Ferreira, Giuseppina Andresini, Donato Malerba, Ibéria Medeiros, Philippe Fournier-Viger, M. Saqib Nawaz, Sebastian Ventura, Meng Sun, Min Zhou, Valerio Bitetta, Ilaria Bordino, Andrea Ferretti, Francesco Gullo, Giovanni Ponti, Lorenzo Severini, Rita Ribeiro, João Gama, Ricard Gavaldà, Lee Cooper, Naghmeh Ghazaleh, Jonas Richiardi, Damian Roqueiro, Diego Saldana Miranda, Konstantinos Sechidis, Guilherme Graça
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 15-25
Volume: 1524 CCIS
ISBN (Print): 9783030937355
DOIs
Publication status: Published - 2021
Event: 21st European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021 - Virtual, Online
Duration: 13 Sept 2021 to 17 Sept 2021

Publication series

Name: Communications in Computer and Information Science

Conference

Conference: 21st European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
City: Virtual, Online
Period: 13/09/2021 to 17/09/2021
