TY - GEN
T1 - Interpretable Models via Pairwise Permutations Algorithm
AU - Maasland, Troy
AU - Pereira, João
AU - Bastos, Diogo
AU - de Goffau, Marcus
AU - Nieuwdorp, Max
AU - Zwinderman, Aeilko H.
AU - Levin, Evgeni
PY - 2021
Y1 - 2021
N2 - One of the most common pitfalls in high-dimensional biological data sets is correlation between features. This may lead statistical and machine learning methodologies to overvalue or undervalue these correlated predictors, while the truly relevant ones are ignored. In this paper, we define a new method, the pairwise permutation algorithm (PPA), which aims to mitigate the correlation bias in feature importance values. First, we provide a theoretical foundation that builds upon previous work on permutation importance. PPA is then applied to a toy data set, where we demonstrate its ability to correct the correlation effect. We further test PPA on a microbiome shotgun data set to show that it can already obtain biologically relevant biomarkers.
AB - One of the most common pitfalls in high-dimensional biological data sets is correlation between features. This may lead statistical and machine learning methodologies to overvalue or undervalue these correlated predictors, while the truly relevant ones are ignored. In this paper, we define a new method, the pairwise permutation algorithm (PPA), which aims to mitigate the correlation bias in feature importance values. First, we provide a theoretical foundation that builds upon previous work on permutation importance. PPA is then applied to a toy data set, where we demonstrate its ability to correct the correlation effect. We further test PPA on a microbiome shotgun data set to show that it can already obtain biologically relevant biomarkers.
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85126198448&origin=inward
U2 - https://doi.org/10.1007/978-3-030-93736-2_2
DO - https://doi.org/10.1007/978-3-030-93736-2_2
M3 - Conference contribution
SN - 9783030937355
VL - 1524 CCIS
T3 - Communications in Computer and Information Science
SP - 15
EP - 25
BT - Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2021, Proceedings
A2 - Kamp, Michael
A2 - Koprinska, Irena
A2 - Bibal, Adrien
A2 - Bouadi, Tassadit
A2 - Frénay, Benoît
A2 - Galárraga, Luis
A2 - Oramas, José
A2 - Adilova, Linara
A2 - Krishnamurthy, Yamuna
A2 - Kang, Bo
A2 - Largeron, Christine
A2 - Lijffijt, Jefrey
A2 - Viard, Tiphaine
A2 - Welke, Pascal
A2 - Ruocco, Massimiliano
A2 - Aune, Erlend
A2 - Gallicchio, Claudio
A2 - Schiele, Gregor
A2 - Pernkopf, Franz
A2 - Blott, Michaela
A2 - Fröning, Holger
A2 - Schindler, Günther
A2 - Guidotti, Riccardo
A2 - Monreale, Anna
A2 - Rinzivillo, Salvatore
A2 - Biecek, Przemyslaw
A2 - Ntoutsi, Eirini
A2 - Pechenizkiy, Mykola
A2 - Rosenhahn, Bodo
A2 - Buckley, Christopher
A2 - Cialfi, Daniela
A2 - Lanillos, Pablo
A2 - Ramstead, Maxwell
A2 - Verbelen, Tim
A2 - Ferreira, Pedro M.
A2 - Andresini, Giuseppina
A2 - Malerba, Donato
A2 - Medeiros, Ibéria
A2 - Fournier-Viger, Philippe
A2 - Nawaz, M. Saqib
A2 - Ventura, Sebastian
A2 - Sun, Meng
A2 - Zhou, Min
A2 - Bitetta, Valerio
A2 - Bordino, Ilaria
A2 - Ferretti, Andrea
A2 - Gullo, Francesco
A2 - Ponti, Giovanni
A2 - Severini, Lorenzo
A2 - Ribeiro, Rita
A2 - Gama, João
A2 - Gavaldà, Ricard
A2 - Cooper, Lee
A2 - Ghazaleh, Naghmeh
A2 - Richiardi, Jonas
A2 - Roqueiro, Damian
A2 - Saldana Miranda, Diego
A2 - Sechidis, Konstantinos
A2 - Graça, Guilherme
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2021
Y2 - 13 September 2021 through 17 September 2021
ER -