Bagged K-means clustering of metabolome data

J. A. Hageman, R. A. van den Berg, J. A. Westerhuis, H. C. J. Hoefsloot, A. K. Smilde

Research output: Contribution to Journal › Review article › Academic › peer-review

11 Citations (Scopus)


Clustering of metabolomics data can be hampered by noise originating from biological variation, physical sampling error and analytical error. Data analysis methods that are not specifically suited to noisy data will yield suboptimal solutions. Bootstrap aggregating (bagging) is a resampling technique that can cope with noise and improve accuracy. This paper demonstrates the possibilities for bagged clustering applied to metabolomics data. The metabolomics data used in this paper are computer-generated with the human red blood cell model. This model can be perturbed in several ways. In this paper, inhibition experiments are mimicked by reducing enzyme activity to 10% of its original value. Compared with ordinary K-means, bagged K-means clustering results in fewer metabolites switching clusters under the influence of heteroscedastic noise. This favors bagged K-means over ordinary K-means clustering when dealing with noisy metabolomics data. A special validation scheme, independent of the addition of noise, has been devised to demonstrate the positive effects of bagging on clustering.
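To make the idea of bagged clustering concrete, the following is a minimal sketch in plain Python. It is not the procedure used in the paper, whose details are not given in the abstract. It assumes one common way of aggregating bootstrap runs: K-means is fitted on each bootstrap sample, every item is assigned to its nearest fitted center, and items that share a cluster in more than half of the runs are grouped together (a co-association consensus). All function names, parameters and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    # Index of the center closest to the point.
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def kmeans(points, k, rng, n_iter=50):
    # Plain Lloyd iterations, started from k randomly chosen points.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)  # an empty cluster keeps its center
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def bagged_kmeans(points, k, n_bags=50, seed=0):
    # Assumed aggregation rule: co-association majority vote over bootstrap runs.
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    for _ in range(n_bags):
        # Bootstrap sample: draw n points with replacement.
        sample = [points[rng.randrange(n)] for _ in range(n)]
        centers = kmeans(sample, k, rng)
        labels = [nearest(p, centers) for p in points]
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Consensus: connected components of "same cluster in more than half the runs".
    consensus = [-1] * n
    next_label = 0
    for i in range(n):
        if consensus[i] != -1:
            continue
        consensus[i] = next_label
        stack = [i]
        while stack:
            u = stack.pop()
            for v in range(n):
                if consensus[v] == -1 and 2 * together[u][v] > n_bags:
                    consensus[v] = next_label
                    stack.append(v)
        next_label += 1
    return consensus

# Toy data (hypothetical): two well-separated groups of ten noisy 2-D points.
rng = random.Random(1)
group_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(10)]
group_b = [[rng.gauss(10, 0.5), rng.gauss(10, 0.5)] for _ in range(10)]
labels = bagged_kmeans(group_a + group_b, k=2)
```

Because the consensus step groups items by majority co-assignment, the number of consensus clusters is not forced to equal k. Other aggregation rules, such as clustering the pooled bootstrap centers, are equally plausible readings of "bagged K-means".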
Original language: English
Pages (from-to): 211-220
Journal: Critical Reviews in Analytical Chemistry
Issue number: 3-4
Publication status: Published - 2006
