TY - JOUR
T1 - Use of support vector machines for disease risk prediction in genome-wide association studies: concerns and opportunities
AU - Mittag, Florian
AU - Büchel, Finja
AU - Saad, Mohamad
AU - Jahn, Andreas
AU - Schulte, Claudia
AU - Bochdanovits, Zoltan
AU - Simón-Sánchez, Javier
AU - Nalls, Mike A.
AU - Keller, Margaux
AU - Hernandez, Dena G.
AU - Gibbs, J. Raphael
AU - Lesage, Suzanne
AU - Brice, Alexis
AU - Heutink, Peter
AU - Martinez, Maria
AU - Wood, Nicholas W.
AU - Hardy, John
AU - Singleton, Andrew B.
AU - Zell, Andreas
AU - Gasser, Thomas
AU - Sharma, Manu
AU - Nalls, Michael A.
AU - Plagnol, Vincent
AU - Sheerin, Una-Marie
AU - Sveinbjörnsdóttir, Sigurlaug
AU - Arepalli, Sampath
AU - Barker, Roger
AU - Ben-Shlomo, Yoav
AU - Berendse, Henk W.
AU - Berg, Daniela
AU - Bhatia, Kailash
AU - de Bie, Rob M. A.
AU - Biffi, Alessandro
AU - Bloem, Bas
AU - Bonin, Michael
AU - Bras, Jose M.
AU - Brockmann, Kathrin
AU - Brooks, Janet
AU - Burn, David J.
AU - Charlesworth, Gavin
AU - Chen, Honglei
AU - Chinnery, Patrick F.
AU - Chong, Sean
AU - Clarke, Carl E.
AU - Cookson, Mark R.
AU - Cooper, J. Mark
AU - Corvol, Jean Christophe
AU - Counsell, Carl
AU - Damier, Philippe
AU - Velseboer, Daan
PY - 2012
Y1 - 2012
N2 - The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has raised the expectation that individual risk can also be quantified from that architecture. So far, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) has shown little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D) to show that, apart from the effect size of risk variants, the heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1-5%) as well as rare variants (frequency <1%) in the etiology of complex diseases. Using cross-validation, we achieved predictions with an area under the receiver operating characteristic curve (AUC) of ~0.88 for T1D, reflecting its strong heritable component (~90%). In contrast, we were unable to achieve a satisfactory prediction for PD (AUC ~0.56; heritability ~38%). Our simulations showed that simultaneous inclusion of uncommon and rare variants in GWAS would eventually lead to feasible disease risk prediction for complex diseases such as PD. The software used is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/
AB - The success of genome-wide association studies (GWAS) in deciphering the genetic architecture of complex diseases has raised the expectation that individual risk can also be quantified from that architecture. So far, disease risk prediction based on top-validated single-nucleotide polymorphisms (SNPs) has shown little predictive value. Here, we applied a support vector machine (SVM) to Parkinson disease (PD) and type 1 diabetes (T1D) to show that, apart from the effect size of risk variants, the heritability of the disease also plays an important role in disease risk prediction. Furthermore, we performed a simulation study to show the role of uncommon (frequency 1-5%) as well as rare variants (frequency <1%) in the etiology of complex diseases. Using cross-validation, we achieved predictions with an area under the receiver operating characteristic curve (AUC) of ~0.88 for T1D, reflecting its strong heritable component (~90%). In contrast, we were unable to achieve a satisfactory prediction for PD (AUC ~0.56; heritability ~38%). Our simulations showed that simultaneous inclusion of uncommon and rare variants in GWAS would eventually lead to feasible disease risk prediction for complex diseases such as PD. The software used is available at http://www.ra.cs.uni-tuebingen.de/software/MACLEAPS/
U2 - https://doi.org/10.1002/humu.22161
DO - https://doi.org/10.1002/humu.22161
M3 - Article
C2 - 22777693
SN - 1059-7794
VL - 33
SP - 1708
EP - 1718
JO - Human Mutation
JF - Human Mutation
IS - 12
ER -