Abstract
In polygenic score (PGS) analysis, the coefficient of determination (R²) is a key statistic for evaluating predictive efficacy. R² is the proportion of phenotypic variance explained by the PGS, calculated in a cohort that is independent of the genome-wide association study (GWAS) that provided the estimates of allelic effect sizes. The SNP-based heritability (h²SNP, the proportion of total phenotypic variance attributable to all common SNPs) is the theoretical upper limit of the out-of-sample prediction R². However, in real data analyses R² has been reported to exceed h²SNP, an observation that parallels the tendency of h²SNP estimates to decline as the number of cohorts being meta-analyzed increases. Here, we quantify why and when these observations are expected. Using theory and simulation, we show that if heterogeneity in cohort-specific h²SNP exists, or if genetic correlations between cohorts are less than one, h²SNP estimates can decrease as the number of cohorts being meta-analyzed increases. We derive the conditions under which the out-of-sample prediction R² will be greater than h²SNP and show the validity of our derivations with real data from a binary trait (major depression) and a continuous trait (educational attainment). Our research calls for a better approach to integrating information from multiple cohorts to address issues of between-cohort heterogeneity.
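The intuition behind both observations can be illustrated with a toy model. The sketch below is not the derivation from the paper; it is a minimal illustration under assumptions introduced here: independent standardized SNPs, k discovery cohorts with the same heritability `h2` and a common pairwise genetic correlation `r_g` (between 0 and 1), an equal-weight meta-analysis, and effect sizes known without sampling error. The function names `meta_h2`, `target_r2`, and `simulate` are hypothetical. Under these assumptions, the variance captured by the averaged effects shrinks as cohorts are added when `r_g` is below one, and a target cohort with a higher heritability can show an out-of-sample prediction R-squared above that meta-analysis value.

```python
import math
import random


def meta_h2(h2, r_g, k):
    """Variance explained by the equal-weight average of k cohort effect vectors.

    Toy-model assumption: each cohort has heritability h2 and every pair of
    cohorts has genetic correlation r_g; no sampling error in the effects.
    """
    return h2 * (1.0 + (k - 1) * r_g) / k


def target_r2(h2_target, r_g, k):
    """Prediction R-squared of the averaged-effect score in a separate target
    cohort with heritability h2_target and correlation r_g to each cohort."""
    return h2_target * r_g ** 2 * k / (1.0 + (k - 1) * r_g)


def simulate(h2, h2_target, r_g, k, n_snps=20_000, seed=1):
    """Monte Carlo check of the two closed forms by drawing per-SNP effects."""
    rng = random.Random(seed)
    scale = math.sqrt(h2 / n_snps)           # per-SNP effect SD, discovery cohorts
    scale_t = math.sqrt(h2_target / n_snps)  # per-SNP effect SD, target cohort
    a, b = math.sqrt(r_g), math.sqrt(1.0 - r_g)
    sum_bb = 0.0  # sum of squared averaged effects = variance of the score
    sum_tb = 0.0  # cross-product with target effects = cov(phenotype, score)
    for _ in range(n_snps):
        shared = rng.gauss(0.0, 1.0)  # component common to all cohorts
        effects = [scale * (a * shared + b * rng.gauss(0.0, 1.0)) for _ in range(k)]
        b_meta = sum(effects) / k     # equal-weight meta-analysis effect
        beta_t = scale_t * (a * shared + b * rng.gauss(0.0, 1.0))
        sum_bb += b_meta * b_meta
        sum_tb += beta_t * b_meta
    # Target phenotype variance is 1, so R-squared = cov^2 / var(score).
    return sum_bb, sum_tb ** 2 / sum_bb


if __name__ == "__main__":
    # Illustrative values only: h2 = 0.2 in discovery cohorts, 0.3 in the target.
    for k in (1, 2, 5, 10):
        print(k, round(meta_h2(0.2, 0.8, k), 4), round(target_r2(0.3, 0.8, k), 4))
```

With perfect genetic correlation (`r_g = 1`) the closed forms reduce to `h2` and `h2_target`, so in this toy model a difference in cohort-specific heritability alone is enough for the prediction R-squared to exceed the meta-analysis value.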
Original language | English |
---|---|
Pages (from-to) | 1207-1215 |
Number of pages | 9 |
Journal | American journal of human genetics |
Volume | 110 |
Issue number | 7 |
DOIs | https://doi.org/10.1016/j.ajhg.2023.06.006 |
Publication status | Published - 6 Jul 2023 |
Keywords
- SNP-based heritability
- meta-analysis
- out-of-sample prediction R²
- polygenic risk prediction
In: American journal of human genetics, Vol. 110, No. 7, 06.07.2023, p. 1207-1215.
Research output: Contribution to journal › Article › Academic › peer-review
TY - JOUR
T1 - Polygenic risk prediction
T2 - why and when out-of-sample prediction R² can exceed SNP-based heritability
AU - Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium
AU - Wang, Xiaotong
AU - Walker, Alicia
AU - Revez, Joana A.
AU - Ni, Guiyan
AU - Adams, Mark J.
AU - McIntosh, Andrew M.
AU - Wray, Naomi R.
AU - Ripke, Stephan
AU - Mattheisen, Manuel
AU - Trzaskowski, Maciej
AU - Byrne, Enda M.
AU - Abdellaoui, Abdel
AU - Agerbo, Esben
AU - Air, Tracy M.
AU - Andlauer, Till F.M.
AU - Bacanu, Silviu Alin
AU - Bækvad-Hansen, Marie
AU - Beekman, Aartjan T.F.
AU - Bigdeli, Tim B.
AU - Binder, Elisabeth B.
AU - Bryois, Julien
AU - Buttenschøn, Henriette N.
AU - Bybjerg-Grauholm, Jonas
AU - Cai, Na
AU - Castelao, Enrique
AU - Christensen, Jane Hvarregaard
AU - Clarke, Toni Kim
AU - Coleman, Jonathan R.I.
AU - Colodro-Conde, Lucía
AU - Couvy-Duchesne, Baptiste
AU - Craddock, Nick
AU - Crawford, Gregory E.
AU - Davies, Gail
AU - Degenhardt, Franziska
AU - Derks, Eske M.
AU - Direk, Nese
AU - Dolan, Conor V.
AU - Dunn, Erin C.
AU - Eley, Thalia C.
AU - Escott-Price, Valentina
AU - Hottenga, Jouke Jan
AU - Mbarek, Hamdi
AU - Middeldorp, Christel M.
AU - Milaneschi, Yuri
AU - Nivard, Michel G.
AU - Peyrot, Wouter J.
AU - Posthuma, Danielle
AU - Willemsen, Gonneke
AU - Boomsma, Dorret I.
AU - de Geus, E. J.C.
AU - Kiadeh, Farnush Farhadi Hassan
AU - Finucane, Hilary K.
AU - Foo, Jerome C.
AU - Forstner, Andreas J.
AU - Frank, Josef
AU - Gaspar, Héléna A.
AU - Gill, Michael
AU - Goes, Fernando S.
AU - Gordon, Scott D.
AU - Jansen, Rick
AU - Schoevers, Robert
AU - Smit, Johannes H.
AU - Penninx, Brenda W. J. H.
N1 - Funding Information: We acknowledge funding from the Australian National Health & Medical Research Council (1173790, 1113400), Australian Research Council (FL180100072), and the National Institute of Mental Health (R01MH124871, R01MH121545). This work would not have been possible without the contributions of the investigators who comprise the PGC-MDD working group. The procedures followed in the PGC-MDD working group were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national), and proper informed consent was obtained. Data analysis was conducted under the University of Queensland Human Research Ethics Committee approval HE002938. For a full list of acknowledgments and ethical statements of all individual cohorts, please see the original publications. The PGC has received major funding from the US National Institute of Mental Health and the US National Institute on Drug Abuse (U01 MH109528 and U01 MH1095320). Some statistical analyses were carried out on the NL Genetic Cluster Computer (http://www.geneticcluster.org/) hosted by SURFsara, who support the PGC through grants to Danielle Posthuma. GWAS summary statistics from 23andMe were included in the meta-analyzed GWAS summary statistics. We thank the customers, research participants, and employees of 23andMe for making this work possible. The study protocol used by 23andMe was approved by an external AAHRPP-accredited institutional review board. The graphical abstract was created with BioRender.com. Author contributions: study motivation, N.R.W., A.M.McI.; theory, X.W., P.M.V., N.R.W.; simulations & analyses, X.W.; data preparation, A.W., J.A.R., G.N., M.J.A.; first draft, X.W., N.R.W., P.M.V.; final draft, all authors read and approved the manuscript. The authors declare no competing interests. Publisher Copyright: © 2023 American Society of Human Genetics
PY - 2023/7/6
Y1 - 2023/7/6
KW - SNP-based heritability
KW - meta-analysis
KW - out-of-sample prediction R²
KW - polygenic risk prediction
UR - http://www.scopus.com/inward/record.url?scp=85164270154&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85164270154&partnerID=8YFLogxK
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85164270154&origin=inward
UR - https://www.ncbi.nlm.nih.gov/pubmed/37379836
U2 - https://doi.org/10.1016/j.ajhg.2023.06.006
DO - https://doi.org/10.1016/j.ajhg.2023.06.006
M3 - Article
C2 - 37379836
SN - 0002-9297
VL - 110
SP - 1207
EP - 1215
JO - American journal of human genetics
JF - American journal of human genetics
IS - 7
ER -