TY - JOUR
T1 - A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework
AU - LeBlanc, Marissa
AU - Zuber, Verena
AU - Thompson, Wesley K.
AU - Andreassen, Ole A.
AU - Frigessi, Arnoldo
AU - Andreassen, Bettina Kulle
AU - Ripke, Stephan
AU - Neale, Benjamin M.
AU - Corvin, Aiden
AU - Walters, James T.R.
AU - Farh, Kai How
AU - Lee, Phil
AU - Bulik-Sullivan, Brendan
AU - Collier, David A.
AU - Huang, Hailiang
AU - Pers, Tune H.
AU - Agartz, Ingrid
AU - Agerbo, Esben
AU - Albus, Margot
AU - Alexander, Madeline
AU - Amin, Farooq
AU - Bacanu, Silviu A.
AU - Begemann, Martin
AU - Belliveau, Richard A.
AU - Bene, Judit
AU - Bevilacqua, Elizabeth
AU - Bigdeli, Tim B.
AU - Black, Donald W.
AU - Bruggeman, Richard
AU - Buccola, Nancy G.
AU - Buckner, Randy L.
AU - Cahn, Wiepke
AU - Cai, Guiqing
AU - Cairns, Murray J.
AU - Campion, Dominique
AU - Cantor, Rita M.
AU - Carr, Vaughan J.
AU - Carrera, Noa
AU - Catts, Stanley V.
AU - Chambert, Kimberly D.
AU - Chan, Raymond C.K.
AU - Chen, Ronald Y.L.
AU - Chen, Eric Y.H.
AU - Cheng, Wei
AU - Cheung, Eric F.C.
AU - Chong, Siow Ann
AU - Cloninger, C. Robert
AU - Cohen, David
AU - Cohen, Nadine
AU - GWAS
AU - Penninx, B.W.J.H.
AU - Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium
AU - de Haan, Lieuwe
PY - 2018/6/25
Y1 - 2018/6/25
AB - Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.
KW - Covariate-modulated false discovery rate
KW - Cross-phenotype association
KW - Data integration
KW - Meta-analysis with shared subjects
UR - http://www.scopus.com/inward/record.url?scp=85049066693&partnerID=8YFLogxK
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85049066693&origin=inward
UR - https://www.ncbi.nlm.nih.gov/pubmed/29940862
U2 - https://doi.org/10.1186/s12864-018-4859-7
DO - https://doi.org/10.1186/s12864-018-4859-7
M3 - Article
C2 - 29940862
SN - 1471-2164
VL - 19
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 494
ER -