A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework

GWAS; B.W.J.H. Penninx; Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium

doi:https://doi.org/10.1186/s12864-018-4859-7

Research output: Contribution to journal › Article › Academic › peer-review

25 Citations (Scopus)

Abstract

Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.
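
As a rough illustration of what a "simple linear correction" of two sets of test statistics could look like, the sketch below removes a known spurious correlation from paired z-scores. It rests on assumptions not stated in the abstract: the correlation `rho` is taken as already estimated, the test statistics are z-scores, and symmetric (ZCA) whitening is used as one possible linear transform. The function names `whitening_matrix` and `decorrelate_z` are hypothetical, and this is not necessarily the exact correction used in the paper.

```python
import numpy as np


def whitening_matrix(rho):
    """Return C^{-1/2} for the 2x2 correlation matrix C = [[1, rho], [rho, 1]].

    Assumption: rho is the spurious correlation induced by sample overlap
    and has been estimated elsewhere.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # C has eigenvalues 1 + rho and 1 - rho, with eigenvectors
    # (1, 1)/sqrt(2) and (1, -1)/sqrt(2), which gives a closed form.
    inv_plus = 1.0 / np.sqrt(1.0 + rho)
    inv_minus = 1.0 / np.sqrt(1.0 - rho)
    a = 0.5 * (inv_plus + inv_minus)
    b = 0.5 * (inv_plus - inv_minus)
    return np.array([[a, b], [b, a]])


def decorrelate_z(z1, z2, rho):
    """Linearly transform paired z-scores so the overlap-induced
    correlation rho is removed under the null."""
    z = np.vstack([np.asarray(z1, dtype=float), np.asarray(z2, dtype=float)])
    corrected = whitening_matrix(rho) @ z  # shape (2, n_snps)
    return corrected[0], corrected[1]
```

Under these assumptions, the transformed pairs have identity covariance when the inputs have correlation `rho` and unit variance, so downstream methods that rely on the joint distribution of the two GWAS see uncorrelated null statistics.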

Original language	English
Article number	494
Journal	BMC Genomics
Volume	19
Issue number	1
DOIs	https://doi.org/10.1186/s12864-018-4859-7
Publication status	Published - 25 Jun 2018

Keywords

Covariate-modulated false discovery rate
Cross-phenotype association
Data integration
Meta-analysis with shared subjects

Access to Document

https://doi.org/10.1186/s12864-018-4859-7

Cite this

@article{30746f199d5c4303814174c35feb842f,

title = "A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework",

abstract = "Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.",

keywords = "Covariate-modulated false discovery rate, Cross-phenotype association, Data integration, Meta-analysis with shared subjects",

author = "Marissa LeBlanc and Verena Zuber and Thompson, {Wesley K.} and Andreassen, {Ole A.} and Arnoldo Frigessi and Andreassen, {Bettina Kulle} and Stephan Ripke and Neale, {Benjamin M.} and Aiden Corvin and Walters, {James T.R.} and Farh, {Kai How} and Phil Lee and Brendan Bulik-Sullivan and Collier, {David A.} and Hailiang Huang and Pers, {Tune H.} and Ingrid Agartz and Esben Agerbo and Margot Albus and Madeline Alexander and Farooq Amin and Bacanu, {Silviu A.} and Martin Begemann and Belliveau, {Richard A.} and Judit Bene and Elizabeth Bevilacqua and Bigdeli, {Tim B.} and Black, {Donald W.} and Richard Bruggeman and Buccola, {Nancy G.} and Buckner, {Randy L.} and Wiepke Cahn and Guiqing Cai and Cairns, {Murray J.} and Dominique Campion and Cantor, {Rita M.} and Carr, {Vaughan J.} and Noa Carrera and Catts, {Stanley V.} and Chambert, {Kimberly D.} and Chan, {Raymond C.K.} and Chen, {Ronald Y.L.} and Chen, {Eric Y.H.} and Wei Cheng and Cheung, {Eric F.C.} and Chong, {Siow Ann} and Cloninger, {C. Robert} and David Cohen and Nadine Cohen and GWAS and B.W.J.H. Penninx and {Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium} and {de Haan}, Lieuwe",

year = "2018",

month = jun,

day = "25",

doi = "10.1186/s12864-018-4859-7",

language = "English",

volume = "19",

journal = "BMC Genomics",

issn = "1471-2164",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - A correction for sample overlap in genome-wide association studies in a polygenic pleiotropy-informed framework

AU - LeBlanc, Marissa

AU - Zuber, Verena

AU - Thompson, Wesley K.

AU - Andreassen, Ole A.

AU - Frigessi, Arnoldo

AU - Andreassen, Bettina Kulle

AU - Ripke, Stephan

AU - Neale, Benjamin M.

AU - Corvin, Aiden

AU - Walters, James T.R.

AU - Farh, Kai How

AU - Lee, Phil

AU - Bulik-Sullivan, Brendan

AU - Collier, David A.

AU - Huang, Hailiang

AU - Pers, Tune H.

AU - Agartz, Ingrid

AU - Agerbo, Esben

AU - Albus, Margot

AU - Alexander, Madeline

AU - Amin, Farooq

AU - Bacanu, Silviu A.

AU - Begemann, Martin

AU - Belliveau, Richard A.

AU - Bene, Judit

AU - Bevilacqua, Elizabeth

AU - Bigdeli, Tim B.

AU - Black, Donald W.

AU - Bruggeman, Richard

AU - Buccola, Nancy G.

AU - Buckner, Randy L.

AU - Cahn, Wiepke

AU - Cai, Guiqing

AU - Cairns, Murray J.

AU - Campion, Dominique

AU - Cantor, Rita M.

AU - Carr, Vaughan J.

AU - Carrera, Noa

AU - Catts, Stanley V.

AU - Chambert, Kimberly D.

AU - Chan, Raymond C.K.

AU - Chen, Ronald Y.L.

AU - Chen, Eric Y.H.

AU - Cheng, Wei

AU - Cheung, Eric F.C.

AU - Chong, Siow Ann

AU - Cloninger, C. Robert

AU - Cohen, David

AU - Cohen, Nadine

AU - GWAS

AU - Penninx, B.W.J.H.

AU - Schizophrenia and Bipolar Disorder Working Groups of the Psychiatric Genomics Consortium

AU - de Haan, Lieuwe

PY - 2018/6/25

Y1 - 2018/6/25

N2 - Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.

AB - Background: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr). Results: We propose a method for correcting for sample overlap at the summary statistic level. We quantify the expected amount of spurious correlation between the summary statistics from two GWAS due to sample overlap, and use this estimated correlation in a simple linear correction that adjusts the joint distribution of test statistics from the two GWAS. The correction is appropriate for GWAS with case-control or quantitative outcomes. Our simulations and data example show that without correcting for sample overlap, the cmfdr is not properly controlled, leading to an excessive number of false discoveries and an excessive false discovery proportion. Our correction for sample overlap is effective in that it restores proper control of the false discovery rate, at very little loss in power. Conclusions: With our proposed correction, it is possible to integrate GWAS summary statistics with overlapping samples in a statistical framework that is dependent on the joint distribution of the two GWAS.

KW - Covariate-modulated false discovery rate

KW - Cross-phenotype association

KW - Data integration

KW - Meta-analysis with shared subjects

UR - http://www.scopus.com/inward/record.url?scp=85049066693&partnerID=8YFLogxK

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85049066693&origin=inward

UR - https://www.ncbi.nlm.nih.gov/pubmed/29940862

U2 - https://doi.org/10.1186/s12864-018-4859-7

DO - https://doi.org/10.1186/s12864-018-4859-7

M3 - Article

C2 - 29940862

SN - 1471-2164

VL - 19

JO - BMC Genomics

JF - BMC Genomics

IS - 1

M1 - 494

ER -
