Extending the use of GWAS data by combining data from different genetic platforms

E. P. A. van Iperen; G. K. Hovingh; F. W. Asselbergs; A. H. Zwinderman

doi:10.1371/journal.pone.0172082

Research output: Contribution to journal › Article › Academic › peer-review

5 Citations (Scopus)

Abstract

In the past decade, many genome-wide association studies (GWAS) have discovered new associations between single-nucleotide polymorphisms (SNPs) and various phenotypes. Imputation methods are widely used in GWAS: they make it possible to test phenotype associations with variants that were not directly genotyped, and they can also be used to combine and analyse data genotyped on different genotyping arrays. In this study we investigated the imputation quality and efficiency of two approaches to combining GWAS data from different genotyping platforms, asking whether combining the data before imputation performs better than combining them after imputation. In total, 979 unique individuals from the AMC-PAS cohort were genotyped on three platforms: 706 on the MetaboChip, 757 on the 50K gene-centric Human CVD BeadChip, and 955 on the HumanExome chip. A total of 397 individuals were genotyped on all three platforms. After pre-imputation quality control (QC), Minimac in combination with MaCH was used to impute all samples against the 1000 Genomes reference panel. All imputed markers with an r² value below 0.3 were excluded in post-imputation QC. The three datasets were carefully matched on strand, SNP ID and genomic coordinates, resulting in a combined dataset of 979 unique individuals and 258,925 unique markers. A total of 4,117,036 SNPs were available when imputation was performed before merging the three datasets, compared with 3,933,494 SNPs when imputation was performed on the combined set. Our results suggest that imputing the individual datasets before merging performs slightly better than imputing after combining them, and that it generates more SNPs.
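As a rough illustration of two steps mentioned in the abstract, the sketch below applies the post-imputation quality filter (markers with an r2 value below 0.3 are excluded) to a toy marker list and then compares the two SNP totals reported above. The `ImputedMarker` class, the `post_imputation_qc` function and the marker IDs are hypothetical names introduced only for this example; they are not taken from the paper, Minimac or MaCH.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: markers with r2 < 0.3 are excluded.
R2_THRESHOLD = 0.3


@dataclass(frozen=True)
class ImputedMarker:
    """Hypothetical record for one imputed marker (illustration only)."""
    snp_id: str
    r2: float  # imputation quality estimate


def post_imputation_qc(markers, threshold=R2_THRESHOLD):
    """Keep markers whose r2 is at or above the threshold."""
    return [m for m in markers if m.r2 >= threshold]


# Toy data, not real study results.
toy_markers = [
    ImputedMarker("rs_toy_1", 0.95),
    ImputedMarker("rs_toy_2", 0.25),  # below threshold, dropped
    ImputedMarker("rs_toy_3", 0.30),  # exactly at threshold, kept
]
kept = post_imputation_qc(toy_markers)

# SNP totals reported in the abstract.
snps_impute_then_merge = 4_117_036
snps_merge_then_impute = 3_933_494

difference = snps_impute_then_merge - snps_merge_then_impute
relative_gain = difference / snps_merge_then_impute

print([m.snp_id for m in kept])
print(f"{difference} additional SNPs ({relative_gain:.1%})")
```

On the reported totals, imputing each dataset before merging yields 183,542 more SNPs than imputing the combined set, a gain of roughly 4.7%.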

Original language	English
Article number	e0172082
Journal	PLOS ONE
Volume	12
Issue number	2
DOIs	https://doi.org/10.1371/journal.pone.0172082
Publication status	Published - 2017

Access to Document

https://doi.org/10.1371/journal.pone.0172082

Cite this

@article{88232f1430f64b21ac8f8a6d53289045,
title = "Extending the use of {GWAS} data by combining data from different genetic platforms",
abstract = "In the past decade, many genome-wide association studies (GWAS) have discovered new associations between single-nucleotide polymorphisms (SNPs) and various phenotypes. Imputation methods are widely used in GWAS: they make it possible to test phenotype associations with variants that were not directly genotyped, and they can also be used to combine and analyse data genotyped on different genotyping arrays. In this study we investigated the imputation quality and efficiency of two approaches to combining GWAS data from different genotyping platforms, asking whether combining the data before imputation performs better than combining them after imputation. In total, 979 unique individuals from the AMC-PAS cohort were genotyped on three platforms: 706 on the MetaboChip, 757 on the 50K gene-centric Human CVD BeadChip, and 955 on the HumanExome chip. A total of 397 individuals were genotyped on all three platforms. After pre-imputation quality control (QC), Minimac in combination with MaCH was used to impute all samples against the 1000 Genomes reference panel. All imputed markers with an $r^2$ value below 0.3 were excluded in post-imputation QC. The three datasets were carefully matched on strand, SNP ID and genomic coordinates, resulting in a combined dataset of 979 unique individuals and 258,925 unique markers. A total of 4,117,036 SNPs were available when imputation was performed before merging the three datasets, compared with 3,933,494 SNPs when imputation was performed on the combined set. Our results suggest that imputing the individual datasets before merging performs slightly better than imputing after combining them, and that it generates more SNPs.",
author = "{van Iperen}, {E. P. A.} and Hovingh, {G. K.} and Asselbergs, {F. W.} and Zwinderman, {A. H.}",
year = "2017",
doi = "10.1371/journal.pone.0172082",
language = "English",
volume = "12",
journal = "PLOS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "2",
}

TY  - JOUR
T1  - Extending the use of GWAS data by combining data from different genetic platforms
AU  - van Iperen, E. P. A.
AU  - Hovingh, G. K.
AU  - Asselbergs, F. W.
AU  - Zwinderman, A. H.
PY  - 2017
Y1  - 2017
AB  - In the past decade, many genome-wide association studies (GWAS) have discovered new associations between single-nucleotide polymorphisms (SNPs) and various phenotypes. Imputation methods are widely used in GWAS: they make it possible to test phenotype associations with variants that were not directly genotyped, and they can also be used to combine and analyse data genotyped on different genotyping arrays. In this study we investigated the imputation quality and efficiency of two approaches to combining GWAS data from different genotyping platforms, asking whether combining the data before imputation performs better than combining them after imputation. In total, 979 unique individuals from the AMC-PAS cohort were genotyped on three platforms: 706 on the MetaboChip, 757 on the 50K gene-centric Human CVD BeadChip, and 955 on the HumanExome chip. A total of 397 individuals were genotyped on all three platforms. After pre-imputation quality control (QC), Minimac in combination with MaCH was used to impute all samples against the 1000 Genomes reference panel. All imputed markers with an r² value below 0.3 were excluded in post-imputation QC. The three datasets were carefully matched on strand, SNP ID and genomic coordinates, resulting in a combined dataset of 979 unique individuals and 258,925 unique markers. A total of 4,117,036 SNPs were available when imputation was performed before merging the three datasets, compared with 3,933,494 SNPs when imputation was performed on the combined set. Our results suggest that imputing the individual datasets before merging performs slightly better than imputing after combining them, and that it generates more SNPs.
U2  - https://doi.org/10.1371/journal.pone.0172082
DO  - https://doi.org/10.1371/journal.pone.0172082
M3  - Article
C2  - 28245255
SN  - 1932-6203
VL  - 12
JO  - PLOS ONE
JF  - PLOS ONE
IS  - 2
M1  - e0172082
ER  -