Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores

The Biobank Japan Project

doi:https://doi.org/10.1038/s41588-022-01036-9

Research output: Contribution to journal › Article › Academic › peer-review

64 Citations (Scopus)

Abstract

Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.

Original language	English
Pages (from-to)	450-458
Number of pages	9
Journal	Nature Genetics
Volume	54
Issue number	4
Early online date	2022
DOIs	https://doi.org/10.1038/s41588-022-01036-9
Publication status	Published - Apr 2022

Cite this

@article{9528ef810fd0427d98384bea06922895,

title = "Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores",

abstract = "Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7\% in south Asians to +32\% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24\% versus BOLT-LMM and +12\% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.",

author = "Omer Weissbrod and Masahiro Kanai and Huwenbo Shi and Steven Gazal and Peyrot, {Wouter J.} and Khera, {Amit V.} and Yukinori Okada and Koichi Matsuda and Yuji Yamanashi and Yoichi Furukawa and Takayuki Morisaki and Yoshinori Murakami and Yoichiro Kamatani and Kaori Muto and Akiko Nagai and Wataru Obara and Ken Yamaji and Kazuhisa Takahashi and Satoshi Asai and Yasuo Takahashi and Takao Suzuki and Nobuaki Sinozaki and Hiroki Yamaguchi and Shiro Minami and Shigeo Murayama and Kozo Yoshimori and Satoshi Nagayama and Daisuke Obata and Masahiko Higashiyama and Akihide Masumoto and {The Biobank Japan Project} and Yukihiro Koretsune and Martin, {Alicia R.} and Finucane, {Hilary K.} and Price, {Alkes L.}",

note = "Funding Information: We thank A. Schoech and C. M{\'a}rquez-Luna for helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and was funded by the National Institutes of Health (NIH; grant nos. U01 HG009379, U01 HG012009, R37 MH107649, R01 MH101244 and R01 HG006399). M.K. was supported by a Nakajima Foundation Fellowship and the Masason Foundation. W.J.P. was supported by an NWO Veni grant (no. 91619152). A.R.M. was supported by the National Institute of Mental Health (grant no. K99/R00MH117229). H.K.F. was supported by E. and W. Schmidt. A.V.K. was supported by grants (nos. 1K08HG010155 and 1U01HG011719) from the National Human Genome Research Institute and a sponsored research agreement from IBM Research. Y.O. was supported by JSPS KAKENHI (grant nos. 19H01021 and 20K21834) and AMED (grant nos. JP21km0405211, JP21ek0109413, JP21ek0410075, JP21gm4010006 and P21km0405217) and JST Moonshot R\&D (grant nos. JPMJMS2021 and JPMJMS2024). Computational analyses were performed on the O2 High-Performance Compute Cluster at Harvard Medical School. Publisher Copyright: {\textcopyright} 2022, The Author(s), under exclusive licence to Springer Nature America, Inc.",

year = "2022",

month = apr,

doi = "10.1038/s41588-022-01036-9",

language = "English",

volume = "54",

pages = "450--458",

journal = "Nature Genetics",

issn = "1061-4036",

publisher = "Nature Publishing Group",

number = "4",

}

TY - JOUR

T1 - Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores

AU - Weissbrod, Omer

AU - Kanai, Masahiro

AU - Shi, Huwenbo

AU - Gazal, Steven

AU - Peyrot, Wouter J.

AU - Khera, Amit V.

AU - Okada, Yukinori

AU - Matsuda, Koichi

AU - Yamanashi, Yuji

AU - Furukawa, Yoichi

AU - Morisaki, Takayuki

AU - Murakami, Yoshinori

AU - Kamatani, Yoichiro

AU - Muto, Kaori

AU - Nagai, Akiko

AU - Obara, Wataru

AU - Yamaji, Ken

AU - Takahashi, Kazuhisa

AU - Asai, Satoshi

AU - Takahashi, Yasuo

AU - Suzuki, Takao

AU - Sinozaki, Nobuaki

AU - Yamaguchi, Hiroki

AU - Minami, Shiro

AU - Murayama, Shigeo

AU - Yoshimori, Kozo

AU - Nagayama, Satoshi

AU - Obata, Daisuke

AU - Higashiyama, Masahiko

AU - Masumoto, Akihide

AU - The Biobank Japan Project

AU - Koretsune, Yukihiro

AU - Martin, Alicia R.

AU - Finucane, Hilary K.

AU - Price, Alkes L.

N1 - Funding Information: We thank A. Schoech and C. Márquez-Luna for helpful discussions. This research was conducted using the UK Biobank resource under application no. 16549 and was funded by the National Institutes of Health (NIH; grant nos. U01 HG009379, U01 HG012009, R37 MH107649, R01 MH101244 and R01 HG006399). M.K. was supported by a Nakajima Foundation Fellowship and the Masason Foundation. W.J.P. was supported by an NWO Veni grant (no. 91619152). A.R.M. was supported by the National Institute of Mental Health (grant no. K99/R00MH117229). H.K.F. was supported by E. and W. Schmidt. A.V.K. was supported by grants (nos. 1K08HG010155 and 1U01HG011719) from the National Human Genome Research Institute and a sponsored research agreement from IBM Research. Y.O. was supported by JSPS KAKENHI (grant nos. 19H01021 and 20K21834) and AMED (grant nos. JP21km0405211, JP21ek0109413, JP21ek0410075, JP21gm4010006 and P21km0405217) and JST Moonshot R&D (grant nos. JPMJMS2021 and JPMJMS2024). Computational analyses were performed on the O2 High-Performance Compute Cluster at Harvard Medical School. Publisher Copyright: © 2022, The Author(s), under exclusive licence to Springer Nature America, Inc.

PY - 2022/4

Y1 - 2022/4

AB - Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85128487891&origin=inward

UR - https://www.ncbi.nlm.nih.gov/pubmed/35393596

UR - http://www.scopus.com/inward/record.url?scp=85128487891&partnerID=8YFLogxK

U2 - https://doi.org/10.1038/s41588-022-01036-9

DO - 10.1038/s41588-022-01036-9

M3 - Article

C2 - 35393596

SN - 1061-4036

VL - 54

SP - 450

EP - 458

JO - Nature Genetics

JF - Nature Genetics

IS - 4

ER -
