A hybrid data harmonization workflow using word embeddings for the interlinking of heterogeneous cross-domain clinical data structures

Vasileios C. Pezoulas, Antonis Sakellarios, Marcus Kleber, Jos A. Bosch, Sander W. van der Laan, Femke Lamers, Terho Lehtimäki, Winfried März, Dimitrios I. Fotiadis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Retrospective data harmonization is an open issue in healthcare due to the emerging need to interlink data from multiple clinical centers in the absence of standardized data collection protocols. In this work, we present an automated data harmonization workflow that uses lexical and semantic analysis, based on word embeddings and relational modeling, to detect terminologies with a common lexical and conceptual basis. The method is built on top of a knowledge base to enable the interlinking of heterogeneous cross-domain data. A case study in two clinical domains, namely cardiovascular disease (CVD) and mental disorders, showed that the proposed method yielded matched terminologies with 85% precision in less execution time than lexical analysis and manual mapping, which yielded 10% lower precision.

Original language: English
Title of host publication: BHI 2021 - 2021 IEEE EMBS International Conference on Biomedical and Health Informatics, Proceedings
Subtitle of host publication: 2021 BHI conference proceedings : virtual conference, July 27-30, 2021
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 88-91
Number of pages: 4
ISBN (Electronic): 9781665403580
ISBN (Print): 9781665447706
DOIs
Publication status: Published - 2021
Event: 2021 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2021 - Virtual, Online, Greece
Duration: 27 Jul 2021 to 30 Jul 2021

Publication series

Name: BHI 2021 - 2021 IEEE EMBS International Conference on Biomedical and Health Informatics, Proceedings

Conference

Conference: 2021 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2021
Country/Territory: Greece
City: Virtual, Online
Period: 27/07/2021 to 30/07/2021

Keywords

  • Cardiovascular diseases
  • Data harmonization
  • Lexical matching
  • Mental disorders
  • Semantic matching
