Performance assessment of ontology matching systems for FAIR data

Philip van Damme; Jesualdo Tomás Fernández-Breis; Nirupama Benis; Jose Antonio Miñarro-Gimenez; Nicolette F. de Keizer; Ronald Cornet

doi:10.1186/s13326-022-00273-5


Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources may use different ontologies to annotate their data, which creates the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision.

Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). We then analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 achieved F1-scores of 0.55, 0.46, and 0.55, respectively, against BioPortal, and 0.66, 0.53, and 0.58, respectively, against the UMLS. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that, on average, 90% of the mappings’ classes belonged to top-level classes that matched.

Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for evaluating the mappings used in such services, leading to their implementation in a FAIR data ecosystem.
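The abstract names three evaluation steps: scoring a system alignment against a reference alignment with an F1-score, combining several systems' alignments into a vote-based consensus, and flagging mappings whose classes fall under top-level classes that were not manually matched. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. It assumes that a mapping is a (source ID, target ID) pair of equivalent classes, that precision, recall, and F1 use the standard set-overlap definitions, that consensus means "proposed by at least `min_votes` systems", and that the set of top-level classes per ontology is supplied as input. All names (`Hierarchy`, `top_levels_agree`, and so on), class IDs, and alignments are hypothetical toy data.

```python
"""Sketch of the evaluation steps named in the abstract (not the authors' code).

Assumptions: a mapping is a (source_id, target_id) pair of equivalent classes;
all class IDs, hierarchies, and alignments below are hypothetical toy data.
"""
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Mapping = Tuple[str, str]


def precision_recall_f1(alignment: Set[Mapping],
                        reference: Set[Mapping]) -> Tuple[float, float, float]:
    """Score a system alignment against a reference alignment (set overlap)."""
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def vote_consensus(alignments: List[Set[Mapping]], min_votes: int) -> Set[Mapping]:
    """Keep the mappings proposed by at least `min_votes` matching systems."""
    votes = Counter(m for alignment in alignments for m in alignment)
    return {m for m, n in votes.items() if n >= min_votes}


@dataclass
class Hierarchy:
    parents: Dict[str, Set[str]]  # class -> its direct superclasses
    top_level: Set[str]           # classes treated as top-level (assumed input)

    def top_level_ancestors(self, cls: str) -> Set[str]:
        """Return the top-level classes that are `cls` or one of its ancestors."""
        stack, seen = [cls], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen & self.top_level


def top_levels_agree(mapping: Mapping, source: Hierarchy, target: Hierarchy,
                     top_level_matches: Set[Mapping]) -> bool:
    """True if some pair of the mapped classes' top-level ancestors was matched manually."""
    src_tops = source.top_level_ancestors(mapping[0])
    tgt_tops = target.top_level_ancestors(mapping[1])
    return any((s, t) in top_level_matches for s in src_tops for t in tgt_tops)


if __name__ == "__main__":
    # Toy reference alignment and toy outputs of three hypothetical matchers.
    reference = {("S:1", "T:1"), ("S:2", "T:2"), ("S:4", "T:4")}
    system_a = {("S:1", "T:1"), ("S:2", "T:2"), ("S:3", "T:7")}
    system_b = {("S:1", "T:1"), ("S:4", "T:4"), ("S:5", "T:8")}
    system_c = {("S:1", "T:1"), ("S:2", "T:2"), ("S:3", "T:7"), ("S:4", "T:4")}

    for name, alignment in [("A", system_a), ("B", system_b), ("C", system_c)]:
        print(name, precision_recall_f1(alignment, reference))

    # Majority vote: a mapping survives if at least 2 of the 3 systems propose it.
    consensus = vote_consensus([system_a, system_b, system_c], min_votes=2)
    print("consensus", precision_recall_f1(consensus, reference))

    source = Hierarchy(
        parents={"S:1": {"S:Disease"}, "S:2": {"S:Disease"},
                 "S:3": {"S:2"}, "S:4": {"S:Procedure"}},
        top_level={"S:Disease", "S:Procedure"})
    target = Hierarchy(
        parents={"T:1": {"T:Disorder"}, "T:2": {"T:Disorder"},
                 "T:4": {"T:Procedure"}, "T:7": {"T:Procedure"}},
        top_level={"T:Disorder", "T:Procedure"})
    # Manually matched top-level classes (hypothetical).
    top_level_matches = {("S:Disease", "T:Disorder"), ("S:Procedure", "T:Procedure")}

    # No reference is consulted here: only the hierarchies and top-level matches.
    flagged = sorted(m for m in consensus
                     if not top_levels_agree(m, source, target, top_level_matches))
    print("flagged without a reference:", flagged)
```

With these toy inputs, the consensus keeps four mappings, one of which, ("S:3", "T:7"), is wrong; the top-level check flags exactly that mapping because S:Disease and T:Procedure were not matched to each other. The check cannot catch an incorrect mapping whose classes sit under matched top-level classes, and it flags any class with no listed top-level ancestor, so it is best read as a filter that complements a reference alignment rather than replacing one.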

Original language	English
Article number	19
Journal	Journal of Biomedical Semantics
Volume	13
Issue number	1
DOIs	https://doi.org/10.1186/s13326-022-00273-5
Publication status	Published - 1 Dec 2022

Keywords

FAIR data
Ontology matching
Rare diseases
Semantic interoperability

Access to Document

https://doi.org/10.1186/s13326-022-00273-5

Cite this

@article{5f65cf731d6c4beca83ac4250f17d3a0,

title = "Performance assessment of ontology matching systems for FAIR data",

abstract = "Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources may use different ontologies to annotate their data, which creates the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). We then analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 achieved F1-scores of 0.55, 0.46, and 0.55, respectively, against BioPortal, and 0.66, 0.53, and 0.58, respectively, against the UMLS. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that, on average, 90\% of the mappings{\textquoteright} classes belonged to top-level classes that matched. Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for evaluating the mappings used in such services, leading to their implementation in a FAIR data ecosystem.",

keywords = "FAIR data, Ontology matching, Rare diseases, Semantic interoperability",

author = "{van Damme}, Philip and Fern{\'a}ndez-Breis, {Jesualdo Tom{\'a}s} and Nirupama Benis and Mi{\~n}arro-Gimenez, {Jose Antonio} and {de Keizer}, {Nicolette F.} and Ronald Cornet",

note = "Funding Information: This work has been funded by the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the EJP RD COFUND-EJP N 825575. This research was also partially funded by the Ministerio de Econom{\'i}a, Industria y Competitividad, Gobierno de Espa{\~n}a, and the European Regional Development Fund through grant number TIN2017-85949-C2-1-R. ∘ Publisher Copyright: {\textcopyright} 2022, The Author(s).",

year = "2022",

month = dec,

day = "1",

doi = "10.1186/s13326-022-00273-5",

language = "English",

volume = "13",

journal = "Journal of Biomedical Semantics",

issn = "2041-1480",

publisher = "BioMed Central Ltd.",

number = "1",

}

TY  - JOUR
T1  - Performance assessment of ontology matching systems for FAIR data
AU  - van Damme, Philip
AU  - Fernández-Breis, Jesualdo Tomás
AU  - Benis, Nirupama
AU  - Miñarro-Gimenez, Jose Antonio
AU  - de Keizer, Nicolette F.
AU  - Cornet, Ronald
N1  - Funding Information: This work has been funded by the European Union’s Horizon 2020 research and innovation programme under the EJP RD COFUND-EJP N 825575. This research was also partially funded by the Ministerio de Economía, Industria y Competitividad, Gobierno de España, and the European Regional Development Fund through grant number TIN2017-85949-C2-1-R. ∘ Publisher Copyright: © 2022, The Author(s).
PY  - 2022/12/1
Y1  - 2022/12/1
AB  - Background: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources may use different ontologies to annotate their data, which creates the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision. Results: We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). We then analyzed the top-level ancestors of matched classes to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 achieved F1-scores of 0.55, 0.46, and 0.55, respectively, against BioPortal, and 0.66, 0.53, and 0.58, respectively, against the UMLS. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that, on average, 90% of the mappings’ classes belonged to top-level classes that matched. Conclusions: Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for evaluating the mappings used in such services, leading to their implementation in a FAIR data ecosystem.
KW  - FAIR data
KW  - Ontology matching
KW  - Rare diseases
KW  - Semantic interoperability
UR  - http://www.scopus.com/inward/record.url?scp=85134266736&partnerID=8YFLogxK
U2  - https://doi.org/10.1186/s13326-022-00273-5
DO  - 10.1186/s13326-022-00273-5
M3  - Article
C2  - 35841031
SN  - 2041-1480
VL  - 13
JO  - Journal of Biomedical Semantics
JF  - Journal of Biomedical Semantics
IS  - 1
M1  - 19
ER  - 