Clustering clinical models from local electronic health records based on semantic similarity

Kirstine Rosenbeck Gøeg; Ronald Cornet; Stig Kjær Andersen

DOI: https://doi.org/10.1016/j.jbi.2014.12.015

Research output: Contribution to journal › Article › Academic › peer-review

16 Citations (Scopus)

Abstract

Clinical models in electronic health records are typically expressed as templates which support the multiple clinical workflows in which the system is used. The templates are often designed using local rather than standard information models and terminology, which hinders semantic interoperability. Semantic challenges can be solved by harmonizing and standardizing clinical models. However, methods supporting harmonization based on existing clinical models are lacking. One approach is to explore semantic similarity estimation as a basis of an analytical framework. Therefore, the aim of this study is to develop and apply methods for intrinsic similarity-estimation based analysis that can compare and give an overview of multiple clinical models. For a similarity estimate to be intrinsic it should be based on an established ontology, for which SNOMED CT was chosen. In this study, Lin similarity estimates and Sokal and Sneath similarity estimates were used together with two aggregation techniques (average and best-match-average respectively) resulting in a total of four methods. The similarity estimations are used to hierarchically cluster templates. The test material consists of templates from Danish and Swedish EHR systems. The test material was used to evaluate how the four different methods perform. The best-match-average aggregation technique performed better in terms of clustering similar templates than the average aggregation technique. No difference could be seen in terms of the choice of similarity estimate in this study, but the finding may be different for other datasets. The dendrograms resulting from the hierarchical clustering gave an overview of the templates and a basis of further analysis. Hierarchical clustering of templates based on SNOMED CT and semantic similarity estimation with best-match-average aggregation technique can be used for comparison and summarization of multiple templates. 
Consequently, it can provide a valuable tool for harmonization and standardization of clinical models.
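The two aggregation techniques named in the abstract can be illustrated with a minimal sketch. It assumes a template is a list of concepts and that a pairwise concept similarity function is already available (in the study, a Lin or Sokal and Sneath estimate over SNOMED CT). The concept labels, the `TOY_SIM` table, and all function names below are hypothetical, and the best-match-average shown is one common formulation (the mean of the two directional best-match means), which may differ in detail from the one used in the paper.

```python
from statistics import mean

# Hypothetical pairwise similarities between made-up concept labels.
# These are illustrative numbers, not SNOMED CT-derived estimates.
TOY_SIM = {
    frozenset(["blood pressure", "systolic pressure"]): 0.8,
    frozenset(["blood pressure", "body weight"]): 0.2,
    frozenset(["heart rate", "systolic pressure"]): 0.4,
    frozenset(["heart rate", "body weight"]): 0.1,
}


def toy_sim(x, y):
    """Symmetric lookup; identical concepts score 1.0, unknown pairs 0.0."""
    if x == y:
        return 1.0
    return TOY_SIM.get(frozenset([x, y]), 0.0)


def average_similarity(a, b, sim):
    """Average aggregation: mean over all cross-template concept pairs."""
    return mean(sim(x, y) for x in a for y in b)


def best_match_average(a, b, sim):
    """Best-match average: pair each concept with its best match in the
    other template, average per direction, then average both directions."""
    best_a = mean(max(sim(x, y) for y in b) for x in a)
    best_b = mean(max(sim(x, y) for x in a) for y in b)
    return (best_a + best_b) / 2


template_a = ["blood pressure", "heart rate"]
template_b = ["systolic pressure", "body weight"]

avg = average_similarity(template_a, template_b, toy_sim)
bma = best_match_average(template_a, template_b, toy_sim)
print(f"average: {avg:.3f}, best-match average: {bma:.3f}")

# Comparing a template with itself shows why the two techniques differ:
# the plain average is pulled down by unrelated concept pairs.
self_avg = average_similarity(template_a, template_a, toy_sim)
self_bma = best_match_average(template_a, template_a, toy_sim)
print(f"self average: {self_avg:.3f}, self best-match average: {self_bma:.3f}")
```

In this toy setting a template compared with itself scores 1.0 under best-match average but only 0.5 under the plain average, which is consistent with the abstract's finding that best-match-average clusters similar templates better. Converting each template-pair similarity s to a distance 1 - s would give a matrix that a hierarchical clustering routine could consume to produce a dendrogram.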

Original language	English
Pages (from-to)	294-304
Journal	Journal of Biomedical Informatics
Volume	54
DOIs	https://doi.org/10.1016/j.jbi.2014.12.015
Publication status	Published - 2015

Cite this

@article{2e2d7de21b8e4b59afb4f4b5e978f7e5,

title = "Clustering clinical models from local electronic health records based on semantic similarity",

abstract = "Clinical models in electronic health records are typically expressed as templates which support the multiple clinical workflows in which the system is used. The templates are often designed using local rather than standard information models and terminology, which hinders semantic interoperability. Semantic challenges can be solved by harmonizing and standardizing clinical models. However, methods supporting harmonization based on existing clinical models are lacking. One approach is to explore semantic similarity estimation as a basis of an analytical framework. Therefore, the aim of this study is to develop and apply methods for intrinsic similarity-estimation based analysis that can compare and give an overview of multiple clinical models. For a similarity estimate to be intrinsic it should be based on an established ontology, for which SNOMED CT was chosen. In this study, Lin similarity estimates and Sokal and Sneath similarity estimates were used together with two aggregation techniques (average and best-match-average respectively) resulting in a total of four methods. The similarity estimations are used to hierarchically cluster templates. The test material consists of templates from Danish and Swedish EHR systems. The test material was used to evaluate how the four different methods perform. The best-match-average aggregation technique performed better in terms of clustering similar templates than the average aggregation technique. No difference could be seen in terms of the choice of similarity estimate in this study, but the finding may be different for other datasets. The dendrograms resulting from the hierarchical clustering gave an overview of the templates and a basis of further analysis. Hierarchical clustering of templates based on SNOMED CT and semantic similarity estimation with best-match-average aggregation technique can be used for comparison and summarization of multiple templates. 
Consequently, it can provide a valuable tool for harmonization and standardization of clinical models.",

author = "G{\o}eg, {Kirstine Rosenbeck} and Cornet, Ronald and Andersen, {Stig Kj{\ae}r}",

year = "2015",

doi = "10.1016/j.jbi.2014.12.015",

language = "English",

volume = "54",

pages = "294--304",

journal = "Journal of Biomedical Informatics",

issn = "1532-0464",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Clustering clinical models from local electronic health records based on semantic similarity

AU - Gøeg, Kirstine Rosenbeck

AU - Cornet, Ronald

AU - Andersen, Stig Kjær

PY - 2015

Y1 - 2015

N2 - Clinical models in electronic health records are typically expressed as templates which support the multiple clinical workflows in which the system is used. The templates are often designed using local rather than standard information models and terminology, which hinders semantic interoperability. Semantic challenges can be solved by harmonizing and standardizing clinical models. However, methods supporting harmonization based on existing clinical models are lacking. One approach is to explore semantic similarity estimation as a basis of an analytical framework. Therefore, the aim of this study is to develop and apply methods for intrinsic similarity-estimation based analysis that can compare and give an overview of multiple clinical models. For a similarity estimate to be intrinsic it should be based on an established ontology, for which SNOMED CT was chosen. In this study, Lin similarity estimates and Sokal and Sneath similarity estimates were used together with two aggregation techniques (average and best-match-average respectively) resulting in a total of four methods. The similarity estimations are used to hierarchically cluster templates. The test material consists of templates from Danish and Swedish EHR systems. The test material was used to evaluate how the four different methods perform. The best-match-average aggregation technique performed better in terms of clustering similar templates than the average aggregation technique. No difference could be seen in terms of the choice of similarity estimate in this study, but the finding may be different for other datasets. The dendrograms resulting from the hierarchical clustering gave an overview of the templates and a basis of further analysis. Hierarchical clustering of templates based on SNOMED CT and semantic similarity estimation with best-match-average aggregation technique can be used for comparison and summarization of multiple templates. 
Consequently, it can provide a valuable tool for harmonization and standardization of clinical models.

AB - Clinical models in electronic health records are typically expressed as templates which support the multiple clinical workflows in which the system is used. The templates are often designed using local rather than standard information models and terminology, which hinders semantic interoperability. Semantic challenges can be solved by harmonizing and standardizing clinical models. However, methods supporting harmonization based on existing clinical models are lacking. One approach is to explore semantic similarity estimation as a basis of an analytical framework. Therefore, the aim of this study is to develop and apply methods for intrinsic similarity-estimation based analysis that can compare and give an overview of multiple clinical models. For a similarity estimate to be intrinsic it should be based on an established ontology, for which SNOMED CT was chosen. In this study, Lin similarity estimates and Sokal and Sneath similarity estimates were used together with two aggregation techniques (average and best-match-average respectively) resulting in a total of four methods. The similarity estimations are used to hierarchically cluster templates. The test material consists of templates from Danish and Swedish EHR systems. The test material was used to evaluate how the four different methods perform. The best-match-average aggregation technique performed better in terms of clustering similar templates than the average aggregation technique. No difference could be seen in terms of the choice of similarity estimate in this study, but the finding may be different for other datasets. The dendrograms resulting from the hierarchical clustering gave an overview of the templates and a basis of further analysis. Hierarchical clustering of templates based on SNOMED CT and semantic similarity estimation with best-match-average aggregation technique can be used for comparison and summarization of multiple templates. 
Consequently, it can provide a valuable tool for harmonization and standardization of clinical models.

U2 - https://doi.org/10.1016/j.jbi.2014.12.015

DO - 10.1016/j.jbi.2014.12.015

M3 - Article

C2 - 25557885

SN - 1532-0464

VL - 54

SP - 294

EP - 304

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

ER -