Determining and assessing characteristics of data element names impacting the performance of annotation using Usagi

Dutch ICU Data Sharing Against COVID-19 Collaborators

doi:https://doi.org/10.1016/j.ijmedinf.2023.105200

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Introduction: Hospitals generate large amounts of data, which are generally modeled and labeled in a proprietary way, hampering their exchange and integration. Manually annotating data element names with internationally standardized data element identifiers is time-consuming. Tools can support performing this task automatically. This study aimed to determine which factors influence the quality of automatic annotations.

Methods: Data element names were taken from the Dutch COVID-19 ICU Data Warehouse, which contains data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system while the original hospital labels (synonymous names) were also stored. Usagi, an OHDSI annotation tool, was used to annotate the data. A gold standard was used to determine whether Usagi's annotations were correct. Logistic regression was used to determine whether the number of characters, number of words, match score (Usagi's certainty), and hospital label origin influenced whether Usagi annotated correctly.

Results: Usagi automatically annotated 30.5% of the data element names and 5.5% of the synonymous names correctly. The match score was the best predictor of Usagi finding the correct annotation. The AUC was 0.651 for the data element names and 0.752 for the synonymous names. The AUC for the individual hospital label origins varied between 0.460 and 0.905.

Discussion: Usagi performed better at annotating the data element names than the synonymous names. In the synonymous names dataset, hospital origin was associated with the number of correctly annotated concepts. Hospitals that performed better had shorter synonymous names with fewer words. Using shorter data element names or synonymous names should be considered to optimize automatic annotation. Overall, Usagi's performance is too poor to rely on completely for automatic annotation.

Original language	English
Article number	105200
Journal	International Journal of Medical Informatics
Volume	178
DOIs	https://doi.org/10.1016/j.ijmedinf.2023.105200
Publication status	Published - 1 Oct 2023

Keywords

Data annotation
Data interoperability
Data quality
Data standardization
OMOP CDM
Usagi

Access to Document

https://doi.org/10.1016/j.ijmedinf.2023.105200

Cite this

@article{c60ea72408324fa5871f64037107cbc6,

title = "Determining and assessing characteristics of data element names impacting the performance of annotation using Usagi",

abstract = "Introduction: Hospitals generate large amounts of data and this data is generally modeled and labeled in a proprietary way, hampering its exchange and integration. Manually annotating data element names to internationally standardized data element identifiers is a time-consuming effort. Tools can support performing this task automatically. This study aimed to determine what factors influence the quality of automatic annotations. Methods: Data element names were used from the Dutch COVID-19 ICU Data Warehouse containing data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system while also storing the original hospital labels (synonymous names). Usagi, an OHDSI annotation tool, was used to perform the annotation for the data. A gold standard was used to determine if Usagi made correct annotations. Logistic regression was used to determine if the number of characters, number of words, match score (Usagi's certainty) and hospital label origin influenced Usagi's performance to annotate correctly. Results: Usagi automatically annotated 30.5% of the data element names correctly and 5.5% of the synonymous names. The match score is the best predictor for Usagi finding the correct annotation. It was determined that the AUC of data element names was 0.651 and 0.752 for the synonymous names respectively. The AUC for the individual hospital label origins varied between 0.460 to 0.905. Discussion: The results show that Usagi performed better to annotate the data element names than the synonymous names. The hospital origin in the synonymous names dataset was associated with the amount of correctly annotated concepts. Hospitals that performed better had shorter synonymous names and fewer words. Using shorter data element names or synonymous names should be considered to optimize the automatic annotating process. 
Overall, the performance of Usagi is too poor to completely rely on for automatic annotation.",

keywords = "Data annotation, Data interoperability, Data quality, Data standardization, OMOP CDM, Usagi",

author = "{de Groot}, Rowdy and P{\"u}ttmann, {Daniel P.} and {Dutch ICU Data Sharing Against COVID-19 Collaborators} and Fleuren, {Lucas M.} and Thoral, {Patrick J.} and Elbers, {Paul W. G.} and {de Keizer}, {Nicolette F.} and Cornet, Ronald",

note = "Funding Information: The Dutch ICU Data Sharing Against COVID-19 Collaborators: Diederik Gommers, MD, PhD, Department of Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands. Olaf L. Cremer, MD, PhD, Intensive Care, UMC Utrecht, Utrecht, The Netherlands. Rob J. Bosman, MD, ICU, OLVG, Amsterdam, The Netherlands. Sander Rigter, MD, Department of Anesthesiology and Intensive Care, St. Antonius Hospital, Nieuwegein, The Netherlands. Evert-Jan Wils, MD, PhD, Department of Intensive Care, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands. Tim Frenzel, MD, PhD, Department of Intensive Care Medicine, Radboud University Medical Center, Nijmegen, The Netherlands. Dave A. Dongelmans, MD, PhD, Department of Intensive Care Medicine, Amsterdam UMC, Amsterdam, The Netherlands. Remko de Jong, MD, Intensive Care, Bovenij Ziekenhuis, Amsterdam, The Netherlands. Marco A.A. Peters, MD, Intensive Care, Canisius Wilhelmina Ziekenhuis, Nijmegen, The Netherlands. Marlijn J.A. Kamps, MD, Intensive Care, Catharina Ziekenhuis Eindhoven, Eindhoven, The Netherlands. Dharmanand Ramnarain, MD, Department of Intensive Care, ETZ Tilburg, Tilburg, The Netherlands. Ralph Nowitzky, MD, Intensive Care, HagaZiekenhuis, Den Haag, The Netherlands. Fleur G.C.A. Nooteboom, MD, Intensive Care, Laurentius Ziekenhuis, Roermond, The Netherlands. Wouter de Ruijter, MD, PhD, Department of Intensive Care Medicine, Northwest Clinics, Alkmaar, The Netherlands. Louise C. Urlings-Strop, MD, PhD, Intensive Care, Reinier de Graaf Gasthuis, Delft, The Netherlands. Ellen G.M. Smit, MD, Intensive Care, Spaarne Gasthuis, Haarlem en Hoofddorp, The Netherlands. D. Jannet Mehagnoul-Schipper, MD, PhD, Intensive Care, VieCuri Medisch Centrum, Venlo, The Netherlands. Tom Dormans, MD, PhD, Intensive care, Zuyderland MC, Heerlen, The Netherlands. Cornelis P.C. de Jager, MD, PhD, Department of Intensive Care, Jeroen Bosch Ziekenhuis, Den Bosch, The Netherlands. Stefaan H.A. 
Hendriks, MD, Intensive Care, Albert Schweitzerziekenhuis, Dordrecht, The Netherlands. Sefanja Achterberg, MD, PhD, ICU, Haaglanden Medisch Centrum, Den Haag, The Netherlands. Evelien Oostdijk, MD, PhD, ICU, Maasstad Ziekenhuis Rotterdam, Rotterdam, The Netherlands. Auke C. Reidinga, MD, ICU, BWC, Martiniziekenhuis, Groningen. Barbara Festen-Spanjer, MD, Intensive Care, Ziekenhuis Gelderse Vallei, Ede, The Netherlands. Gert B. Brunnekreef, MD, Department of Intensive Care, Ziekenhuisgroep Twente, Almelo, The Netherlands. Alexander D. Cornet, MD, PhD, FRCP, Department of Intensive Care, Medisch Spectrum Twente, Enschede, The Netherlands. Walter van den Tempel, MD, Department of Intensive Care, Ikazia Ziekenhuis Rotterdam, Rotterdam, The Netherlands. Age D. Boelens, MD, Anesthesiology, Antonius Ziekenhuis Sneek, Sneek, The Netherlands. Peter Koetsier, MD, Intensive Care, Medisch Centrum Leeuwarden, Leeuwarden, The Netherlands. Judith Lens, MD, ICU, IJsselland Ziekenhuis, Capelle aan den IJssel, The Netherlands. Harald J. Faber, MD, ICU, WZA, Assen, The Netherlands. A. Karakus, MD, Department of Intensive Care, Diakonessenhuis Hospital, Utrecht, The Netherlands. Robert Entjes, MD, Department of Intensive Care, Adrz, Goes, The Netherlands. Paul de Jong, MD, Department of Anesthesia and Intensive Care, Slingeland Ziekenhuis, Doetinchem, The Netherlands. Thijs C.D. Rettig, MD, PhD, Department of Anesthesiology, Intensive Care and Pain Medicine, Amphia Ziekenhuis, Breda, The Netherlands. Sesmu Arbous, MD, PhD, Intensivist, LUMC, Leiden, The Netherlands. Tariq A. Dam, MD, Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC. Sebastiaan J.J. Vonk, MSc, Pacmed, Amsterdam, The Netherlands. Tomas Machado, Pacmed, Amsterdam, The Netherlands. Willem E. Herter, BSc, Pacmed, Amsterdam, The Netherlands. 
From collaborating hospitals having shared data: Julia Koeter, MD, Intensive Care, Canisius Wilhelmina Ziekenhuis, Nijmegen, The Netherlands. Roger van Rietschote, Business Intelligence, Haaglanden MC, Den Haag, The Netherlands. M.C. Reuland, MD, Department of Intensive Care Medicine, Amsterdam UMC, Universiteit van Amsterdam, Amsterdam, The Netherlands. Laura van Manen, MD, Department of Intensive Care, BovenIJ Ziekenhuis, Amsterdam, The Netherlands. Leon Montenij, MD, PhD, Department of Anesthesiology, Pain Management and Intensive Care, Catharina Ziekenhuis Eindhoven, Eindhoven, The Netherlands. Jasper van Bommel, MD, PhD, Department of Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands. Roy van den Berg, Department of Intensive Care, ETZ Tilburg, Tilburg, The Netherlands. Ellen van Geest, Department of ICMT, Haga Ziekenhuis, Den Haag, The Netherlands. Anisa Hana, MD, PhD, Intensive Care, Laurentius Ziekenhuis, Roermond, The Netherlands. B. van den Bogaard, MD, PhD, ICU, OLVG, Amsterdam, The Netherlands. Prof. Peter Pickkers, Department of Intensive Care Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands. Pim van der Heiden, MD, PhD, Intensive Care, Reinier de Graaf Gasthuis, Delft, The Netherlands. Claudia (C.W.) van Gemeren, MD, Intensive Care, Spaarne Gasthuis, Haarlem en Hoofddorp, The Netherlands. Arend Jan Meinders, MD, Department of Internal Medicine and Intensive Care, St Antonius Hospital, Nieuwegein, The Netherlands. Martha de Bruin, MD, Department of Intensive Care, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands. Emma Rademaker, MD, MSc, Department of Intensive Care, UMC Utrecht, Utrecht, The Netherlands. Frits H.M. van Osch, PhD, Department of Clinical Epidemiology, VieCuri Medisch Centrum, Venlo, The Netherlands. Martijn de Kruif, MD, PhD, Department of Pulmonology, Zuyderland MC, Heerlen, The Netherlands. Nicolas Schroten, MD, Intensive Care, Albert Schweitzerziekenhuis, Dordrecht, The Netherlands. 
Klaas Sierk Arnold, MD, Anesthesiology, Antonius Ziekenhuis Sneek, Sneek, The Netherlands. J.W. Fijen, MD, PhD, Department of Intensive Care, Diakonessenhuis Hospital, Utrecht, The Netherland. Jacomar J.M. van Koesveld, MD, ICU, IJsselland Ziekenhuis, Capelle aan den IJssel, The Netherlands. Koen S. Simons, MD, PhD, Department of Intensive Care, Jeroen Bosch Ziekenhuis, Den Bosch, The Netherlands. Joost Labout, MD, PhD, ICU, Maasstad Ziekenhuis Rotterdam, The Netherlands. Bart van de Gaauw, Martini ziekenhuis, Groningen, The Netherlands. Michael Kuiper, Intensive Care, Medisch Centrum Leeuwarden, Leeuwarden, The Netherlands. Albertus Beishuizen, MD, PhD, Department of Intensive Care, Medisch Spectrum Twente, Enschede, The Netherlands. Dennis Geutjes, Department of Information Technology, Slingeland Ziekenhuis, Doetinchem, The Netherlands. Johan Lutisan, MD, ICU, WZA, Assen, The Netherlands. Bart P. Grady, MD, PhD, Department of Intensive Care, Ziekenhuisgroep Twente, Almelo, The Netherlands. Remko van den Akker, Intensive Care, Adrz, Goes, The Netherlands. Tom A. Rijpstra, MD, Department of Anesthesiology, Intensive Care and Pain Medicine, Amphia Ziekenhuis, Breda, The Netherlands. Wim G. Boersma, MD, PhD, Department of Pulmonology, Northwest Clinics, Alkmaar, The Netherlands. From collaborating hospitals having signed the data sharing agreement: Dani{\"e}l Pretorius, MD, Department of Intensive Care Medicine, Hospital St Jansdal, Harderwijk, The Netherlands. Menno Beukema, MD, Department of Intensive Care, Streekziekenhuis Koningin Beatrix, Winterswijk, The Netherlands. Bram Simons, MD, Intensive Care, Bravis Ziekenhuis, Bergen op Zoom en Roosendaal, The Netherlands. A.A. Rijkeboer, MD, ICU, Flevoziekenhuis, Almere, The Netherlands. Marcel Aries, MD, PhD, MUMC+, University Maastricht, Maastricht, The Netherlands. Niels C. Gritters van den Oever, MD, Intensive Care, Treant Zorggroep, Emmen, The Netherlands. 
Martijn van Tellingen, MD, EDIC, Department of Intensive Care Medicine, afdeling Intensive Care, ziekenhuis Tjongerschans, Heerenveen, The Netherlands. Annemieke Dijkstra, MD, Department of Intensive Care Medicine, Het Van Weel-Bethesda Ziekenhuis, Dirksland, The Netherlands. Rutger van Raalte, Department of Intensive Care, Tergooi hospital, Hilversum, The Netherlands. Publisher Copyright: {\textcopyright} 2023 The Author(s)",

year = "2023",

month = oct,

day = "1",

doi = "10.1016/j.ijmedinf.2023.105200",

language = "English",

volume = "178",

journal = "International Journal of Medical Informatics",

issn = "1386-5056",

publisher = "Elsevier Ireland Ltd",

}

TY - JOUR

T1 - Determining and assessing characteristics of data element names impacting the performance of annotation using Usagi

AU - de Groot, Rowdy

AU - Püttmann, Daniel P.

AU - Dutch ICU Data Sharing Against COVID-19 Collaborators

AU - Fleuren, Lucas M.

AU - Thoral, Patrick J.

AU - Elbers, Paul W. G.

AU - de Keizer, Nicolette F.

AU - Cornet, Ronald

N1 - Funding Information: The Dutch ICU Data Sharing Against COVID-19 Collaborators: Diederik Gommers, MD, PhD, Department of Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands. Olaf L. Cremer, MD, PhD, Intensive Care, UMC Utrecht, Utrecht, The Netherlands. Rob J. Bosman, MD, ICU, OLVG, Amsterdam, The Netherlands. Sander Rigter, MD, Department of Anesthesiology and Intensive Care, St. Antonius Hospital, Nieuwegein, The Netherlands. Evert-Jan Wils, MD, PhD, Department of Intensive Care, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands. Tim Frenzel, MD, PhD, Department of Intensive Care Medicine, Radboud University Medical Center, Nijmegen, The Netherlands. Dave A. Dongelmans, MD, PhD, Department of Intensive Care Medicine, Amsterdam UMC, Amsterdam, The Netherlands. Remko de Jong, MD, Intensive Care, Bovenij Ziekenhuis, Amsterdam, The Netherlands. Marco A.A. Peters, MD, Intensive Care, Canisius Wilhelmina Ziekenhuis, Nijmegen, The Netherlands. Marlijn J.A. Kamps, MD, Intensive Care, Catharina Ziekenhuis Eindhoven, Eindhoven, The Netherlands. Dharmanand Ramnarain, MD, Department of Intensive Care, ETZ Tilburg, Tilburg, The Netherlands. Ralph Nowitzky, MD, Intensive Care, HagaZiekenhuis, Den Haag, The Netherlands. Fleur G.C.A. Nooteboom, MD, Intensive Care, Laurentius Ziekenhuis, Roermond, The Netherlands. Wouter de Ruijter, MD, PhD, Department of Intensive Care Medicine, Northwest Clinics, Alkmaar, The Netherlands. Louise C. Urlings-Strop, MD, PhD, Intensive Care, Reinier de Graaf Gasthuis, Delft, The Netherlands. Ellen G.M. Smit, MD, Intensive Care, Spaarne Gasthuis, Haarlem en Hoofddorp, The Netherlands. D. Jannet Mehagnoul-Schipper, MD, PhD, Intensive Care, VieCuri Medisch Centrum, Venlo, The Netherlands. Tom Dormans, MD, PhD, Intensive care, Zuyderland MC, Heerlen, The Netherlands. Cornelis P.C. de Jager, MD, PhD, Department of Intensive Care, Jeroen Bosch Ziekenhuis, Den Bosch, The Netherlands. Stefaan H.A. 
Hendriks, MD, Intensive Care, Albert Schweitzerziekenhuis, Dordrecht, The Netherlands. Sefanja Achterberg, MD, PhD, ICU, Haaglanden Medisch Centrum, Den Haag, The Netherlands. Evelien Oostdijk, MD, PhD, ICU, Maasstad Ziekenhuis Rotterdam, Rotterdam, The Netherlands. Auke C. Reidinga, MD, ICU, BWC, Martiniziekenhuis, Groningen. Barbara Festen-Spanjer, MD, Intensive Care, Ziekenhuis Gelderse Vallei, Ede, The Netherlands. Gert B. Brunnekreef, MD, Department of Intensive Care, Ziekenhuisgroep Twente, Almelo, The Netherlands. Alexander D. Cornet, MD, PhD, FRCP, Department of Intensive Care, Medisch Spectrum Twente, Enschede, The Netherlands. Walter van den Tempel, MD, Department of Intensive Care, Ikazia Ziekenhuis Rotterdam, Rotterdam, The Netherlands. Age D. Boelens, MD, Anesthesiology, Antonius Ziekenhuis Sneek, Sneek, The Netherlands. Peter Koetsier, MD, Intensive Care, Medisch Centrum Leeuwarden, Leeuwarden, The Netherlands. Judith Lens, MD, ICU, IJsselland Ziekenhuis, Capelle aan den IJssel, The Netherlands. Harald J. Faber, MD, ICU, WZA, Assen, The Netherlands. A. Karakus, MD, Department of Intensive Care, Diakonessenhuis Hospital, Utrecht, The Netherlands. Robert Entjes, MD, Department of Intensive Care, Adrz, Goes, The Netherlands. Paul de Jong, MD, Department of Anesthesia and Intensive Care, Slingeland Ziekenhuis, Doetinchem, The Netherlands. Thijs C.D. Rettig, MD, PhD, Department of Anesthesiology, Intensive Care and Pain Medicine, Amphia Ziekenhuis, Breda, The Netherlands. Sesmu Arbous, MD, PhD, Intensivist, LUMC, Leiden, The Netherlands. Tariq A. Dam, MD, Department of Intensive Care Medicine, Laboratory for Critical Care Computational Intelligence, Amsterdam Medical Data Science, Amsterdam UMC. Sebastiaan J.J. Vonk, MSc, Pacmed, Amsterdam, The Netherlands. Tomas Machado, Pacmed, Amsterdam, The Netherlands. Willem E. Herter, BSc, Pacmed, Amsterdam, The Netherlands. 
From collaborating hospitals having shared data: Julia Koeter, MD, Intensive Care, Canisius Wilhelmina Ziekenhuis, Nijmegen, The Netherlands. Roger van Rietschote, Business Intelligence, Haaglanden MC, Den Haag, The Netherlands. M.C. Reuland, MD, Department of Intensive Care Medicine, Amsterdam UMC, Universiteit van Amsterdam, Amsterdam, The Netherlands. Laura van Manen, MD, Department of Intensive Care, BovenIJ Ziekenhuis, Amsterdam, The Netherlands. Leon Montenij, MD, PhD, Department of Anesthesiology, Pain Management and Intensive Care, Catharina Ziekenhuis Eindhoven, Eindhoven, The Netherlands. Jasper van Bommel, MD, PhD, Department of Intensive Care, Erasmus Medical Center, Rotterdam, The Netherlands. Roy van den Berg, Department of Intensive Care, ETZ Tilburg, Tilburg, The Netherlands. Ellen van Geest, Department of ICMT, Haga Ziekenhuis, Den Haag, The Netherlands. Anisa Hana, MD, PhD, Intensive Care, Laurentius Ziekenhuis, Roermond, The Netherlands. B. van den Bogaard, MD, PhD, ICU, OLVG, Amsterdam, The Netherlands. Prof. Peter Pickkers, Department of Intensive Care Medicine, Radboud University Medical Centre, Nijmegen, The Netherlands. Pim van der Heiden, MD, PhD, Intensive Care, Reinier de Graaf Gasthuis, Delft, The Netherlands. Claudia (C.W.) van Gemeren, MD, Intensive Care, Spaarne Gasthuis, Haarlem en Hoofddorp, The Netherlands. Arend Jan Meinders, MD, Department of Internal Medicine and Intensive Care, St Antonius Hospital, Nieuwegein, The Netherlands. Martha de Bruin, MD, Department of Intensive Care, Franciscus Gasthuis & Vlietland, Rotterdam, The Netherlands. Emma Rademaker, MD, MSc, Department of Intensive Care, UMC Utrecht, Utrecht, The Netherlands. Frits H.M. van Osch, PhD, Department of Clinical Epidemiology, VieCuri Medisch Centrum, Venlo, The Netherlands. Martijn de Kruif, MD, PhD, Department of Pulmonology, Zuyderland MC, Heerlen, The Netherlands. Nicolas Schroten, MD, Intensive Care, Albert Schweitzerziekenhuis, Dordrecht, The Netherlands. 
Klaas Sierk Arnold, MD, Anesthesiology, Antonius Ziekenhuis Sneek, Sneek, The Netherlands. J.W. Fijen, MD, PhD, Department of Intensive Care, Diakonessenhuis Hospital, Utrecht, The Netherland. Jacomar J.M. van Koesveld, MD, ICU, IJsselland Ziekenhuis, Capelle aan den IJssel, The Netherlands. Koen S. Simons, MD, PhD, Department of Intensive Care, Jeroen Bosch Ziekenhuis, Den Bosch, The Netherlands. Joost Labout, MD, PhD, ICU, Maasstad Ziekenhuis Rotterdam, The Netherlands. Bart van de Gaauw, Martini ziekenhuis, Groningen, The Netherlands. Michael Kuiper, Intensive Care, Medisch Centrum Leeuwarden, Leeuwarden, The Netherlands. Albertus Beishuizen, MD, PhD, Department of Intensive Care, Medisch Spectrum Twente, Enschede, The Netherlands. Dennis Geutjes, Department of Information Technology, Slingeland Ziekenhuis, Doetinchem, The Netherlands. Johan Lutisan, MD, ICU, WZA, Assen, The Netherlands. Bart P. Grady, MD, PhD, Department of Intensive Care, Ziekenhuisgroep Twente, Almelo, The Netherlands. Remko van den Akker, Intensive Care, Adrz, Goes, The Netherlands. Tom A. Rijpstra, MD, Department of Anesthesiology, Intensive Care and Pain Medicine, Amphia Ziekenhuis, Breda, The Netherlands. Wim G. Boersma, MD, PhD, Department of Pulmonology, Northwest Clinics, Alkmaar, The Netherlands. From collaborating hospitals having signed the data sharing agreement: Daniël Pretorius, MD, Department of Intensive Care Medicine, Hospital St Jansdal, Harderwijk, The Netherlands. Menno Beukema, MD, Department of Intensive Care, Streekziekenhuis Koningin Beatrix, Winterswijk, The Netherlands. Bram Simons, MD, Intensive Care, Bravis Ziekenhuis, Bergen op Zoom en Roosendaal, The Netherlands. A.A. Rijkeboer, MD, ICU, Flevoziekenhuis, Almere, The Netherlands. Marcel Aries, MD, PhD, MUMC+, University Maastricht, Maastricht, The Netherlands. Niels C. Gritters van den Oever, MD, Intensive Care, Treant Zorggroep, Emmen, The Netherlands. 
Martijn van Tellingen, MD, EDIC, Department of Intensive Care Medicine, afdeling Intensive Care, ziekenhuis Tjongerschans, Heerenveen, The Netherlands. Annemieke Dijkstra, MD, Department of Intensive Care Medicine, Het Van Weel-Bethesda Ziekenhuis, Dirksland, The Netherlands. Rutger van Raalte, Department of Intensive Care, Tergooi hospital, Hilversum, The Netherlands. Publisher Copyright: © 2023 The Author(s)

PY - 2023/10/1

Y1 - 2023/10/1

N2 - Introduction: Hospitals generate large amounts of data and this data is generally modeled and labeled in a proprietary way, hampering its exchange and integration. Manually annotating data element names to internationally standardized data element identifiers is a time-consuming effort. Tools can support performing this task automatically. This study aimed to determine what factors influence the quality of automatic annotations. Methods: Data element names were used from the Dutch COVID-19 ICU Data Warehouse containing data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system while also storing the original hospital labels (synonymous names). Usagi, an OHDSI annotation tool, was used to perform the annotation for the data. A gold standard was used to determine if Usagi made correct annotations. Logistic regression was used to determine if the number of characters, number of words, match score (Usagi's certainty) and hospital label origin influenced Usagi's performance to annotate correctly. Results: Usagi automatically annotated 30.5% of the data element names correctly and 5.5% of the synonymous names. The match score is the best predictor for Usagi finding the correct annotation. It was determined that the AUC of data element names was 0.651 and 0.752 for the synonymous names respectively. The AUC for the individual hospital label origins varied between 0.460 to 0.905. Discussion: The results show that Usagi performed better to annotate the data element names than the synonymous names. The hospital origin in the synonymous names dataset was associated with the amount of correctly annotated concepts. Hospitals that performed better had shorter synonymous names and fewer words. Using shorter data element names or synonymous names should be considered to optimize the automatic annotating process. 
Overall, the performance of Usagi is too poor to completely rely on for automatic annotation.

AB - Introduction: Hospitals generate large amounts of data and this data is generally modeled and labeled in a proprietary way, hampering its exchange and integration. Manually annotating data element names to internationally standardized data element identifiers is a time-consuming effort. Tools can support performing this task automatically. This study aimed to determine what factors influence the quality of automatic annotations. Methods: Data element names were used from the Dutch COVID-19 ICU Data Warehouse containing data on intensive care patients with COVID-19 from 25 hospitals in the Netherlands. In this data warehouse, the data had been merged using a proprietary terminology system while also storing the original hospital labels (synonymous names). Usagi, an OHDSI annotation tool, was used to perform the annotation for the data. A gold standard was used to determine if Usagi made correct annotations. Logistic regression was used to determine if the number of characters, number of words, match score (Usagi's certainty) and hospital label origin influenced Usagi's performance to annotate correctly. Results: Usagi automatically annotated 30.5% of the data element names correctly and 5.5% of the synonymous names. The match score is the best predictor for Usagi finding the correct annotation. It was determined that the AUC of data element names was 0.651 and 0.752 for the synonymous names respectively. The AUC for the individual hospital label origins varied between 0.460 to 0.905. Discussion: The results show that Usagi performed better to annotate the data element names than the synonymous names. The hospital origin in the synonymous names dataset was associated with the amount of correctly annotated concepts. Hospitals that performed better had shorter synonymous names and fewer words. Using shorter data element names or synonymous names should be considered to optimize the automatic annotating process. 
Overall, the performance of Usagi is too poor to completely rely on for automatic annotation.

KW - Data annotation

KW - Data interoperability

KW - Data quality

KW - Data standardization

KW - OMOP CDM

KW - Usagi

UR - http://www.scopus.com/inward/record.url?scp=85170290378&partnerID=8YFLogxK

U2 - https://doi.org/10.1016/j.ijmedinf.2023.105200

DO - https://doi.org/10.1016/j.ijmedinf.2023.105200

M3 - Article

C2 - 37703800

SN - 1386-5056

VL - 178

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

M1 - 105200

ER -
