Methods for evaluation of medical terminological systems - A literature review and a case study

D.G.T. Arts; R. Cornet; E. de Jonge; N.F. de Keizer

Research output: Contribution to journal › Article › Academic › peer-review

13 Citations (Scopus)

Abstract

OBJECTIVES: The usability of terminological systems (TSs) strongly depends on the coverage and correctness of their content. The objective of this study was to create a literature overview of aspects related to the content of TSs and of methods for the evaluation of the content of TSs. The extent to which these methods overlap or complement each other is investigated.

METHODS: We reviewed the literature and composed definitions for aspects of the evaluation of the content of TSs. Of the methods described in the literature, three were selected: 1) concept matching, in which two samples of concepts, representing a) documentation of reasons for admission in daily care practice and b) aggregation of patient groups for research, are looked up in the TS in order to assess its coverage; 2) formal algorithmic evaluation, in which reasoning on the formally represented content is used to detect inconsistencies; and 3) expert review, in which a random sample of concepts is checked for incorrect and incomplete terms and relations. These evaluation methods were applied in a case study on the locally developed TS DICE (Diagnoses for Intensive Care Evaluation).

RESULTS: None of the applied methods covered all the aspects of the content of a TS. The results of concept matching differed for the two use cases (63% vs. 52% perfect matches). Expert review revealed many more errors and instances of incompleteness than formal algorithmic evaluation.

CONCLUSIONS: To evaluate the content of a TS, using a combination of evaluation methods is preferable. Different representative samples, reflecting the uses of TSs, lead to different results for concept matching. Expert review appears to be very valuable, but time-consuming. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers but detects only logical inconsistencies. Further research is required to exploit the potential of formal algorithmic evaluation.

Original language	Undefined/Unknown
Pages (from-to)	616-625
Journal	Methods of Information in Medicine
Volume	44
Issue number	5
Publication status	Published - 2005

Keywords

AMC wi-eigen

Access to Document

https://pure.uva.nl/ws/files/3922534/45624_207164y.pdf

Cite this

@article{5d7c7384c91c4269889ee01a90461e90,

title = "Methods for evaluation of medical terminological systems - A literature review and a case study",

abstract = "OBJECTIVES: The usability of terminological systems (TSs) strongly depends on the coverage and correctness of their content. The objective of this study was to create a literature overview of aspects related to the content of TSs and of methods for the evaluation of the content of TSs. The extent to which these methods overlap or complement each other is investigated. METHODS: We reviewed the literature and composed definitions for aspects of the evaluation of the content of TSs. Of the methods described in the literature, three were selected: 1) concept matching, in which two samples of concepts, representing a) documentation of reasons for admission in daily care practice and b) aggregation of patient groups for research, are looked up in the TS in order to assess its coverage; 2) formal algorithmic evaluation, in which reasoning on the formally represented content is used to detect inconsistencies; and 3) expert review, in which a random sample of concepts is checked for incorrect and incomplete terms and relations. These evaluation methods were applied in a case study on the locally developed TS DICE (Diagnoses for Intensive Care Evaluation). RESULTS: None of the applied methods covered all the aspects of the content of a TS. The results of concept matching differed for the two use cases (63% vs. 52% perfect matches). Expert review revealed many more errors and instances of incompleteness than formal algorithmic evaluation. CONCLUSIONS: To evaluate the content of a TS, using a combination of evaluation methods is preferable. Different representative samples, reflecting the uses of TSs, lead to different results for concept matching. Expert review appears to be very valuable, but time-consuming. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers but detects only logical inconsistencies. Further research is required to exploit the potential of formal algorithmic evaluation.",

keywords = "AMC wi-eigen",

author = "Arts, D.G.T. and Cornet, R. and {de Jonge}, E. and {de Keizer}, N.F.",

note = "delen [20-4-2006 KW]",

year = "2005",

language = "Undefined/Unknown",

volume = "44",

pages = "616--625",

journal = "Methods of Information in Medicine",

issn = "0026-1270",

publisher = "Schattauer GmbH",

number = "5",

}

TY - JOUR

T1 - Methods for evaluation of medical terminological systems - A literature review and a case study

AU - Arts, D.G.T.

AU - Cornet, R.

AU - de Jonge, E.

AU - de Keizer, N.F.

N1 - delen [20-4-2006 KW]

PY - 2005

Y1 - 2005

N2 - OBJECTIVES: The usability of terminological systems (TSs) strongly depends on the coverage and correctness of their content. The objective of this study was to create a literature overview of aspects related to the content of TSs and of methods for the evaluation of the content of TSs. The extent to which these methods overlap or complement each other is investigated. METHODS: We reviewed the literature and composed definitions for aspects of the evaluation of the content of TSs. Of the methods described in the literature, three were selected: 1) concept matching, in which two samples of concepts, representing a) documentation of reasons for admission in daily care practice and b) aggregation of patient groups for research, are looked up in the TS in order to assess its coverage; 2) formal algorithmic evaluation, in which reasoning on the formally represented content is used to detect inconsistencies; and 3) expert review, in which a random sample of concepts is checked for incorrect and incomplete terms and relations. These evaluation methods were applied in a case study on the locally developed TS DICE (Diagnoses for Intensive Care Evaluation). RESULTS: None of the applied methods covered all the aspects of the content of a TS. The results of concept matching differed for the two use cases (63% vs. 52% perfect matches). Expert review revealed many more errors and instances of incompleteness than formal algorithmic evaluation. CONCLUSIONS: To evaluate the content of a TS, using a combination of evaluation methods is preferable. Different representative samples, reflecting the uses of TSs, lead to different results for concept matching. Expert review appears to be very valuable, but time-consuming. Formal algorithmic evaluation has the potential to decrease the workload of human reviewers but detects only logical inconsistencies. Further research is required to exploit the potential of formal algorithmic evaluation.

KW - AMC wi-eigen

M3 - Article

C2 - 16400369

SN - 0026-1270

VL - 44

SP - 616

EP - 625

JO - Methods of Information in Medicine

JF - Methods of Information in Medicine

IS - 5

ER -