TY - JOUR
T1 - Clinicians are right not to like Cohen's kappa
AU - de Vet, H.C.W.
AU - Mokkink, L.B.
AU - Terwee, C.B.
AU - Hoekstra, O.S.
AU - Knol, D.L.
PY - 2013
Y1 - 2013
N2 - Clinicians are interested in observer variation in terms of the probability of other raters (interobserver) or themselves (intraobserver) obtaining the same answer. Cohen's κ is commonly used in the medical literature to express such agreement in categorical outcomes. The value of Cohen's κ, however, is not sufficiently informative because it is a relative measure, while the clinician's question of observer variation calls for an absolute measure. Using an example in which the observed agreement and κ lead to different conclusions, we illustrate that percentage agreement is an absolute measure (a measure of agreement) and that κ is a relative measure (a measure of reliability). For the data to be useful for clinicians, measures of agreement should be used. The proportion of specific agreement, expressing the agreement separately for the positive and the negative ratings, is the most appropriate measure for conveying the relevant information in a 2 × 2 table and is most informative for clinicians.
AB - Clinicians are interested in observer variation in terms of the probability of other raters (interobserver) or themselves (intraobserver) obtaining the same answer. Cohen's κ is commonly used in the medical literature to express such agreement in categorical outcomes. The value of Cohen's κ, however, is not sufficiently informative because it is a relative measure, while the clinician's question of observer variation calls for an absolute measure. Using an example in which the observed agreement and κ lead to different conclusions, we illustrate that percentage agreement is an absolute measure (a measure of agreement) and that κ is a relative measure (a measure of reliability). For the data to be useful for clinicians, measures of agreement should be used. The proportion of specific agreement, expressing the agreement separately for the positive and the negative ratings, is the most appropriate measure for conveying the relevant information in a 2 × 2 table and is most informative for clinicians.
U2 - https://doi.org/10.1136/bmj.f2125
DO - https://doi.org/10.1136/bmj.f2125
M3 - Article
SN - 0959-535X
VL - 346
SP - 1
EP - 7
JO - British medical journal
JF - British medical journal
M1 - f2125
ER -