Development and reliability of performance indicators for measuring adherence to a guideline for depression by insurance physicians

A.J.M. Schellart; F. Zwerver; D.L. Knol; J.R. Anema; A.J. van der Beek

doi:https://doi.org/10.3109/09638288.2011.579222

Research output: Contribution to journal › Article › Academic › peer-review

5 Citations (Scopus)

Abstract

Introduction. We wanted to measure adherence to the guideline for depression in disability assessments. The research questions we addressed were: How can we develop performance indicators (PIs) for adherence to the Dutch guideline for disability assessment of patients with depression and how can we measure the quality of the scores? What is the inter-rater reliability of these PIs? What is the quality of the PI scores? Methods. PIs, developed by the researchers, were reviewed on various aspects, by a panel of seven experts in several consulting rounds. After adjustments, senior insurance physicians (IPs) attended two training sessions and scored the PIs on 10 different simulated case reports. Two researchers developed proxy 'gold standard' scores for these 10 case reports. To assess the inter-rater reliability and the quality of the scores, we calculated the intra-class correlations (ICC) and 95% confidence intervals (CI) of the PI scores and of the PI scores compared to the proxy 'gold standard', respectively. Results. Six specific and relevant PIs resulted from the consultation of the panel of experts. The PI scores for the 10 case reports, rated by seven (of the eight) senior IPs who completed both training sessions, showed that the PIs were not reliable at individual level (ICC=0.543; 95% CI 0.426–0.642). However, the ICC became more reliable as an average of two raters was calculated (ICC=0.704). The ICC of the PI scores with the proxy 'gold standard' was 0.538 (95% CI 0.419–0.640), but the quality was higher when calculated as an average of two raters (ICC=0.700). Conclusion. The PIs for adherence to the guideline were sufficiently reliable, and the quality of their scores was adequate if at least two well-trained raters were involved. The senior IPs evaluated the feasibility of the PIs as good, with a prerequisite of sufficient training. This method may be interesting for measuring guideline adherence and quality of disability assessments in general.
© 2011 Informa UK, Ltd.
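
The abstract does not state how the two-rater ICCs were derived from the single-rater values. As an illustration only, the reported figures are consistent with the Spearman-Brown formula, which gives the reliability of the mean of k raters from a single-rater reliability. The sketch below assumes that formula; the function name `spearman_brown` is hypothetical and not taken from the article.

```python
def spearman_brown(icc_single: float, k: int) -> float:
    """Reliability of the mean of k raters, given the single-rater ICC.

    Assumption: the Spearman-Brown prophecy formula applies; the article's
    abstract does not confirm that this is the method the authors used.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * icc_single / (1 + (k - 1) * icc_single)


# Single-rater ICCs as reported in the abstract.
inter_rater_single = 0.543      # inter-rater reliability of the PI scores
gold_standard_single = 0.538    # agreement with the proxy 'gold standard'

print(f"{spearman_brown(inter_rater_single, 2):.3f}")    # 0.704
print(f"{spearman_brown(gold_standard_single, 2):.3f}")  # 0.700
```

Under this assumption, both averaged values in the abstract (0.704 and 0.700) are reproduced to three decimals from the single-rater ICCs.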

Original language	English
Pages (from-to)	2535-2543
Journal	Disability and rehabilitation
Volume	33
DOIs	https://doi.org/10.3109/09638288.2011.579222
Publication status	Published - 2011

Cite this

@article{bb40e133971c41d9b7c73e3b364cb4e2,

title = "Development and reliability of performance indicators for measuring adherence to a guideline for depression by insurance physicians",

abstract = "Introduction. We wanted to measure adherence to the guideline for depression in disability assessments. The research questions we addressed were: How can we develop performance indicators (PIs) for adherence to the Dutch guideline for disability assessment of patients with depression and how can we measure the quality of the scores? What is the inter-rater reliability of these PIs? What is the quality of the PI scores? Methods. PIs, developed by the researchers, were reviewed on various aspects, by a panel of seven experts in several consulting rounds. After adjustments, senior insurance physicians (IPs) attended two training sessions and scored the PIs on 10 different simulated case reports. Two researchers developed proxy 'gold standard' scores for these 10 case reports. To assess the inter-rater reliability and the quality of the scores, we calculated the intra-class correlations (ICC) and 95% confidence intervals (CI) of the PI scores and of the PI scores compared to the proxy 'gold standard', respectively. Results. Six specific and relevant PIs resulted from the consultation of the panel of experts. The PI scores for the 10 case reports, rated by seven (of the eight) senior IPs who completed both training sessions, showed that the PIs were not reliable at individual level (ICC=0.543; 95% CI 0.426--0.642). However, the ICC became more reliable as an average of two raters was calculated (ICC=0.704). The ICC of the PI scores with the proxy 'gold standard' was 0.538 (95% CI 0.419--0.640), but the quality was higher when calculated as an average of two raters (ICC=0.700). Conclusion. The PIs for adherence to the guideline were sufficiently reliable, and the quality of their scores was adequate if at least two well-trained raters were involved. The senior IPs evaluated the feasibility of the PIs as good, with a prerequisite of sufficient training. This method may be interesting for measuring guideline adherence and quality of disability assessments in general.
{\textcopyright} 2011 Informa UK, Ltd.",

author = "A.J.M. Schellart and F. Zwerver and D.L. Knol and J.R. Anema and {van der Beek}, A.J.",

year = "2011",

doi = "https://doi.org/10.3109/09638288.2011.579222",

language = "English",

volume = "33",

pages = "2535--2543",

journal = "Disability and rehabilitation",

issn = "0963-8288",

publisher = "Informa Healthcare",

}

TY - JOUR

T1 - Development and reliability of performance indicators for measuring adherence to a guideline for depression by insurance physicians

AU - Schellart, A.J.M.

AU - Zwerver, F.

AU - Knol, D.L.

AU - Anema, J.R.

AU - van der Beek, A.J.

PY - 2011

Y1 - 2011

N2 - Introduction. We wanted to measure adherence to the guideline for depression in disability assessments. The research questions we addressed were: How can we develop performance indicators (PIs) for adherence to the Dutch guideline for disability assessment of patients with depression and how can we measure the quality of the scores? What is the inter-rater reliability of these PIs? What is the quality of the PI scores? Methods. PIs, developed by the researchers, were reviewed on various aspects, by a panel of seven experts in several consulting rounds. After adjustments, senior insurance physicians (IPs) attended two training sessions and scored the PIs on 10 different simulated case reports. Two researchers developed proxy 'gold standard' scores for these 10 case reports. To assess the inter-rater reliability and the quality of the scores, we calculated the intra-class correlations (ICC) and 95% confidence intervals (CI) of the PI scores and of the PI scores compared to the proxy 'gold standard', respectively. Results. Six specific and relevant PIs resulted from the consultation of the panel of experts. The PI scores for the 10 case reports, rated by seven (of the eight) senior IPs who completed both training sessions, showed that the PIs were not reliable at individual level (ICC=0.543; 95% CI 0.426–0.642). However, the ICC became more reliable as an average of two raters was calculated (ICC=0.704). The ICC of the PI scores with the proxy 'gold standard' was 0.538 (95% CI 0.419–0.640), but the quality was higher when calculated as an average of two raters (ICC=0.700). Conclusion. The PIs for adherence to the guideline were sufficiently reliable, and the quality of their scores was adequate if at least two well-trained raters were involved. The senior IPs evaluated the feasibility of the PIs as good, with a prerequisite of sufficient training. This method may be interesting for measuring guideline adherence and quality of disability assessments in general.
© 2011 Informa UK, Ltd.

AB - Introduction. We wanted to measure adherence to the guideline for depression in disability assessments. The research questions we addressed were: How can we develop performance indicators (PIs) for adherence to the Dutch guideline for disability assessment of patients with depression and how can we measure the quality of the scores? What is the inter-rater reliability of these PIs? What is the quality of the PI scores? Methods. PIs, developed by the researchers, were reviewed on various aspects, by a panel of seven experts in several consulting rounds. After adjustments, senior insurance physicians (IPs) attended two training sessions and scored the PIs on 10 different simulated case reports. Two researchers developed proxy 'gold standard' scores for these 10 case reports. To assess the inter-rater reliability and the quality of the scores, we calculated the intra-class correlations (ICC) and 95% confidence intervals (CI) of the PI scores and of the PI scores compared to the proxy 'gold standard', respectively. Results. Six specific and relevant PIs resulted from the consultation of the panel of experts. The PI scores for the 10 case reports, rated by seven (of the eight) senior IPs who completed both training sessions, showed that the PIs were not reliable at individual level (ICC=0.543; 95% CI 0.426–0.642). However, the ICC became more reliable as an average of two raters was calculated (ICC=0.704). The ICC of the PI scores with the proxy 'gold standard' was 0.538 (95% CI 0.419–0.640), but the quality was higher when calculated as an average of two raters (ICC=0.700). Conclusion. The PIs for adherence to the guideline were sufficiently reliable, and the quality of their scores was adequate if at least two well-trained raters were involved. The senior IPs evaluated the feasibility of the PIs as good, with a prerequisite of sufficient training. This method may be interesting for measuring guideline adherence and quality of disability assessments in general.
© 2011 Informa UK, Ltd.

U2 - https://doi.org/10.3109/09638288.2011.579222

DO - https://doi.org/10.3109/09638288.2011.579222

M3 - Article

C2 - 21585252

SN - 0963-8288

VL - 33

SP - 2535

EP - 2543

JO - Disability and rehabilitation

JF - Disability and rehabilitation

ER -