Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study

AUTHOR GROUP

doi:https://doi.org/10.1016/j.annemergmed.2016.12.025

Research output: Contribution to journal › Article › Academic › peer-review

31 Citations (Scopus)

Abstract

Open educational resources such as blogs are increasingly used for medical education. Gestalt is generally the evaluation method used for these resources; however, little information has been published on it. We aim to evaluate the reliability of gestalt in the assessment of emergency medicine blogs. We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson's correlations. Single- and average-measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated variance within gestalt and a decision study calculated the number of raters required to reliably (>0.8) estimate quality. One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were fair to poor (0.36 to 0.40). Average-measure intraclass correlation coefficients were more reliable (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to reliably evaluate quality (>0.8). The mean gestalt quality ratings of blog posts between medical students, residents, and attending physicians correlate strongly, but individual ratings are unreliable. With sufficient raters, mean gestalt ratings provide a community standard for assessment.
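
The decision-study step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple one-facet design in which every rater scores every post and all rater-related variance is lumped into one residual term; the study's actual design and variance components are not given in this record, so the function names and the numbers below are hypothetical and are not expected to reproduce the reported figure of 42 raters.

```python
def g_coefficient(var_post: float, var_residual: float, n_raters: int) -> float:
    """Relative generalizability coefficient for the mean of n_raters ratings.

    var_post     -- variance attributable to true differences between posts
    var_residual -- rater-related and error variance for a single rating
    """
    return var_post / (var_post + var_residual / n_raters)


def raters_needed(var_post: float, var_residual: float, target: float = 0.8) -> int:
    """Smallest number of raters whose mean rating exceeds the target reliability."""
    n = 1
    while g_coefficient(var_post, var_residual, n) <= target:
        n += 1
    return n


# Hypothetical variance components, chosen only for illustration.
VAR_POST = 0.5
VAR_RESIDUAL = 1.3

single = g_coefficient(VAR_POST, VAR_RESIDUAL, 1)
needed = raters_needed(VAR_POST, VAR_RESIDUAL)
print(f"single-rater coefficient: {single:.3f}")
print(f"raters needed to exceed 0.8: {needed}")
```

Averaging over more raters shrinks only the residual term, which is why a low single-rater coefficient can coexist with a reliable mean rating.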

Original language	English
Pages (from-to)	394-401
Journal	Annals of Emergency Medicine
Volume	70
Issue number	3
DOIs	https://doi.org/10.1016/j.annemergmed.2016.12.025
Publication status	Published - 2017

Cite this

@article{656e76694a3d45749b4bd98c3ca186ec,

title = "Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study",

abstract = "Open educational resources such as blogs are increasingly used for medical education. Gestalt is generally the evaluation method used for these resources; however, little information has been published on it. We aim to evaluate the reliability of gestalt in the assessment of emergency medicine blogs. We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson's correlations. Single and average measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated variance within gestalt and a decision study calculated the number of raters required to reliably (>0.8) estimate quality. One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were fair to poor (0.36 to 0.40). Average-measure intraclass correlation coefficients were more reliable (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to reliably evaluate quality (>0.8). The mean gestalt quality ratings of blog posts between medical students, residents, and attending physicians correlate strongly, but individual ratings are unreliable. With sufficient raters, mean gestalt ratings provide a community standard for assessment",

author = "Brent Thoma and Sebok-Syer, {Stefanie S.} and Keeth Krishnan and Marshall Siemens and Trueger, {N. Seth} and Isabelle Colmers-Gray and Rob Woods and Emil Petrusa and Teresa Chan and {AUTHOR GROUP} and Charlotte Alexander and Mohammed Alkhalifah and Saeed Alqahtani and Scott Anderson and Shelaina Anderson and Colin Andrews and Jocelyn Andruko and Felix Ankel and Nikytha Antony and Diptesh Aryal and Barbra Backus and Jennifer Baird and Andrew Baker and Sarah Batty and Jared Baylis and Braeden Beaumont and Chris Belcher and Brent Benavides and Michael Benham and Pelletier, {Elyse Berger} and Julian Botta and Nicholas Bouchard and Victoria Brazil and Emily Brumfield and Anthony Bryson and Wisarut Bunchit and Kat Butler and Lindy Buzikievich and David Calcara and Rob Carey and Stephen Carroll and Casey Lyons and Louise Cassidy and Kirsty Challen and Tim Chaplin and Natasha Chatham-Zvelebil and Eric Chen and Lucy Chen and Sushant Chhabra and Alvin Chin and Milan Ridderikhof",

year = "2017",

doi = "10.1016/j.annemergmed.2016.12.025",

language = "English",

volume = "70",

pages = "394--401",

journal = "Annals of Emergency Medicine",

issn = "0196-0644",

publisher = "Mosby Inc.",

number = "3",

}

TY - JOUR

T1 - Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study

AU - Thoma, Brent

AU - Sebok-Syer, Stefanie S.

AU - Krishnan, Keeth

AU - Siemens, Marshall

AU - Trueger, N. Seth

AU - Colmers-Gray, Isabelle

AU - Woods, Rob

AU - Petrusa, Emil

AU - Chan, Teresa

AU - AUTHOR GROUP

AU - Alexander, Charlotte

AU - Alkhalifah, Mohammed

AU - Alqahtani, Saeed

AU - Anderson, Scott

AU - Anderson, Shelaina

AU - Andrews, Colin

AU - Andruko, Jocelyn

AU - Ankel, Felix

AU - Antony, Nikytha

AU - Aryal, Diptesh

AU - Backus, Barbra

AU - Baird, Jennifer

AU - Baker, Andrew

AU - Batty, Sarah

AU - Baylis, Jared

AU - Beaumont, Braeden

AU - Belcher, Chris

AU - Benavides, Brent

AU - Benham, Michael

AU - Pelletier, Elyse Berger

AU - Botta, Julian

AU - Bouchard, Nicholas

AU - Brazil, Victoria

AU - Brumfield, Emily

AU - Bryson, Anthony

AU - Bunchit, Wisarut

AU - Butler, Kat

AU - Buzikievich, Lindy

AU - Calcara, David

AU - Carey, Rob

AU - Carroll, Stephen

AU - Lyons, Casey

AU - Cassidy, Louise

AU - Challen, Kirsty

AU - Chaplin, Tim

AU - Chatham-Zvelebil, Natasha

AU - Chen, Eric

AU - Chen, Lucy

AU - Chhabra, Sushant

AU - Chin, Alvin

AU - Ridderikhof, Milan

PY - 2017

Y1 - 2017

AB - Open educational resources such as blogs are increasingly used for medical education. Gestalt is generally the evaluation method used for these resources; however, little information has been published on it. We aim to evaluate the reliability of gestalt in the assessment of emergency medicine blogs. We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson's correlations. Single and average measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated variance within gestalt and a decision study calculated the number of raters required to reliably (>0.8) estimate quality. One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were fair to poor (0.36 to 0.40). Average-measure intraclass correlation coefficients were more reliable (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to reliably evaluate quality (>0.8). The mean gestalt quality ratings of blog posts between medical students, residents, and attending physicians correlate strongly, but individual ratings are unreliable. With sufficient raters, mean gestalt ratings provide a community standard for assessment

U2 - https://doi.org/10.1016/j.annemergmed.2016.12.025

DO - 10.1016/j.annemergmed.2016.12.025

M3 - Article

C2 - 28262317

SN - 0196-0644

VL - 70

SP - 394

EP - 401

JO - Annals of Emergency Medicine

JF - Annals of Emergency Medicine

IS - 3

ER -