TY - JOUR
T1 - Individual Gestalt Is Unreliable for the Evaluation of Quality in Medical Education Blogs: A METRIQ Study
AU - Thoma, Brent
AU - Sebok-Syer, Stefanie S.
AU - Krishnan, Keeth
AU - Siemens, Marshall
AU - Trueger, N. Seth
AU - Colmers-Gray, Isabelle
AU - Woods, Rob
AU - Petrusa, Emil
AU - Chan, Teresa
AU - AUTHOR GROUP
AU - Alexander, Charlotte
AU - Alkhalifah, Mohammed
AU - Alqahtani, Saeed
AU - Anderson, Scott
AU - Anderson, Shelaina
AU - Andrews, Colin
AU - Andruko, Jocelyn
AU - Ankel, Felix
AU - Antony, Nikytha
AU - Aryal, Diptesh
AU - Backus, Barbra
AU - Baird, Jennifer
AU - Baker, Andrew
AU - Batty, Sarah
AU - Baylis, Jared
AU - Beaumont, Braeden
AU - Belcher, Chris
AU - Benavides, Brent
AU - Benham, Michael
AU - Pelletier, Elyse Berger
AU - Botta, Julian
AU - Bouchard, Nicholas
AU - Brazil, Victoria
AU - Brumfield, Emily
AU - Bryson, Anthony
AU - Bunchit, Wisarut
AU - Butler, Kat
AU - Buzikievich, Lindy
AU - Calcara, David
AU - Carey, Rob
AU - Carroll, Stephen
AU - Lyons, Casey
AU - Cassidy, Louise
AU - Challen, Kirsty
AU - Chaplin, Tim
AU - Chatham-Zvelebil, Natasha
AU - Chen, Eric
AU - Chen, Lucy
AU - Chhabra, Sushant
AU - Chin, Alvin
AU - Ridderikhof, Milan
PY - 2017
Y1 - 2017
N2 - Open educational resources such as blogs are increasingly used for medical education. Gestalt is the evaluation method generally used for these resources; however, little information has been published on its reliability. We aim to evaluate the reliability of gestalt in the assessment of emergency medicine blogs. We identified 60 English-language emergency medicine Web sites that posted clinically oriented blogs between January 1, 2016, and February 24, 2016. Ten Web sites were selected with a random-number generator. Medical students, emergency medicine residents, and emergency medicine attending physicians evaluated the 2 most recent clinical blog posts from each site for quality, using a 7-point Likert scale. The mean gestalt scores of each blog post were compared between groups with Pearson's correlations. Single- and average-measure intraclass correlation coefficients were calculated within groups. A generalizability study evaluated variance within gestalt, and a decision study calculated the number of raters required to reliably (>0.8) estimate quality. One hundred twenty-one medical students, 88 residents, and 100 attending physicians (93.6% of enrolled participants) evaluated all 20 blog posts. Single-measure intraclass correlation coefficients within groups were poor to fair (0.36 to 0.40). Average-measure intraclass correlation coefficients were more reliable (0.811 to 0.840). Mean gestalt ratings by attending physicians correlated strongly with those by medical students (r=0.92) and residents (r=0.99). The generalizability coefficient was 0.91 for the complete data set. The decision study found that 42 gestalt ratings were required to reliably evaluate quality (>0.8). Mean gestalt quality ratings of blog posts correlate strongly among medical students, residents, and attending physicians, but individual ratings are unreliable. With sufficient raters, mean gestalt ratings provide a community standard for assessment.
U2 - https://doi.org/10.1016/j.annemergmed.2016.12.025
DO - https://doi.org/10.1016/j.annemergmed.2016.12.025
M3 - Article
C2 - 28262317
SN - 0196-0644
VL - 70
SP - 394
EP - 401
JO - Annals of Emergency Medicine
JF - Annals of Emergency Medicine
IS - 3
ER -