Systematic Review and Comparison of Publicly Available ICU Data Sets-A Decision Guide for Clinicians and Data Scientists

Christopher M. Sauer; Tariq A. Dam; Leo A. Celi; Martin Faltys; Miguel A. A. de la Hoz; Lasith Adhikari; Kirsten A. Ziesemer; Armand Girbes; Patrick J. Thoral; Paul Elbers

doi:https://doi.org/10.1097/CCM.0000000000005517

Research output: Contribution to journal › Review article › Academic › peer-review

24 Citations (Scopus)

Abstract

OBJECTIVE: As data science and artificial intelligence continue to rapidly gain traction, the publication of freely available ICU datasets has become invaluable to propel data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness and critically appraise their strengths and weaknesses, and 3) suggest which datasets are appropriate for answering a given clinical question.

DATA SOURCES: A systematic search was performed in PubMed, arXiv, medRxiv, and bioRxiv.

STUDY SELECTION: We selected all studies that reported on publicly available adult patient-level intensive care datasets.

DATA EXTRACTION: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Center database [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV [MIMIC-IV]). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered.

DATA SYNTHESIS: Four ICU databases fulfilled all criteria for inclusion and were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions varied between 23,106 (AmsterdamUMCdb) and 200,859 (eICU-CRD). Frequency of laboratory values and vital signs was highest in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour. Treatment intensity varied: vasopressor and ventilatory support were given to 69.0% and 83.0% of patients in AmsterdamUMCdb versus 12.0% and 21.0% in eICU-CRD, respectively. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb.

CONCLUSIONS: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. These differences should guide clinicians and researchers in choosing the database best suited to their clinical questions.

Original language	English
Pages (from-to)	E581-E588
Journal	Critical Care Medicine
Volume	50
Issue number	6
Early online date	2 Mar 2022
DOIs	https://doi.org/10.1097/CCM.0000000000005517
Publication status	Published - 1 Jun 2022

Keywords

ICU
critical care
data science
data set
guide
systematic review

Access to Document

https://doi.org/10.1097/CCM.0000000000005517

Cite this

@article{c43cc149c4ee428da82d9072cff6f759,

title = "Systematic Review and Comparison of Publicly Available ICU Data Sets-A Decision Guide for Clinicians and Data Scientists",

abstract = "OBJECTIVE: As data science and artificial intelligence continue to rapidly gain traction, the publication of freely available ICU datasets has become invaluable to propel data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness and critically appraise their strengths and weaknesses, and 3) provide researchers with suggestions, which datasets are appropriate for answering their clinical question. DATA SOURCES: A systematic search was performed in Pubmed, ArXiv, MedRxiv, and BioRxiv. STUDY SELECTION: We selected all studies that reported on publicly available adult patient-level intensive care datasets. DATA EXTRACTION: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Center data base [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered. DATA SYNTHESIS: Four ICU databases fulfilled all criteria for inclusion and were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions varied between 23,106 (AmsterdamUMCdb) and 200,859 (eICU-CRD). Frequency of laboratory values and vital signs was highest in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour. Treatment intensity varied with vasopressor and ventilatory support in 69.0% and 83.0% of patients in AmsterdamUMCdb versus 12.0% and 21.0% in eICU-CRD, respectively. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb. CONCLUSIONS: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. This should guide clinicians and researchers which databases to best answer their clinical questions.",

keywords = "ICU, critical care, data science, data set, guide, systematic review",

author = "Sauer, {Christopher M.} and Dam, {Tariq A.} and Celi, {Leo A.} and Martin Faltys and {de la Hoz}, {Miguel A. A.} and Lasith Adhikari and Ziesemer, {Kirsten A.} and Armand Girbes and Thoral, {Patrick J.} and Paul Elbers",

note = "Funding Information: Dr. Dam received funding from AmsterdamUMC and the Netherlands Organization for Health Research and Development (project number 10430012010003). Dr. Celi received support for article research from the National Institutes of Health. Dr. Faltys received funding from the Swiss National Fund. The remaining authors have disclosed that they do not have any potential conflicts of interest. Publisher Copyright: {\textcopyright} 2022 Lippincott Williams and Wilkins. All rights reserved.",

year = "2022",

month = jun,

day = "1",

doi = "https://doi.org/10.1097/CCM.0000000000005517",

language = "English",

volume = "50",

pages = "E581--E588",

journal = "Critical Care Medicine",

issn = "0090-3493",

publisher = "Lippincott Williams and Wilkins",

number = "6",

}

TY - JOUR

T1 - Systematic Review and Comparison of Publicly Available ICU Data Sets-A Decision Guide for Clinicians and Data Scientists

AU - Sauer, Christopher M.

AU - Dam, Tariq A.

AU - Celi, Leo A.

AU - Faltys, Martin

AU - de la Hoz, Miguel A. A.

AU - Adhikari, Lasith

AU - Ziesemer, Kirsten A.

AU - Girbes, Armand

AU - Thoral, Patrick J.

AU - Elbers, Paul

N1 - Funding Information: Dr. Dam received funding from AmsterdamUMC and the Netherlands Organization for Health Research and Development (project number 10430012010003). Dr. Celi received support for article research from the National Institutes of Health. Dr. Faltys received funding from the Swiss National Fund. The remaining authors have disclosed that they do not have any potential conflicts of interest. Publisher Copyright: © 2022 Lippincott Williams and Wilkins. All rights reserved.

PY - 2022/6/1

Y1 - 2022/6/1

N2 - OBJECTIVE: As data science and artificial intelligence continue to rapidly gain traction, the publication of freely available ICU datasets has become invaluable to propel data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness and critically appraise their strengths and weaknesses, and 3) provide researchers with suggestions, which datasets are appropriate for answering their clinical question. DATA SOURCES: A systematic search was performed in Pubmed, ArXiv, MedRxiv, and BioRxiv. STUDY SELECTION: We selected all studies that reported on publicly available adult patient-level intensive care datasets. DATA EXTRACTION: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Center data base [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered. DATA SYNTHESIS: Four ICU databases fulfilled all criteria for inclusion and were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions varied between 23,106 (AmsterdamUMCdb) and 200,859 (eICU-CRD). Frequency of laboratory values and vital signs was highest in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour. Treatment intensity varied with vasopressor and ventilatory support in 69.0% and 83.0% of patients in AmsterdamUMCdb versus 12.0% and 21.0% in eICU-CRD, respectively. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb. CONCLUSIONS: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. This should guide clinicians and researchers which databases to best answer their clinical questions.

AB - OBJECTIVE: As data science and artificial intelligence continue to rapidly gain traction, the publication of freely available ICU datasets has become invaluable to propel data-driven clinical research. In this guide for clinicians and researchers, we aim to: 1) systematically search and identify all publicly available adult clinical ICU datasets, 2) compare their characteristics, data quality, and richness and critically appraise their strengths and weaknesses, and 3) provide researchers with suggestions, which datasets are appropriate for answering their clinical question. DATA SOURCES: A systematic search was performed in Pubmed, ArXiv, MedRxiv, and BioRxiv. STUDY SELECTION: We selected all studies that reported on publicly available adult patient-level intensive care datasets. DATA EXTRACTION: A total of four publicly available, adult, critical care, patient-level databases were included (Amsterdam University Medical Center data base [AmsterdamUMCdb], eICU Collaborative Research Database [eICU-CRD], High time-resolution intensive care unit dataset [HiRID], and Medical Information Mart for Intensive Care-IV). Databases were compared using a priori defined categories, including demographics, patient characteristics, and data richness. The study protocol and search strategy were prospectively registered. DATA SYNTHESIS: Four ICU databases fulfilled all criteria for inclusion and were queried using SQL (PostgreSQL version 12; PostgreSQL Global Development Group) and analyzed using R (R Foundation for Statistical Computing, Vienna, Austria). The number of unique patient admissions varied between 23,106 (AmsterdamUMCdb) and 200,859 (eICU-CRD). Frequency of laboratory values and vital signs was highest in HiRID, for example, 5.2 (±3.4) lactate values per day and 29.7 (±10.2) systolic blood pressure values per hour. Treatment intensity varied with vasopressor and ventilatory support in 69.0% and 83.0% of patients in AmsterdamUMCdb versus 12.0% and 21.0% in eICU-CRD, respectively. ICU mortality ranged from 5.5% in eICU-CRD to 9.9% in AmsterdamUMCdb. CONCLUSIONS: We identified four publicly available adult clinical ICU datasets. Sample size, severity of illness, treatment intensity, and frequency of reported parameters differ markedly between the databases. This should guide clinicians and researchers which databases to best answer their clinical questions.

KW - ICU

KW - critical care

KW - data science

KW - data set

KW - guide

KW - systematic review

UR - http://www.scopus.com/inward/record.url?scp=85131107732&partnerID=8YFLogxK

U2 - https://doi.org/10.1097/CCM.0000000000005517

DO - https://doi.org/10.1097/CCM.0000000000005517

M3 - Review article

C2 - 35234175

SN - 0090-3493

VL - 50

SP - E581-E588

JO - Critical Care Medicine

JF - Critical Care Medicine

IS - 6

ER -
