Development of a Reinforcement Learning Algorithm to Optimize Corticosteroid Therapy in Critically Ill Patients with Sepsis

Razvan Bologheanu; Lorenz Kapral; Daniel Laxar; Mathias Maleczek; Christoph Dibiasi; Sebastian Zeiner; Asan Agibetov; Ari Ercole; Patrick Thoral; Paul Elbers; Clemens Heitzinger; Oliver Kimberger

doi:10.3390/jcm12041513

Research output: Contribution to journal (peer-reviewed academic article)

Abstract

Background: The optimal indication, dose, and timing of corticosteroids in sepsis are controversial. Here, we used reinforcement learning (RL) to derive the optimal steroid policy in septic patients based on data from 3051 ICU admissions in the AmsterdamUMCdb intensive care database. Methods: We identified septic patients according to the 2016 consensus definition (Sepsis-3). An actor-critic RL algorithm using ICU mortality as a reward signal was developed to determine the optimal treatment policy from time-series data on 277 clinical parameters. We performed off-policy evaluation and testing in independent subsets to assess the algorithm’s performance. Results: Agreement between the RL agent’s policy and the actual documented treatment reached 59%. Our RL agent’s treatment policy was more restrictive than actual clinician behavior: our algorithm suggested withholding corticosteroids in 62% of patient states, versus 52% under the physicians’ policy. The 95% lower bound of the expected reward was higher for the RL agent than for clinicians’ historical decisions. In the testing dataset, ICU mortality was lower after concordant actions, both when the virtual agent recommended withholding corticosteroids and when it recommended prescribing them. The most relevant variables were vital parameters and laboratory values, such as blood pressure, heart rate, leucocyte count, and glycemia. Conclusions: Individualized use of corticosteroids in sepsis may result in a mortality benefit, but the optimal treatment policy may be more restrictive than routine clinical practice. Whilst external validation is needed, our study motivates a ‘precision-medicine’ approach to future prospective controlled trials and practice.
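The abstract reports an agreement rate and a 95% lower bound on expected reward from off-policy evaluation, but it does not say which estimator or interval method was used. The Python sketch below is a hedged illustration only, not the authors' code. It assumes weighted importance sampling (WIS) with a one-sided percentile bootstrap over patients, a two-action space (withhold or give corticosteroids), tabular state IDs, known clinician (behaviour) action probabilities, and a terminal reward of +1 for ICU survival and -1 for ICU death. All function names and toy numbers are hypothetical.

```python
"""Hypothetical sketch: compare an RL policy with clinicians' recorded actions
using weighted importance sampling (WIS) and a bootstrap lower bound.
Not the authors' implementation; estimator and reward scheme are assumptions."""
import random
from typing import Dict, List, Sequence, Tuple

WITHHOLD, GIVE = 0, 1  # assumed two-action space for corticosteroids

# One recorded step: (state_id, clinician_action, reward).
# Assumed reward: 0 at intermediate steps, +1 (survived ICU) or -1 (died) at the last step.
Step = Tuple[int, int, float]
Trajectory = List[Step]
Policy = Dict[int, Sequence[float]]  # state_id -> [P(withhold), P(give)]


def agreement_rate(trajectories: List[Trajectory], agent: Policy) -> float:
    """Fraction of recorded steps where the agent's most likely action equals the clinician's."""
    matches = total = 0
    for trajectory in trajectories:
        for state, action, _ in trajectory:
            probs = agent[state]
            greedy = max(range(len(probs)), key=probs.__getitem__)
            matches += int(greedy == action)
            total += 1
    return matches / total


def wis_value(trajectories: List[Trajectory], agent: Policy,
              behaviour: Policy, gamma: float = 1.0) -> float:
    """Weighted (self-normalised) importance sampling estimate of the agent's expected return.

    `behaviour` holds the assumed clinician action probabilities; they must be > 0
    for every recorded action, otherwise the importance ratio is undefined.
    """
    weights, returns = [], []
    for trajectory in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (state, action, reward) in enumerate(trajectory):
            ratio *= agent[state][action] / behaviour[state][action]
            ret += (gamma ** t) * reward
        weights.append(ratio)
        returns.append(ret)
    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("agent assigns zero probability to every recorded trajectory")
    return sum(w * g for w, g in zip(weights, returns)) / total_weight


def bootstrap_lower_bound(trajectories: List[Trajectory], agent: Policy,
                          behaviour: Policy, n_boot: int = 1000,
                          alpha: float = 0.05, seed: int = 0) -> float:
    """One-sided (1 - alpha) lower bound from a percentile bootstrap over patients."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample whole patient trajectories with replacement.
        resample = [rng.choice(trajectories) for _ in trajectories]
        estimates.append(wis_value(resample, agent, behaviour))
    estimates.sort()
    return estimates[int(alpha * n_boot)]


if __name__ == "__main__":
    # Toy data with two states; all numbers are made up for illustration.
    clinician = {0: [0.5, 0.5], 1: [0.5, 0.5]}
    agent = {0: [0.8, 0.2], 1: [0.3, 0.7]}
    data = [
        [(0, WITHHOLD, 0.0), (1, GIVE, 1.0)],
        [(0, GIVE, 0.0), (1, WITHHOLD, -1.0)],
    ]
    print("agreement:", agreement_rate(data, agent))  # → 0.5
    print("WIS value (agent):", wis_value(data, agent, clinician))
    print("WIS value (clinicians):", wis_value(data, clinician, clinician))  # → 0.0
    print("95% lower bound (agent):", bootstrap_lower_bound(data, agent, clinician))
```

Self-normalising the importance weights, as WIS does, trades a small bias for much lower variance than ordinary importance sampling, which matters when long ICU trajectories make the product of ratios unstable; comparing the lower bound rather than the point estimate makes the comparison with clinicians more conservative.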

Original language	English
Article number	1513
Journal	Journal of Clinical Medicine
Volume	12
Issue number	4
DOI	https://doi.org/10.3390/jcm12041513
Publication status	Published - 1 Feb 2023

Keywords

artificial intelligence
corticosteroids
outcomes
reinforcement learning
sepsis

Cite this

@article{cb2914fb8999484e81cbc57ef0d44b3a,

title = "Development of a Reinforcement Learning Algorithm to Optimize Corticosteroid Therapy in Critically Ill Patients with Sepsis",

abstract = "Background: The optimal indication, dose, and timing of corticosteroids in sepsis are controversial. Here, we used reinforcement learning (RL) to derive the optimal steroid policy in septic patients based on data from 3051 ICU admissions in the AmsterdamUMCdb intensive care database. Methods: We identified septic patients according to the 2016 consensus definition (Sepsis-3). An actor-critic RL algorithm using ICU mortality as a reward signal was developed to determine the optimal treatment policy from time-series data on 277 clinical parameters. We performed off-policy evaluation and testing in independent subsets to assess the algorithm{\textquoteright}s performance. Results: Agreement between the RL agent{\textquoteright}s policy and the actual documented treatment reached 59%. Our RL agent{\textquoteright}s treatment policy was more restrictive than actual clinician behavior: our algorithm suggested withholding corticosteroids in 62% of patient states, versus 52% under the physicians{\textquoteright} policy. The 95% lower bound of the expected reward was higher for the RL agent than for clinicians{\textquoteright} historical decisions. In the testing dataset, ICU mortality was lower after concordant actions, both when the virtual agent recommended withholding corticosteroids and when it recommended prescribing them. The most relevant variables were vital parameters and laboratory values, such as blood pressure, heart rate, leucocyte count, and glycemia. Conclusions: Individualized use of corticosteroids in sepsis may result in a mortality benefit, but the optimal treatment policy may be more restrictive than routine clinical practice. Whilst external validation is needed, our study motivates a {\textquoteleft}precision-medicine{\textquoteright} approach to future prospective controlled trials and practice.",

keywords = "artificial intelligence, corticosteroids, outcomes, reinforcement learning, sepsis",

author = "Razvan Bologheanu and Lorenz Kapral and Daniel Laxar and Mathias Maleczek and Christoph Dibiasi and Sebastian Zeiner and Asan Agibetov and Ari Ercole and Patrick Thoral and Paul Elbers and Clemens Heitzinger and Oliver Kimberger",

note = "Publisher Copyright: {\textcopyright} 2023 by the authors.",

year = "2023",

month = feb,

day = "1",

doi = "10.3390/jcm12041513",

language = "English",

volume = "12",

journal = "Journal of Clinical Medicine",

issn = "2077-0383",

publisher = "MDPI",

number = "4",

pages = "1513",

}

TY - JOUR

T1 - Development of a Reinforcement Learning Algorithm to Optimize Corticosteroid Therapy in Critically Ill Patients with Sepsis

AU - Bologheanu, Razvan

AU - Kapral, Lorenz

AU - Laxar, Daniel

AU - Maleczek, Mathias

AU - Dibiasi, Christoph

AU - Zeiner, Sebastian

AU - Agibetov, Asan

AU - Ercole, Ari

AU - Thoral, Patrick

AU - Elbers, Paul

AU - Heitzinger, Clemens

AU - Kimberger, Oliver

PY - 2023/2/1

Y1 - 2023/2/1

AB - Background: The optimal indication, dose, and timing of corticosteroids in sepsis are controversial. Here, we used reinforcement learning (RL) to derive the optimal steroid policy in septic patients based on data from 3051 ICU admissions in the AmsterdamUMCdb intensive care database. Methods: We identified septic patients according to the 2016 consensus definition (Sepsis-3). An actor-critic RL algorithm using ICU mortality as a reward signal was developed to determine the optimal treatment policy from time-series data on 277 clinical parameters. We performed off-policy evaluation and testing in independent subsets to assess the algorithm’s performance. Results: Agreement between the RL agent’s policy and the actual documented treatment reached 59%. Our RL agent’s treatment policy was more restrictive than actual clinician behavior: our algorithm suggested withholding corticosteroids in 62% of patient states, versus 52% under the physicians’ policy. The 95% lower bound of the expected reward was higher for the RL agent than for clinicians’ historical decisions. In the testing dataset, ICU mortality was lower after concordant actions, both when the virtual agent recommended withholding corticosteroids and when it recommended prescribing them. The most relevant variables were vital parameters and laboratory values, such as blood pressure, heart rate, leucocyte count, and glycemia. Conclusions: Individualized use of corticosteroids in sepsis may result in a mortality benefit, but the optimal treatment policy may be more restrictive than routine clinical practice. Whilst external validation is needed, our study motivates a ‘precision-medicine’ approach to future prospective controlled trials and practice.

KW - artificial intelligence

KW - corticosteroids

KW - outcomes

KW - reinforcement learning

KW - sepsis

UR - http://www.scopus.com/inward/record.url?scp=85148945046&partnerID=8YFLogxK

DO - 10.3390/jcm12041513

M3 - Article

C2 - 36836046

SN - 2077-0383

VL - 12

JO - J. Clin. Med.

JF - Journal of Clinical Medicine

IS - 4

M1 - 1513

ER -
