Inclusion of Unstructured Clinical Text Improves Early Prediction of Death or Prolonged ICU Stay

Authors
  • Weissman, Gary E.1, 2, 3
  • Hubbard, Rebecca A.4
  • Ungar, Lyle H.5
  • Harhay, Michael O.2, 4
  • Greene, Casey S.6, 7, 8
  • Himes, Blanca E.4, 8
  • Halpern, Scott D.1, 2, 3, 4
  • 1 Pulmonary, Allergy, and Critical Care Division, Perelman School of Medicine, University of Pennsylvania, PA
  • 2 Palliative and Advanced Illness Research Center, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, PA
  • 3 Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA
  • 4 Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
  • 5 Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA
  • 6 Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA
  • 7 Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA
  • 8 Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA
Type
Published Article
Journal
Critical Care Medicine
Publication Date
Jul 01, 2018
Volume
46
Issue
7
Pages
1125–1132
Identifiers
DOI: 10.1097/CCM.0000000000003148
PMID: 29629986
PMCID: PMC6005735
Source
PubMed Central

Abstract

Objective: Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients’ goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or intensive care unit (ICU) length of stay (LOS) ≥ 7 days using data from the first 48 hours of hospitalization.

Design: Retrospective cohort study with split sampling for model training and testing.

Setting: A single urban academic hospital.

Patients: All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012.

Interventions: None.

Measurements and Main Results: Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had ICU LOS ≥ 7 days. The gradient boosting machine model had the highest discrimination both without (AUC 0.83, 95% CI 0.81–0.84) and with (AUC 0.89, 95% CI 0.88–0.90) text-derived variables. Both gradient boosting machines and random forests outperformed logistic regression without text data (p < 0.001), while all models outperformed logistic regression with text data (p < 0.002). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among the models using text data, the increasing presence of the terms “intubated” and “poor prognosis” was positively associated with mortality and ICU LOS, while the term “extubated” was inversely associated with them.

Conclusions: Variables extracted from unstructured clinical text from the first 48 hours of hospital admission using natural language processing techniques significantly improved the abilities of logistic regression and other machine learning models to predict which patients died or had long ICU stays. Learning health systems may adapt such models using open source approaches to capture local variation in care patterns.
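For readers unfamiliar with the quantities reported in the abstract, the following is a minimal sketch of how a binary composite outcome (in-hospital death or ICU LOS ≥ 7 days) and an AUC can be computed. It is not the authors' code: the function names, the toy admissions, and the predicted risks are all hypothetical, and the AUC is calculated by simple pairwise comparison rather than by whatever software the study used.

```python
from typing import Sequence


def composite_outcome(died_in_hospital: bool, icu_los_days: float) -> int:
    """Return 1 if the patient died in hospital or stayed in the ICU >= 7 days."""
    return int(died_in_hospital or icu_los_days >= 7)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy admissions (hypothetical values): (died_in_hospital, icu_los_days)
admissions = [(False, 2.0), (True, 1.5), (False, 9.0), (False, 3.0), (False, 7.0)]
labels = [composite_outcome(died, los) for died, los in admissions]

# Hypothetical predicted risks from a model without and with text-derived variables
risk_structured = [0.10, 0.40, 0.30, 0.35, 0.20]
risk_with_text = [0.10, 0.70, 0.60, 0.20, 0.55]

print("labels:", labels)
print("AUC without text:", auc(labels, risk_structured))
print("AUC with text:", auc(labels, risk_with_text))

# Outcome prevalence reported in the abstract: 5,504 of 25,947 admissions
print("prevalence (%):", round(5504 / 25947 * 100, 1))
```

The pairwise definition is quadratic in the number of cases, so it is suitable only for illustration; a rank-based implementation would be preferable for a cohort of the size described here.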
