Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis

  • Li, Wei Tse1, 2
  • Ma, Jiayan1, 2
  • Shende, Neil1, 2
  • Castaneda, Grant1, 2
  • Chakladar, Jaideep1, 2
  • Tsai, Joseph C.1, 2
  • Apostol, Lauren1, 2
  • Honda, Christine O.1, 2
  • Xu, Jingyue1, 2
  • Wong, Lindsay M.1, 2
  • Zhang, Tianyi1, 2
  • Lee, Abby1, 2
  • Gnanasekar, Aditi1, 2
  • Honda, Thomas K.1, 2
  • Kuo, Selena Z.3
  • Yu, Michael Andrew4
  • Chang, Eric Y.5, 2
  • Rajasekaran, Mahadevan “Raj”5, 2
  • Ongkeko, Weg M.1, 2
  • 1 UC San Diego School of Medicine, San Diego, CA, 92093, USA
  • 2 VA San Diego Healthcare System, San Diego, CA, 92161, USA
  • 3 Columbia University Medical Center, New York, NY, 10032, USA
  • 4 Emory University School of Medicine, Atlanta, GA, 30322, USA
  • 5 University of California San Diego, San Diego, CA, 92093, USA
Published Article
BMC Medical Informatics and Decision Making
Springer (Biomed Central Ltd.)
Publication Date
Sep 29, 2020
DOI: 10.1186/s12911-020-01266-z
Springer Nature


Background
The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, which is amplified by the critical shortage of COVID-19 tests.

Methods
In this study, we propose to generate a more accurate diagnosis model of COVID-19 based on patient symptoms and routine test results by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model for discriminating between COVID-19 patients and influenza patients based on clinical variables alone.

Results
We discovered several novel associations between clinical variables, including correlations between being male and having higher levels of serum lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model to achieve a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.

Conclusions
We demonstrated that computational methods trained on large clinical datasets could yield ever more accurate COVID-19 diagnostic models to mitigate the impact of lack of testing. We also presented previously unknown COVID-19 clinical variable correlations and clinical subgroups.
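
For readers less familiar with the two metrics quoted for the classifier, the following minimal sketch shows how sensitivity and specificity are computed from true and predicted labels. It is not the authors' code: the function name `sensitivity_specificity` and the toy labels are hypothetical, with 1 standing for COVID-19 and 0 for influenza, and the numbers it prints are unrelated to the results reported in the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of true positives that are detected.
    specificity = TN / (TN + FP): share of true negatives that are detected.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive case correctly flagged
            else:
                fn += 1  # positive case missed
        else:
            if pred == positive:
                fp += 1  # negative case wrongly flagged as positive
            else:
                tn += 1  # negative case correctly identified
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels (assumption): 1 = COVID-19, 0 = influenza.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In this framing, a sensitivity of 92.5% would mean that 92.5% of the COVID-19 patients in the evaluation set were classified as COVID-19, and a specificity of 97.9% that 97.9% of the influenza patients were classified as influenza.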
