A deep learning approach for transgender and gender diverse patient identification in electronic health records.

Authors
  • Hua, Yining (1)
  • Wang, Liqin (2)
  • Nguyen, Vi (3)
  • Rieu-Werden, Meghan (4)
  • McDowell, Alex (5)
  • Bates, David W (6)
  • Foer, Dinah (7)
  • Zhou, Li (8)
  • (1) Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
  • (2) Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
  • (3) Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
  • (4) Division of General Medicine, Massachusetts General Hospital, Boston, MA, USA.
  • (5) Health Policy Research Institute, Mongan Institute, Massachusetts General Hospital, Boston, MA, USA; Department of Health Care Policy, Harvard Medical School, Boston, MA, USA.
  • (6) Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
  • (7) Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Division of Allergy and Clinical Immunology, Department of Medicine, Brigham and Women's Hospital, USA.
  • (8) Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
Type
Published Article
Journal
Journal of Biomedical Informatics
Publisher
Elsevier
Publication Date
Nov 01, 2023
Volume
147
Pages
104507
Identifiers
DOI: 10.1016/j.jbi.2023.104507
PMID: 37778672
Source
Medline
Language
English
License
Unknown

Abstract

Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains challenging because gender information in structured EHR fields is often incomplete. Using TGD identification as a case study, this research applies natural language processing (NLP) and deep learning to build an accurate predictive model of patient gender identity, aiming to address two challenges: identifying relevant patient-level information in EHR data and reducing annotation work. The study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 and 4/1/2022. To identify relevant information in a large volume of clinical notes, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and to create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared with that of traditional machine learning and rule-based algorithms. The final keyword list consisted of 109 keywords, of which 58 (53.2%) were added through expansion with the BioWordVec model. Dataset I contained 3,150 patients (50% TGD), while Dataset II contained 200 patients (90% TGD). On Dataset I, the deep learning model achieved an F1 score of 0.917, a sensitivity of 0.854, and a precision of 0.980; on Dataset II, it achieved an F1 score of 0.969, a sensitivity of 0.967, and a precision of 0.972. The deep learning model significantly outperformed rule-based algorithms. This is the first study to show that deep learning-integrated NLP algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.

Copyright © 2023 Elsevier Inc. All rights reserved.
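
The keyword pre-screening step and the reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three keywords, the patient IDs, and the note texts below are hypothetical placeholders (the study's 109-term list is not reproduced in the abstract), and the function names are assumptions chosen for illustration. The sketch flags patients whose notes contain any keyword as a whole term, and computes F1 as the harmonic mean of precision and sensitivity using the Dataset II figures quoted above.

```python
import re

# Hypothetical keyword list for illustration only; the study's actual
# 109-keyword list is not given in the abstract.
KEYWORDS = ["transgender", "gender dysphoria", "nonbinary"]


def build_pattern(keywords):
    """Compile one case-insensitive regex that matches any keyword as a whole term."""
    # Longest alternatives first so multi-word terms win over shorter overlaps.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def prescreen(notes_by_patient, pattern):
    """Return {patient_id: sorted matched keywords} for patients with at least one hit."""
    hits = {}
    for patient_id, notes in notes_by_patient.items():
        found = {m.group(0).lower() for note in notes for m in pattern.finditer(note)}
        if found:
            hits[patient_id] = sorted(found)
    return hits


def f1_score(precision, sensitivity):
    """F1 is the harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Made-up notes, purely to exercise the functions.
notes = {
    "p1": ["Patient identifies as transgender.", "Follow-up for asthma."],
    "p2": ["Annual physical, no concerns."],
    "p3": ["History of Gender Dysphoria noted."],
}

flagged = prescreen(notes, build_pattern(KEYWORDS))
print(flagged)

# Dataset II figures from the abstract: precision 0.972, sensitivity 0.967.
print(round(f1_score(0.972, 0.967), 3))
```

In a pipeline like the one described, patients flagged this way would form the candidate pool that a downstream classifier then evaluates; keyword matching alone is only a filter and does not by itself establish gender identity.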
