
Phenotype validation in electronic health records based genetic association studies.

Authors
  • Wang, Lu (1)
  • Damrauer, Scott M (2, 3)
  • Zhang, Hong (4)
  • Zhang, Alan X (5)
  • Xiao, Rui (1)
  • Moore, Jason H (1, 6)
  • Chen, Jinbo (1)
  • 1 Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
  • 2 Division of Vascular Surgery and Endovascular Therapy, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
  • 3 Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania, United States of America.
  • 4 Institute of Biostatistics, Fudan University, Shanghai, P.R. China.
  • 5 Sidwell Friends School, Washington, DC, United States of America.
  • 6 Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
Type
Published Article
Journal
Genetic Epidemiology
Publisher
Wiley (John Wiley & Sons)
Publication Date
Dec 01, 2017
Volume
41
Issue
8
Pages
790–800
Identifiers
DOI: 10.1002/gepi.22080
PMID: 29023970
Source
Medline
License
Unknown

Abstract

The linkage between electronic health records (EHRs) and genotype data makes it feasible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold-standard phenotype data obtained by chart review. In this work, we propose a genotype-stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and we assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype-stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
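
The sampling step described above can be illustrated with a minimal sketch. The abstract does not say how many subjects are drawn from each stratum, so the code assumes a fixed, equal number per stratum; the function name genotype_stratified_sample and the record fields id, genotype and ehr_case are hypothetical. It only shows how a validation subset might be selected within (genotype, EHR-derived phenotype) strata versus EHR-phenotype strata alone. It does not implement the paper's closed-form maximum-likelihood estimator or score test.

```python
import random
from collections import defaultdict


def genotype_stratified_sample(subjects, n_per_stratum, by_genotype=True, seed=0):
    """Select subjects for chart review within sampling strata.

    Hypothetical helper (not from the paper). Each subject is a dict with
    'id', 'genotype' (minor-allele count 0/1/2) and 'ehr_case' (bool, the
    error-prone EHR-derived phenotype). With by_genotype=True the strata are
    (genotype, ehr_case); with by_genotype=False they are ehr_case only,
    i.e. case-control sampling on the EHR-derived phenotype alone.
    Assumes an equal, fixed number of subjects drawn per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        key = (s["genotype"], s["ehr_case"]) if by_genotype else (s["ehr_case"],)
        strata[key].append(s["id"])

    chosen, fraction = {}, {}
    for key in sorted(strata):
        ids = strata[key]
        k = min(n_per_stratum, len(ids))   # cannot draw more than the stratum holds
        chosen[key] = rng.sample(ids, k)   # simple random sample without replacement
        # Per-stratum sampling fraction; any analysis that combines the
        # validated subset with the full EHR data must account for the design.
        fraction[key] = k / len(ids)
    return chosen, fraction


if __name__ == "__main__":
    # Synthetic cohort for illustration only (not data from the paper).
    gen = random.Random(1)
    cohort = [
        {"id": i,
         "genotype": gen.choice([0, 0, 0, 0, 1, 1, 2]),
         "ehr_case": gen.random() < 0.2}
        for i in range(1000)
    ]
    chosen, frac = genotype_stratified_sample(cohort, n_per_stratum=20)
    for stratum, ids in chosen.items():
        print(stratum, len(ids), round(frac[stratum], 3))
```

One intuition for stratifying on genotype, offered here as an illustration rather than the paper's argument, is that carriers of a rare minor allele would otherwise be scarce in a validation sample drawn only on the EHR-derived phenotype; fixing their number per stratum guarantees they are represented.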
