Efficient gene-environment interaction tests for large biobank-scale sequencing studies.

Authors
  • Wang, Xinyu (1)
  • Lim, Elise (2)
  • Liu, Ching-Ti (2)
  • Sung, Yun Ju (3)
  • Rao, Dabeeru C (3)
  • Morrison, Alanna C (4)
  • Boerwinkle, Eric (4, 5)
  • Manning, Alisa K (6, 7)
  • Chen, Han (4, 8)
Affiliations
  • (1) Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas.
  • (2) Department of Biostatistics, Boston University, Boston, Massachusetts.
  • (3) Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri.
  • (4) Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas.
  • (5) Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas.
  • (6) Center for Human Genetics Research, Massachusetts General Hospital, Boston, Massachusetts.
  • (7) Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts.
  • (8) Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas.
Type
Published Article
Journal
Genetic Epidemiology
Publisher
Wiley (John Wiley & Sons)
Publication Date
Nov 01, 2020
Volume
44
Issue
8
Pages
908–923
Identifiers
DOI: 10.1002/gepi.22351
PMID: 32864785
Source
Medline
Language
English
License
Unknown

Abstract

Complex human diseases are affected by genetic and environmental risk factors and their interactions. Gene-environment interaction (GEI) tests for aggregate genetic variant sets have been developed in recent years. However, existing statistical methods become rate-limiting for large biobank-scale sequencing studies with correlated samples. We propose efficient Mixed-model Association tests for GEne-Environment interactions (MAGEE), for testing GEI between an aggregate variant set and environmental exposures on quantitative and binary traits in large-scale sequencing studies with related individuals. Joint tests for the aggregate genetic main effects and GEI effects are also developed. A null generalized linear mixed model adjusting for covariates but without any genetic effects is fit only once in a whole-genome GEI analysis, thereby vastly reducing the overall computational burden. Score tests for variant sets are performed as a combination of genetic burden and variance component tests by accounting for the genetic main effects using matrix projections. The computational complexity is dramatically reduced in a whole-genome GEI analysis, which makes MAGEE scalable to hundreds of thousands of individuals. We applied MAGEE to the exome sequencing data of 41,144 related individuals from the UK Biobank, and the analysis of 18,970 protein-coding genes finished within 10.4 CPU hours. © 2020 Wiley Periodicals LLC.
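
The "fit the null model once, then project out the genetic main effects" idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the MAGEE implementation: it assumes unrelated individuals and a quantitative trait, so an ordinary linear model stands in for the generalized linear mixed model, and it shows only a burden-type interaction statistic, not the combined burden and variance component test. All function names (`fit_null`, `project`, `adjusted_gei_scores`, `gei_burden_test`) and the simulated data are hypothetical.

```python
import math
import numpy as np

def fit_null(y, X):
    """Fit the covariate-only null model once (no genetic terms).
    Assumption: ordinary least squares replaces the mixed model."""
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))          # P y
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance
    return {"X": X, "XtX_inv": XtX_inv, "resid": resid, "sigma2": sigma2}

def project(null, M):
    """Return P M, with P = I - X (X'X)^-1 X' (removes covariate effects)."""
    X = null["X"]
    return M - X @ (null["XtX_inv"] @ (X.T @ M))

def adjusted_gei_scores(null, G, E):
    """Interaction scores for one variant set, adjusted for the set's
    genetic main effects by a matrix projection (no model refit)."""
    K = G * E[:, None]            # interaction columns: genotype x exposure
    PG = project(null, G)
    PK = project(null, K)
    r = null["resid"]
    S_G = G.T @ r                 # main-effect scores  G' P y
    S_K = K.T @ r                 # interaction scores  K' P y
    GPG = G.T @ PG                # G' P G
    KPG = K.T @ PG                # K' P G
    A = KPG @ np.linalg.pinv(GPG)
    S_adj = S_K - A @ S_G         # remove the part explained by main effects
    V_adj = null["sigma2"] * (K.T @ PK - A @ KPG.T)
    return S_adj, V_adj

def gei_burden_test(null, G, E, weights=None):
    """Burden-type GEI test: one weighted sum of adjusted scores, 1 df."""
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights)
    S_adj, V_adj = adjusted_gei_scores(null, G, E)
    stat = float((w @ S_adj) ** 2 / (w @ V_adj @ w))
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 500
    E = rng.normal(size=n)                      # simulated exposure
    X = np.column_stack([np.ones(n), E])        # intercept + exposure
    y = 0.3 * E + rng.normal(size=n)            # simulated trait
    null = fit_null(y, X)                       # fitted once
    for gene in range(3):                       # reused for every variant set
        G = rng.binomial(2, 0.2, size=(n, 4)).astype(float)
        print(gene, gei_burden_test(null, G, E))
```

In this sketch the null model is fitted once and only matrix products involving the variant set are computed per gene, which mirrors the source of the computational savings the abstract describes.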