Exploring gene-gene interaction in family-based data with an unsupervised machine learning method: EPISFA.

Authors
  • Xiang, Xiao1
  • Wang, Siyue1
  • Liu, Tianyi2
  • Wang, Mengying1
  • Li, Jiawen3
  • Jiang, Jin1
  • Wu, Tao1
  • Hu, Yonghua1
  • 1 Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
  • 2 Department of Epidemiology and Biostatistics, School of Public Health, Capital Medical University, Beijing, China
  • 3 Department of Clinical Medicine, School of Medicine, Peking University, Beijing, China
Type
Published Article
Journal
Genetic Epidemiology
Publisher
Wiley (John Wiley & Sons)
Publication Date
Nov 01, 2020
Volume
44
Issue
8
Pages
811–824
Identifiers
DOI: 10.1002/gepi.22342
PMID: 32869348
Source
Medline
Language
English
License
Unknown

Abstract

Gene-gene interaction (G × G) is thought to fill the gap between the estimated heritability of complex diseases and the limited proportion explained by identified single-nucleotide polymorphisms. Current tools for exploring G × G were often developed for case-control designs, with less consideration for their application to family data. Family-based studies are robust against bias arising from population stratification in genetic studies and are helpful in understanding G × G. We proposed a new algorithm, epistasis sparse factor analysis (EPISFA), together with epistasis sparse factor analysis for linkage disequilibrium (EPISFA-LD), both based on unsupervised machine learning, to screen for G × G. Extensive simulations were performed to compare EPISFA/EPISFA-LD with a classical family-based algorithm, FAM-MDR (family-based multifactor dimensionality reduction). The results showed that EPISFA/EPISFA-LD offers both high power and computational efficiency, can be applied in family designs, and is applicable to high-dimensional datasets. Finally, we applied EPISFA/EPISFA-LD to a real dataset drawn from the Fangshan/family-based Ischemic Stroke Study in China. Five G × G pairs were discovered by EPISFA/EPISFA-LD: three pairs that were verified by other algorithms (FAM-MDR and logistic regression) and two additional pairs identified by EPISFA/EPISFA-LD only. The results from EPISFA might offer new insights for understanding the genetic etiology of complex diseases. EPISFA/EPISFA-LD was implemented in R. All relevant source code and simulated data can be freely downloaded from https://github.com/doublexism/episfa. © 2020 Wiley Periodicals LLC.
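
As a rough illustration of what screening a single SNP pair for G × G can look like, the sketch below contrasts the correlation between two SNP genotypes in cases with that in controls, using a Fisher z-test. This is a generic heuristic for unrelated individuals. It is not EPISFA, EPISFA-LD, or FAM-MDR, and it ignores family structure entirely. It is written in Python for brevity, whereas the authors' implementation is in R. The function names (`simulate`, `interaction_z`), the allele frequency, and the effect sizes are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def interaction_z(snp_a, snp_b, status):
    """Fisher z-test contrasting the SNP-SNP correlation in cases vs controls.

    Genotypes are coded 0/1/2 (minor-allele count); status is 1 for cases
    and 0 for controls. A large |z| suggests the pair deserves a closer look.
    """
    cases = [(a, b) for a, b, s in zip(snp_a, snp_b, status) if s == 1]
    ctrls = [(a, b) for a, b, s in zip(snp_a, snp_b, status) if s == 0]
    r_case = pearson(*zip(*cases))
    r_ctrl = pearson(*zip(*ctrls))
    # Standard error of the difference of two Fisher-transformed correlations.
    se = math.sqrt(1 / (len(cases) - 3) + 1 / (len(ctrls) - 3))
    return (math.atanh(r_case) - math.atanh(r_ctrl)) / se


def simulate(n, beta_int, seed, maf=0.3):
    """Simulate n unrelated individuals (hypothetical toy model).

    Disease risk follows logit(p) = -1 + beta_int * a * b, i.e. a pure
    interaction effect with no main effects. All values are assumptions.
    """
    rng = random.Random(seed)
    snp_a, snp_b, status = [], [], []
    for _ in range(n):
        a = sum(rng.random() < maf for _ in range(2))
        b = sum(rng.random() < maf for _ in range(2))
        p = 1 / (1 + math.exp(-(-1.0 + beta_int * a * b)))
        snp_a.append(a)
        snp_b.append(b)
        status.append(1 if rng.random() < p else 0)
    return snp_a, snp_b, status


if __name__ == "__main__":
    for label, beta in (("interaction", 1.0), ("no interaction", 0.0)):
        a, b, s = simulate(4000, beta, seed=1)
        print(label, round(interaction_z(a, b, s), 2))
```

Under the interaction scenario the statistic should be far from zero, while under the null scenario it should behave roughly like a standard normal draw. Family data would need a method that accounts for relatedness, which is the setting the abstract addresses.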
