

Deep learning versus parametric and ensemble methods for genomic prediction of complex phenotypes

  • Abdollahi-Arpanahi, Rostam (1)
  • Gianola, Daniel (2)
  • Peñagaricano, Francisco (1, 3)
  • (1) University of Florida, Gainesville, FL, USA
  • (2) University of Wisconsin-Madison, Madison, WI, USA
  • (3) University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA
Published Article
Journal: Genetics Selection Evolution
Publisher: Springer (BioMed Central Ltd.)
Publication Date: Feb 24, 2020
DOI: 10.1186/s12711-020-00531-z
Springer Nature


Background
Transforming large amounts of genomic data into valuable knowledge for predicting complex traits has been an important challenge for animal and plant breeders. Prediction of complex traits has not escaped the current excitement about machine learning, including interest in deep learning algorithms such as multilayer perceptrons (MLP) and convolutional neural networks (CNN). The aim of this study was to compare the predictive performance of two deep learning methods (MLP and CNN), two ensemble learning methods [random forests (RF) and gradient boosting (GB)], and two parametric methods [genomic best linear unbiased prediction (GBLUP) and Bayes B] using real and simulated datasets.

Methods
The real dataset consisted of 11,790 Holstein bulls with sire conception rate (SCR) records, genotyped for 58k single nucleotide polymorphisms (SNPs). To support the evaluation of deep learning methods, various simulation studies were conducted using the observed genotype data as a template, assuming a heritability of 0.30 with either additive or non-additive gene effects, and two different numbers of quantitative trait nucleotides (100 and 1000).

Results
In the bull dataset, the best predictive correlation was obtained with GB (0.36), followed by Bayes B (0.34), GBLUP (0.33), RF (0.32), CNN (0.29) and MLP (0.26). The same trend was observed when using the mean squared error of prediction. The simulation indicated that when gene action was purely additive, parametric methods outperformed the other methods. When gene action was a combination of additive, dominance and two-locus epistatic effects, the best predictive ability was obtained with gradient boosting, and the superiority of deep learning over the parametric methods depended on the number of loci controlling the trait and on sample size. In fact, with a large dataset of 80k individuals, the predictive performance of deep learning methods was similar to or slightly better than that of parametric methods for traits with non-additive gene action.

Conclusions
For prediction of traits with non-additive gene action, gradient boosting was a robust method. Deep learning approaches were not better for genomic prediction unless non-additive variance was sizable.
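The simulation design and evaluation metrics summarized above can be illustrated with a small Python sketch. It is not the authors' code, and several parts are assumptions. Random genotypes stand in for the real 58k-SNP Holstein data. The dominance coding (heterozygote indicator), the additive-by-additive epistasis coding (product of centered allele counts for paired QTNs), and the normal effect distributions are a hypothetical scheme, because the abstract does not describe the exact simulation. The function names `simulate_phenotype`, `ridge_fit_predict`, `predictive_correlation` and `mse` are illustrative. As a simple parametric baseline, the sketch uses ridge regression on centered markers (SNP-BLUP style), which is closely related to GBLUP but is not the study's GBLUP or Bayes B implementation.

```python
import numpy as np


def simulate_phenotype(X, n_qtn, h2=0.30, non_additive=False, seed=0):
    """Simulate phenotypes from an (n, p) genotype matrix coded 0/1/2.

    Hypothetical scheme: n_qtn markers are chosen as QTNs. Additive effects act
    on allele counts, dominance effects on heterozygotes, and additive-by-additive
    epistasis on pairs of QTNs. The residual variance is chosen so that
    var(g) / (var(g) + var_e) = h2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    qtn = rng.choice(p, size=n_qtn, replace=False)
    Xq = X[:, qtn].astype(float)
    g = Xq @ rng.normal(size=n_qtn)                    # additive effects
    if non_additive:
        het = (Xq == 1).astype(float)                  # heterozygote indicators
        g = g + het @ rng.normal(size=n_qtn)           # dominance effects
        Xc = Xq - Xq.mean(axis=0)                      # centered allele counts
        half = n_qtn // 2
        pairs = Xc[:, :half] * Xc[:, half:2 * half]    # two-locus products
        g = g + pairs @ rng.normal(size=half)          # epistatic effects
    var_e = g.var() * (1.0 - h2) / h2                  # residual variance for target h2
    y = g + rng.normal(scale=np.sqrt(var_e), size=n)
    return y, g


def ridge_fit_predict(X_train, y_train, X_test, h2):
    """Ridge regression on centered markers (SNP-BLUP-style baseline).

    The shrinkage lam = (1 - h2) / h2 * sum of marker variances assumes the
    genetic variance is spread evenly over all markers.
    """
    mu = X_train.mean(axis=0)
    Z = X_train - mu
    lam = (1.0 - h2) / h2 * Z.var(axis=0).sum()
    y_mean = y_train.mean()
    b = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (y_train - y_mean))
    return y_mean + (X_test - mu) @ b


def predictive_correlation(y_obs, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1])


def mse(y_obs, y_pred):
    """Mean squared error of prediction."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.mean((y_obs - y_pred) ** 2))


# Usage with random genotypes standing in for the real SNP data (hypothetical sizes).
rng = np.random.default_rng(1)
n, p = 1000, 500
X = rng.binomial(2, rng.uniform(0.05, 0.5, size=p), size=(n, p))
y, g = simulate_phenotype(X, n_qtn=100, seed=2)
train, test = np.arange(800), np.arange(800, n)
y_hat = ridge_fit_predict(X[train], y[train], X[test], h2=0.30)
print("predictive correlation:", predictive_correlation(y[test], y_hat))
print("MSE of prediction:", mse(y[test], y_hat))
```

The two metrics capture different things. The predictive correlation rewards correctly ranking individuals. The mean squared error also penalizes bias and scale errors in the predictions. This is why the abstract reports both.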
