The Site-Wise Log-Likelihood Score is a Good Predictor of Genes under Positive Selection

Authors
  • Wang, Huai-Chun (1, 2, 3)
  • Susko, Edward (1, 3)
  • Roger, Andrew J. (2, 3)
  • 1 Dalhousie University, Department of Mathematics and Statistics, Halifax, NS, B3H 4R2, Canada
  • 2 Dalhousie University, Department of Biochemistry and Molecular Biology, Halifax, NS, B3H 4R2, Canada
  • 3 Dalhousie University, Centre for Comparative Genomics and Evolutionary Bioinformatics, Halifax, NS, B3H 4R2, Canada
Type
Published Article
Journal
Journal of Molecular Evolution
Publisher
Springer-Verlag
Publication Date
Apr 18, 2013
Volume
76
Issue
5
Pages
280–294
Identifiers
DOI: 10.1007/s00239-013-9557-0
Source
Springer Nature
License
Yellow

Abstract

The strength and direction of selection on the identity of an amino acid residue in a protein are typically measured by the ratio of the rate of non-synonymous substitutions to the rate of synonymous substitutions. In attempting to predict positively selected sites from amino acid alignments, we made the unexpected observation that the site likelihood of an alignment column for a given tree tends to be negatively correlated with the posterior probability that the site is in the positive selection class under widely used codon models. This is likely because positively selected sites tend to be more variable and display more “radical” amino acid changes; both of these features are expected to result in low site log-likelihoods. We explored the efficacy of using the site log-likelihood (SLL) score as a predictor for positive selection. Through simulation we show that an SLL-based test has a low false positive rate and power comparable to that of the codon models. In one case where the simulated data violated the assumption that synonymous substitution rates are constant across sites, the codon models were unable to detect positive selection in the data, whereas the SLL test was. We applied the new method to ten empirical datasets and found that its predictions were similar to those of the codon models in eight of them. For the tax gene dataset, the SLL test seemed to produce more reasonable results. The SLL methods are a valuable complement to codon models, especially in cases where the assumptions of codon models are likely violated.
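
The intuition that unusually low site log-likelihoods mark candidate sites can be illustrated with a minimal sketch. It is not the authors' SLL test, whose statistic and significance procedure are not given in the abstract. It assumes the per-site log-likelihoods have already been computed by some phylogenetic program, and it simply flags the lowest-scoring fraction of sites. The function name `flag_low_sll_sites`, the `fraction` cutoff, and the toy values are all hypothetical.

```python
import math


def flag_low_sll_sites(site_log_likelihoods, fraction=0.05):
    """Return the 0-based indices of the sites with the lowest log-likelihoods.

    Assumption: a fixed-fraction cutoff stands in for whatever significance
    procedure a real test would use; it is for illustration only.
    """
    n = len(site_log_likelihoods)
    if n == 0:
        return []
    # Always flag at least one site so that short alignments return something.
    k = max(1, math.ceil(fraction * n))
    # Order site indices from lowest (least likely) to highest log-likelihood.
    order = sorted(range(n), key=lambda i: site_log_likelihoods[i])
    return sorted(order[:k])


# Made-up per-site log-likelihoods for a 10-column alignment (not real data).
toy_sll = [-3.1, -2.8, -12.4, -3.5, -9.7, -2.9, -3.3, -4.0, -3.0, -2.7]
candidate_sites = flag_low_sll_sites(toy_sll, fraction=0.2)
print(candidate_sites)
```

With these toy values, the two columns with markedly lower log-likelihoods (indices 2 and 4) are the ones flagged. In practice a calibrated threshold, for example one derived from simulation under a null model, would replace the fixed fraction.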
