Enhanced protein domain discovery using taxonomy

BioMed Central
  • Research Article
  • Biology
  • Computer Science

BMC Bioinformatics (Open Access)

Lachlan Coin*, Alex Bateman and Richard Durbin

Address: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
Email: Lachlan Coin* - [email protected]; Alex Bateman - [email protected]; Richard Durbin - [email protected]
* Corresponding author

Abstract

Background: It is well known that different species have different protein domain repertoires, and indeed that some protein domains are kingdom specific. This information has not yet been incorporated into statistical methods for finding domains in sequences of amino acids.

Results: We show that by incorporating our understanding of the taxonomic distribution of specific protein domains, we can enhance domain recognition in protein sequences. We identify 4447 new instances of Pfam domains in the SP-TREMBL database using this technique, equivalent to the coverage increase given by the last 8.3% of Pfam families and to a 0.7% increase in the number of domain predictions. We use PSI-BLAST to cross-validate our new predictions. We also benchmark our approach using a SCOP test set of proteins of known structure, and demonstrate improvements relative to standard hidden Markov model techniques.

Conclusions: Explicitly including knowledge about the taxonomic distribution of protein domains can enhance protein domain recognition. Our method can also incorporate other context-specific domain distributions, such as domain co-occurrence and protein localisation.

Background

Protein domains are the structural, functional and evolutionary units of proteins. Several statistical techniques are currently used for detecting protein domains. In particular, profile hidden Markov models (profile HMMs) have been successfully applied to this problem [1,2], and form the basis for databases such as Pfam [3]. Profile HMMs can be more sensitive than methods which look for pair-