
IdentiCS – Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence

Authors
Jibin Sun, An-Ping Zeng
Publisher
BioMed Central
Publication Date
Source
PMC
Keywords
  • Methodology Article
Disciplines
  • Biology
  • Computer Science

Abstract

BMC Bioinformatics, Methodology article (Open Access)

Jibin Sun and An-Ping Zeng*
Address: Department of Genome Analysis, GBF-German Research Center for Biotechnology, Mascheroder Weg 1, Braunschweig, 38124, Germany
* Corresponding author

Keywords: low-coverage, unfinished, genome sequence, annotation, coding sequence, in silico reconstruction, visualization, comparison, metabolic network, Salmonella typhimurium, Klebsiella pneumoniae

Background: A necessary step for genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods are mainly based on genome annotation, which involves two successive steps: the prediction of coding sequences (CDSs) and the assignment of their functions. Annotation is time-consuming, and the available methods often have difficulty with unfinished, error-containing genome sequences.

Results: In this work, a fast method is proposed that uses unannotated genome sequence to predict CDSs and to reconstruct metabolic networks in silico. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database built from the unannotated genome sequence, and CDSs are predicted from the resulting hits. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method: 97.7% of the CDSs in the original annotation are correctly identified. Using the SWISS-PROT and TrEMBL databases, 98.9% of the CDSs that carry EC numbers in the published annotation were identified. Furthermore, two versions of sequences of the bacterium Klebsiella
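The reverse search described in the Results (database entries queried against the genome, rather than predicted genes queried against databases) can be sketched in Python. This is a minimal sketch under stated assumptions, not the IdentiCS implementation. It assumes the database entries have already been aligned against the local genome database, for example with a TBLASTN-style search (the abstract does not name the search tool), and parsed into hit records. The names `Hit`, `PredictedCDS`, `call_cds` and `recovered_fraction`, the e-value cutoff and the 50% overlap rule are all hypothetical. Refining CDS boundaries to start and stop codons is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hit:
    """One alignment of a database entry (query) against the local genome database.

    Hypothetical record: coordinates are 1-based, inclusive, and given on the
    forward strand of the genome (start <= end); strand is +1 or -1.
    """
    query_id: str
    ec_number: Optional[str]  # EC number carried by the database entry, if any
    start: int
    end: int
    strand: int
    evalue: float


@dataclass(frozen=True)
class PredictedCDS:
    start: int
    end: int
    strand: int
    best_query: str
    ec_number: Optional[str]


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of shared positions between two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def call_cds(hits: Sequence[Hit], max_evalue: float = 1e-10,
             min_overlap_frac: float = 0.5) -> List[PredictedCDS]:
    """Turn database-vs-genome hits into CDS calls with a function attached.

    Hits are processed from best to worst e-value. A hit joins an existing
    locus if it lies on the same strand as that locus's best hit and overlaps
    it by at least min_overlap_frac of the shorter of the two; otherwise it
    starts a new locus. Each locus becomes one CDS spanning all of its hits and
    inherits the EC number of its best hit, so CDS prediction and function
    assignment happen in the same pass.
    """
    significant = sorted((h for h in hits if h.evalue <= max_evalue),
                         key=lambda h: h.evalue)
    loci: List[List[Hit]] = []
    for hit in significant:
        for locus in loci:
            best = locus[0]  # first hit added is the lowest e-value one
            shorter = min(hit.end - hit.start, best.end - best.start) + 1
            if (hit.strand == best.strand and
                    overlap(hit.start, hit.end, best.start, best.end)
                    >= min_overlap_frac * shorter):
                locus.append(hit)
                break
        else:  # no existing locus matched, so open a new one
            loci.append([hit])

    calls = [
        PredictedCDS(start=min(h.start for h in locus),
                     end=max(h.end for h in locus),
                     strand=locus[0].strand,
                     best_query=locus[0].query_id,
                     ec_number=locus[0].ec_number)
        for locus in loci
    ]
    return sorted(calls, key=lambda c: (c.start, c.end))


def recovered_fraction(reference: Sequence[Tuple[int, int, int]],
                       predicted: Sequence[PredictedCDS],
                       min_overlap_frac: float = 0.5) -> float:
    """Fraction of reference CDSs (start, end, strand) covered by some prediction."""
    if not reference:
        return 0.0
    found = sum(
        1 for r_start, r_end, r_strand in reference
        if any(p.strand == r_strand and
               overlap(r_start, r_end, p.start, p.end)
               >= min_overlap_frac * (r_end - r_start + 1)
               for p in predicted)
    )
    return found / len(reference)
```

Anchoring each locus on its lowest-e-value hit is one simple way to let a single pass yield both a CDS call and its EC number. `recovered_fraction` computes the kind of percentage quoted for S. typhimurium LT2, but the matching criterion the authors actually used may differ from this overlap rule.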
