Coding exon detection using comparative sequences.

Authors
Type
Published Article
Journal
Journal of Computational Biology
Publisher
Mary Ann Liebert
Volume
13
Issue
6
Pages
1148–1164
Source
UCSC Cancer biomedical-ucsc
License
Unknown

Abstract

We introduce shortHMM, a new system that predicts individual exons using two related genomes. The system builds a hidden semi-Markov model to identify exons. In the hidden Markov model, we propose joint probability models of nucleotides in introns, splice sites, 5' UTR, 3' UTR, and intergenic regions by exploiting the homology between related genomes. To reduce the false positive rate of the hidden Markov model, we develop a screening process that identifies intergenic regions. We then build a classifier by combining the statistics from the hidden Markov model and the screening process. We implement shortHMM on human-mouse sequence alignments. The source code is available at < www.stat.purdue.edu/ jingwu/hmm >. Compared to TWINSCAN and SLAM, shortHMM is substantially more powerful in identifying AT-rich RefSeq exons (8% more AT-rich RefSeq exons were predicted) and slightly more powerful in identifying RefSeq exons overall (3-10% more RefSeq exons were predicted), at a similar or lower false positive rate and with less computing time and memory usage. Finally, shortHMM is also capable of finding new potential exons.
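To make the idea of a joint probability model over aligned nucleotides concrete, the following is a minimal sketch of Viterbi decoding over alignment columns. It is not the shortHMM model: it uses a plain two-state hidden Markov model rather than a hidden semi-Markov model, and the state names, the transition and start probabilities, and the match-based emission model are all hypothetical values chosen for illustration only.

```python
import math

# Hypothetical two-state model; shortHMM itself uses more states
# (introns, splice sites, UTRs, intergenic regions) and explicit durations.
STATES = ("noncoding", "exon")

# Illustrative probabilities, not estimated from any data.
START = {"noncoding": 0.9, "exon": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.95, "exon": 0.05},
    "exon": {"noncoding": 0.10, "exon": 0.90},
}
# Assumed probability that the two genomes agree at a column in each state.
MATCH_PROB = {"noncoding": 0.6, "exon": 0.9}


def emission(state, human_base, mouse_base):
    """Joint probability of one alignment column (human base, mouse base)."""
    p_match = MATCH_PROB[state]
    if human_base == mouse_base:
        return p_match / 4.0           # spread over the 4 matching pairs
    return (1.0 - p_match) / 12.0      # spread over the 12 mismatching pairs


def viterbi(human, mouse):
    """Return the most probable state path for two aligned, gap-free sequences."""
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    if not human:
        return []
    columns = list(zip(human, mouse))
    # Log-space scores avoid numerical underflow on long alignments.
    score = {
        s: math.log(START[s]) + math.log(emission(s, *columns[0]))
        for s in STATES
    }
    back = []
    for h, m in columns[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(emission(s, h, m))
            )
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi(human, mouse)` on two equal-length strings over A, C, G, T returns one label per alignment column. Under these toy parameters, highly conserved stretches tend to be labeled "exon", which only mimics the way conservation between related genomes can inform the emission model.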