The UCSC Ebola Genome Portal.
Published in PLoS Currents
Published in PLoS Currents
Published in Nature
An understanding of how centromeric transition regions are organized is a critical aspect of chromosome structure and function; however, the sequence context of these regions has been difficult to resolve on the basis of the draft genome sequence. We present a detailed analysis of the structure and assembly of all human pericentromeric regions (5 m...
Published in Bioinformatics
Published in Nature Biotechnology
RNA and DNA strands produce ionic current signatures when driven through an alpha-hemolysin channel by an applied voltage. Here we combine this nanopore detector with a support vector machine (SVM) to analyze DNA hairpin molecules on the millisecond time scale. Measurable properties include duplex stem length, base pair mismatches, and loop length....
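The snippet does not say which kernel, features, or training procedure were used. The following is only a generic sketch of the technique it names: a linear SVM fit by subgradient descent on the regularized hinge loss, in plain Python. Each row of `X` stands for a feature vector summarizing one blockade event. The choice of features (for example, mean current level or dwell time) is an assumption, and the toy values in any example are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))), labels y_i in {-1, +1}.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty (bias not penalized)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In practice one would more likely use an established SVM implementation (for example scikit-learn's `SVC`), which also supports nonlinear kernels. The loop above only makes the objective explicit.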
Published in Nucleic Acids Research
The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative...
Published in Nature
Hundreds of highly conserved distal cis-regulatory elements have been characterized so far in vertebrate genomes. Many thousands more are predicted on the basis of comparative genomics. However, in stark contrast to the genes that they regulate, in invertebrates virtually none of these regions can be traced by using sequence similarity, leaving the...
Published in PLoS Computational Biology
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we obse...
Published in Nucleic Acids Research
For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its incepti...
Published in Bioinformatics
The University of California Santa Cruz (UCSC) Known Genes dataset is constructed by a fully automated process, based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from GenBank. The detailed steps of this process are described. Extensive cross-references from this dataset to other genomic and proteomic data were cons...
Published in Pacific Symposium on Biocomputing
Combining mRNA and EST data in splicing graphs with whole genome alignments, we discover alternative splicing events that are conserved in both human and mouse transcriptomes. 1,964 of 19,156 (10%) loci examined contain one or more such alternative splicing events, with 2,698 total events. These events represent a lower bound on the amount of alter...
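The proportions quoted in this abstract can be checked directly from its counts:

```latex
\frac{1964}{19156} \approx 0.1025 \approx 10\%, \qquad
\frac{2698}{1964} \approx 1.37 \ \text{conserved events per affected locus}
```

So a locus with at least one conserved event carries, on average, somewhat more than one such event.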
Published in Proceedings of the National Academy of Sciences
We formalize the problem of recovering the evolutionary history of a set of genomes that are related to an unseen common ancestor genome by operations of speciation, deletion, insertion, duplication, and rearrangement of segments of bases. The problem is examined in the limit as the number of bases in each genome goes to infinity. In this limit, th...
Published in International Conference on Intelligent Systems for Molecular Biology
We present a statistical model of genes in DNA. A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence (Stormo & Haussler 1994). Probabilities are assigned to transitions between states in the GHMM and to the generation of each nucleotide base given a particular state. Machine l...
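The two kinds of probability named above, transitions between states and generation of each base given a state, can be illustrated by scoring a single labeled parse. This is a simplified per-base HMM rather than the paper's GHMM. The two states (`exon`, `intron`) and all probability values are hypothetical.

```python
import math

# Hypothetical toy parameters for illustration only (not from the paper).
START = {"exon": 0.5, "intron": 0.5}
TRANSITIONS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.2, "intron": 0.8},
}
EMISSIONS = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def parse_log_prob(sequence, states):
    """Natural-log probability of `sequence` together with the state path `states`."""
    if not sequence or len(sequence) != len(states):
        raise ValueError("need one state label per base")
    logp = math.log(START[states[0]])
    for i, (base, state) in enumerate(zip(sequence, states)):
        if i > 0:
            # transition from the previous base's state to this base's state
            logp += math.log(TRANSITIONS[states[i - 1]][state])
        # probability of generating this base given its state
        logp += math.log(EMISSIONS[state][base])
    return logp
```

Under these toy numbers the all-exon parse of `ACG` has probability 0.00729, higher than the 0.00054 of the parse that switches to `intron` at the last base. A GHMM generalizes this by letting a state emit a whole variable-length segment, and a decoder searches over parses rather than scoring a single one.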
Published in PLoS Genetics
Regions of the genome that have been the target of positive selection specifically along the human lineage are of special importance in human biology. We used high throughput sequencing combined with methods to enrich human genomic samples for particular targets to obtain the sequence of 22 chromosomal samples at high depth in 40 kb neighborhoods o...
Published in Genome Biology
Published in Cold Spring Harbor Symposia on Quantitative Biology
Published in PLoS Genetics
Published in Journal of Computational Biology
A few models have appeared in recent years that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way the process changes from one site to the next. These models combine phylogenetic models of molecular evolution, which apply to individual sites, and hidden Markov models, which allow f...
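The abstract is truncated before it describes a specific model. The following is only a generic sketch of the phylo-HMM idea it outlines: each hidden state carries its own phylogenetic substitution model for a single site, and an HMM governs how states change from one alignment column to the next. To stay short, it assumes just two aligned sequences, no gaps, and the Jukes-Cantor (JC69) model. State names and branch lengths in any usage are hypothetical.

```python
import math


def jc69_column_likelihood(a, b, t):
    """P(a, b) for two aligned bases on a two-leaf tree with total branch length t.

    Under JC69 the stationary base frequencies are uniform (1/4), and for a
    reversible model the two-leaf likelihood is pi_a * P_ab(t).
    """
    decay = math.exp(-4.0 * t / 3.0)
    p_ab = 0.25 + 0.75 * decay if a == b else 0.25 - 0.25 * decay
    return 0.25 * p_ab


def phylo_hmm_forward(seq1, seq2, branch_lengths, start, trans):
    """Log-likelihood of a gap-free pairwise alignment under a two-leaf phylo-HMM.

    branch_lengths[s] - total branch length of the tree model used by state s
    start[s]          - initial probability of state s
    trans[p][s]       - probability of moving from state p at one column to s at the next
    """
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("expected two non-empty sequences of equal length")
    states = list(branch_lengths)

    def emission(s, i):
        # Each hidden state emits a column via its own phylogenetic model.
        return jc69_column_likelihood(seq1[i], seq2[i], branch_lengths[s])

    alpha = {s: start[s] * emission(s, 0) for s in states}
    log_like = 0.0
    for i in range(1, len(seq1)):
        # Rescale to avoid underflow; the log of each scale factor is accumulated.
        scale = sum(alpha.values())
        log_like += math.log(scale)
        alpha = {
            s: sum(alpha[p] / scale * trans[p][s] for p in states) * emission(s, i)
            for s in states
        }
    return log_like + math.log(sum(alpha.values()))
```

Rescaling at each column keeps the recursion in ordinary floating point for long alignments. With a single state, the result reduces to a plain sum of per-column log tree likelihoods.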
Published in Bioinformatics
Published in Journal of Computational Biology
We introduce a new system, called shortHMM, that predicts individual exons using two related genomes. In this system, we build a hidden semi-Markov model to identify exons. In the hidden Markov model, we propose joint probability models of nucleotides in introns, splice sites, 5′ UTR, 3′ UTR, and intergenic regions by exploitin...
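The abstract does not give the model's parameters. The following is therefore a generic sketch of hidden semi-Markov (segment-level) Viterbi decoding, not shortHMM's model, which uses two related genomes and more region types. It assumes hypothetical segment types, per-base emissions within a segment, explicit length distributions, and transitions given as log-probabilities.

```python
import math

NEG_INF = float("-inf")


def hsmm_viterbi(seq, states, start, trans, duration, emit, max_len):
    """Most likely segmentation of `seq` under a hidden semi-Markov model.

    All probabilities are natural-log values (emission log-probs must be finite):
      start[s]       - log-prob that the first segment has state s
      trans[a][b]    - log-prob that a segment of state a is followed by state b
      duration[s](d) - log-prob that a segment of state s has length d
      emit[s][base]  - log-prob of one base inside a segment of state s
    Returns (best_log_prob, [(state, begin, end), ...]) with half-open intervals.
    """
    n = len(seq)
    # prefix[s][i] = total emission log-prob of seq[:i] under state s, so a
    # segment's emission score is a difference of two prefix sums.
    prefix = {}
    for s in states:
        sums = [0.0]
        for base in seq:
            sums.append(sums[-1] + emit[s][base])
        prefix[s] = sums

    # best[end][s]: best score for seq[:end] whose last segment has state s
    best = [{s: NEG_INF for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for s in states:
            for d in range(1, min(max_len, end) + 1):
                begin = end - d
                seg = duration[s](d) + prefix[s][end] - prefix[s][begin]
                if begin == 0:
                    candidates = [(start[s] + seg, None)]
                else:
                    candidates = [(best[begin][p] + trans[p][s] + seg, p) for p in states]
                for score, prev in candidates:
                    if score > best[end][s]:
                        best[end][s] = score
                        back[end][s] = (begin, prev)

    final = max(states, key=lambda s: best[n][s])
    if best[n][final] == NEG_INF:
        raise ValueError("no segmentation has nonzero probability")
    # Trace back from the end of the sequence to recover the segments.
    segments, end, s = [], n, final
    while end > 0:
        begin, prev = back[end][s]
        segments.append((s, begin, end))
        end, s = begin, prev
    return best[n][final], segments[::-1]
```

For example, consider a GC-rich `exon` state and an AT-rich `intron` state with uniform segment lengths up to 12 and no same-state transitions. Decoding `GGGGAAAAGGGG` under these settings yields exon [0, 4), intron [4, 8), exon [8, 12). The prefix sums make each candidate segment cost O(1), so the search is O(n · max_len · |states|²).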
Published in Bioinformatics