IsoSVM – Distinguishing isoforms and paralogs on the protein level

Authors
Publisher
BioMed Central
Source
PMC
Keywords

Abstract

BMC Bioinformatics, Open Access, Research article

Michael Spitzer1, Stefan Lorkowski2,3, Paul Cullen2, Alexander Sczyrba4 and Georg Fuellen*1,5

Address: 1Division of Bioinformatics, Biology Department, Schlossplatz 4, 48149 Münster, Germany, 2Leibniz Institute of Arteriosclerosis Research, Domagkstr. 3, 48149 Münster, Germany, 3Institute of Biochemistry, Wilhelm-Klemm-Str. 2, 48149 Münster, Germany, 4Faculty of Technology, Research Group in Practical Computer Science, University of Bielefeld, Postfach 10 01 31, 33501 Bielefeld, Germany and 5Department of Medicine, AG Bioinformatics, Domagkstr. 3, 48149 Münster, Germany

* Corresponding author

Background: Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, this data gives rise to inferred protein sequences without ready access to the underlying genomic data. Analysis of this information (e.g. for EST clustering or phylogenetic reconstruction from proteome data) is hampered because it is not known if two protein sequences are isoforms (splice variants) or not (i.e. paralogs/orthologs). However, even without knowing the intron/exon structure, visual analysis of the pattern of similarity across the alignment of the two protein sequences is usually helpful since paralogs and orthologs feature substitutions with respect to each other, as opposed to isoforms, which do not.

Results: The IsoSVM tool introduces an automated approach to identifying isoforms on the protein level using a support vector machine (SVM) classifier. Based on three specific features used as input of the SVM classifier, it is p
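The Background observation, that paralogs and orthologs show substitutions relative to each other while isoforms do not, can be illustrated with a small sketch. The abstract as excerpted here does not name the three features IsoSVM uses, so the function below is not the tool's feature set: `alignment_column_stats`, the statistics it returns, and the toy sequences are hypothetical, chosen only to show how substitutions and gap columns can be told apart in an already-aligned pair of protein sequences.

```python
def alignment_column_stats(aligned_a: str, aligned_b: str, gap: str = "-") -> dict:
    """Count matches, substitutions and gap columns in a pairwise alignment.

    Hypothetical helper for illustration only; it assumes both sequences
    are already aligned and therefore have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = substitutions = gap_columns = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        if x == gap or y == gap:
            gap_columns += 1  # residue present in only one sequence
        elif x == y:
            matches += 1
        else:
            substitutions += 1  # both residues present but different

    aligned = matches + substitutions
    return {
        "matches": matches,
        "substitutions": substitutions,
        "gap_columns": gap_columns,
        # share of residue-vs-residue columns that differ
        "substitution_rate": substitutions / aligned if aligned else 0.0,
    }


# Toy sequences (invented): an isoform-like pair differs only by a missing
# stretch, a paralog-like pair differs by point substitutions.
isoform_like = alignment_column_stats("MKTAYIAKQR", "MKTA---KQR")
paralog_like = alignment_column_stats("MKTAYIAKQR", "MRTAFIGKQK")
```

In this toy case the isoform-like pair has gap columns but no substitutions, while the paralog-like pair has substitutions and no gaps. Statistics of this kind are the sort of numeric input a classifier could be trained on, although which features IsoSVM actually uses is not stated in the text above.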
