Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing

BMC Bioinformatics
Springer (Biomed Central Ltd.)
DOI: 10.1186/1471-2105-8-396
  • Research Article
  • Biology
  • Computer Science

Tobias Wittkop (1,2,3), Jan Baumbach* (1,2,4), Francisco P Lobo (1,5) and Sven Rahmann (6)

Address: (1) Computational Methods for Emerging Technologies, Bielefeld University, Bielefeld, Germany; (2) Genome informatics, Bielefeld University, Bielefeld, Germany; (3) DFG Graduiertenkolleg Bioinformatik, Bielefeld University, Bielefeld, Germany; (4) International Graduate School in Bioinformatics and Genome Research, Center for Biotechnology, Bielefeld, Germany; (5) Laboratorio de Genetica Bioquimica, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; (6) Bioinformatics for High-Throughput Technologies, Technical University of Dortmund, Germany

* Corresponding author

Abstract

Background: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that have to be clustered efficiently, with constant or increased accuracy and at increased speed.

Results: We advocate that the model of weighted cluster editing, also known as transitive graph projection, is well-suited to protein clustering. We present the FORCE heuristic, which is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL
