Simplitigs as an efficient and scalable representation of de Bruijn graphs

Authors
  • Břinda, Karel (1, 2)
  • Baym, Michael (1)
  • Kucherov, Gregory (3, 4)
  • (1) Harvard Medical School, Boston, USA, and Broad Institute of MIT and Harvard, Cambridge, USA
  • (2) Harvard T.H. Chan School of Public Health, Boston, USA
  • (3) CNRS/LIGM Univ Gustave Eiffel, Marne-la-Vallée, France
  • (4) Skolkovo Institute of Science and Technology, Moscow, Russia
Type
Published Article
Publication Date
Apr 06, 2021
Volume
22
Issue
1
Identifiers
DOI: 10.1186/s13059-021-02297-z
Source
Springer Nature
License
Green

Abstract

de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. For the example of assemblies of model organisms and two bacterial pan-genomes, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs provide a substantial improvement in the cumulative sequence length and their number. When combined with the commonly used Burrows-Wheeler Transform index, simplitigs reduce memory, and index loading and query times, as demonstrated with large-scale examples of GenBank bacterial pan-genomes.
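
To make the idea of simplitigs concrete, here is a minimal Python sketch of a greedy computation: pick any remaining k-mer, extend it forward and then backward for as long as an unused neighbouring k-mer exists, and repeat until no k-mers are left. This is an illustration under stated assumptions, not the authors' ProphAsm implementation: the function names (`kmer_set`, `extend_forward`, `greedy_simplitigs`) are hypothetical, k-mers are assumed to be identified with their reverse complements (canonical form), and the input is assumed to contain only the letters A, C, G, and T.

```python
# Illustrative sketch only; names and design choices are assumptions,
# not taken from the ProphAsm source code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq[::-1].translate(_COMPLEMENT)


def canonical(kmer):
    """Represent a k-mer and its reverse complement by the smaller string."""
    return min(kmer, reverse_complement(kmer))


def kmer_set(sequences, k):
    """Collect the canonical k-mers of all input sequences."""
    return {
        canonical(seq[i:i + k])
        for seq in sequences
        for i in range(len(seq) - k + 1)
    }


def extend_forward(simplitig, kmers, k):
    """Append bases while an unused k-mer overlaps the last k-1 bases."""
    while True:
        suffix = simplitig[-(k - 1):]
        for base in "ACGT":
            candidate = canonical(suffix + base)
            if candidate in kmers:
                kmers.remove(candidate)  # each k-mer is used at most once
                simplitig += base
                break
        else:
            # No unused neighbour in this direction: stop extending.
            return simplitig


def greedy_simplitigs(kmers, k):
    """Greedily cover a canonical k-mer set with strings (simplitigs)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    remaining = set(kmers)  # work on a copy; the caller's set is untouched
    simplitigs = []
    while remaining:
        seed = remaining.pop()
        s = extend_forward(seed, remaining, k)
        # Extend backward by extending the reverse complement forward.
        s = reverse_complement(extend_forward(reverse_complement(s), remaining, k))
        simplitigs.append(s)
    return simplitigs


if __name__ == "__main__":
    k = 3
    kmers = kmer_set(["ACGTTGCATG", "TTGCAAGGC"], k)
    for s in greedy_simplitigs(kmers, k):
        print(s)
```

The resulting strings contain every canonical k-mer of the input exactly once, so their k-mer set equals the original one. Unlike unitigs, the greedy extension does not stop at branching nodes of the graph, which is why fewer and longer strings can result. The exact output depends on which seed k-mer is picked first, so different runs may produce different, equally valid sets of strings.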
