Affordable Access

Publisher Website

Space Efficient Suffix Trees

Journal of Algorithms
Publication Date
DOI: 10.1006/jagm.2000.1151
  • Computer Science


Abstract We give first the representation of a suffix tree that uses n lg n + O( n) bits of space and supports searching for a pattern string in the given text (from a fixed size alphabet) in O( m) time, where n is the size of the text and m is the length of the pattern. The structure is quite simple and answers a question raised by Muthukrishnan (in Proceedings of the FST and TCS Preconference Workshop on Randomization, 1997, pp. 23–27). Previous compact representations of suffix trees had either a higher lower order term in space and had some expectation assumption or required more time for searching. When the size of the alphabet k is not viewed as a constant, this structure can be modified to use the same space but take O( m lg k) time for string searching or to use an additional O( n lg k) bits but take the same O( m) time for searching. We then give several index structures for binary texts, with less space including • a structure that uses a suffix array (⌈lg ⌉ bits) and an additional () bits, • an indexing structure that takes bits of space, and • an ( lg ) bit structure which answers in () time, the decision question of whether a given pattern of length occurs in the text. Each of these structures uses a different technique, either in the storage scheme or in the search algorithm, in reducing the space requirement. The first one uses a suffix array, a sparse suffix tree, and a table structure. Finding all the occurrences of a pattern using this structure takes O( m + s) time, where s is the number of occurrences of the pattern in the text. The second structure constructs a sparse suffix tree for all the suffixes that start with the bit that occurs more times in the given binary text. The last structure uses an iterative algorithm to search for the pattern. This structure is the first o( n lg n) bit index to support the decision version of indexing queries in time linear in the length of the pattern. But this does not support the general indexing queries where we want to find the position of the occurrence of the pattern. Our main contribution is the development of techniques to use the succinct tree representation through balanced parentheses for suffix trees.

There are no comments yet on this publication. Be the first to share your thoughts.