Incremental algorithms for computing the set of period sets
- Authors
- Publication Date
- Feb 02, 2024
- Source
- HAL-Descartes
- Keywords
- Language
- English
- License
- Unknown
- External links
Abstract
Overlaps between strings are crucial in many areas of computer science, such as bioinformatics, code design, and stringology. A self overlapping string is characterized by its periods and borders. A period of a string $u$ is the starting position of a suffix of $u$ that is also a prefix $u$, and such a suffix is called a border. Each word of length, say $n>0$, has a set of periods, but not all combinations of integers are sets of periods. The question we address is how to compute the set, denoted $\Gamma_n$, of all period sets of strings of length $n$. Computing the period set for all possible words of length $n$ is clearly prohibitive. The cardinality of $\Gamma_n$ is exponential in $n$. One dynamic programming algorithm exists for enumerating $\Gamma_n$, but it suffers from an expensive space complexity. After stating some combinatorial properties of period sets, we present a novel algorithm that computes $\Gamma_n$ from $\Gamma_{n-1}$, for any length $n>1$.The period set of a string $u$ is a key information for computing the absence probability of $u$ in random texts. Hence, computing $\Gamma_n$ is useful for assessing the significance of word statistics, such as the number of missing $k$-mers in a random text, or the number of shared $k$-mers between two random texts. Besides applications, investigating $\Gamma_n$ is interesting per se as it unveils combinatorial properties of string overlaps.