This thesis deals with repetitions in strings, both tandem and independent. Tandem repetitions are encoded as maximal repetitions, called runs. We investigate the "runs" conjecture, which claims that the maximum number of runs in a string of length n is at most n. We come close to settling the conjecture by proving the upper bound 1.029n, using a combination of theory and computer verification. This bound is by far the best known and is sufficient for all practical purposes.
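To make the notion of a run concrete, the following is a minimal brute-force sketch, not the method used in the thesis. It assumes the usual definition: a run is a substring whose length is at least twice its smallest period and which cannot be extended to the left or right without breaking that period. The function name `runs` and the half-open `(start, end)` interval representation are choices made for this illustration only.

```python
def runs(s: str) -> set:
    """Return all runs of s as half-open intervals (start, end).

    Brute force, roughly quadratic time: for every candidate period p,
    scan for maximal stretches where s[k] == s[k + p].
    """
    n = len(s)
    found = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[start:k + p] has period p and cannot be extended either way.
            # Keep it only if it spans at least two full periods.
            if k - start >= p:
                # A run found again under a multiple of its smallest period
                # yields the same interval, so the set removes duplicates.
                found.add((start, k + p))
    return found


print(sorted(runs("mississippi")))  # [(1, 8), (2, 4), (5, 7), (8, 10)]
```

In this example the string of length 11 has four runs: "ississi" with period 3, and "ss", "ss", "pp" with period 1.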
For independent repetitions, we consider the longest common extension (LCE) problem: given a string s and two positions i and j, find the longest common prefix of the suffixes of s that start at i and j, respectively. We give very simple algorithms that use up to 24 times less space and are 5 times faster in practice.
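The LCE query itself can be illustrated with a naive character-by-character comparison. This is only a baseline sketch of the problem statement, not the algorithms proposed in the thesis; the function name `lce`, the 0-based positions, and the choice to return the length of the common prefix are assumptions made for this example.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:] (0-based)."""
    n = len(s)
    k = 0
    # Advance while both positions are inside s and the characters agree.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


print(lce("abracadabra", 0, 7))  # 4, the common prefix is "abra"
```

The naive query uses no extra space and no preprocessing, at the cost of time proportional to the answer.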
An application of our fast LCE algorithm to approximate string search is presented. We give a modification of the algorithm of Landau and Vishkin that uses 5.6 times less space and runs up to 20 times faster in practice.
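As a rough illustration of how LCE queries drive approximate search, the sketch below handles the simpler case of Hamming distance (at most k mismatches) by jumping over matching stretches. It is not the edit-distance algorithm of Landau and Vishkin, nor the modification given in the thesis. The names `lce_between` and `k_mismatch_search` are hypothetical, and the LCE here is the naive comparison rather than a preprocessed structure.

```python
def lce_between(a: str, i: int, b: str, j: int) -> int:
    """Length of the longest common prefix of a[i:] and b[j:]."""
    k = 0
    while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
        k += 1
    return k


def k_mismatch_search(text: str, pattern: str, k: int) -> list:
    """Start positions where pattern matches text with at most k mismatches."""
    m, n = len(pattern), len(text)
    hits = []
    for start in range(n - m + 1):
        pos = 0          # pattern characters consumed so far
        mismatches = 0
        while True:
            # Jump over the next stretch of matching characters.
            pos += lce_between(pattern, pos, text, start + pos)
            if pos >= m:
                hits.append(start)
                break
            mismatches += 1
            if mismatches > k:
                break
            pos += 1     # step over the mismatching character
    return hits


print(k_mismatch_search("abcabdabc", "abc", 1))  # [0, 3, 6]
```

Each alignment issues at most k + 1 LCE queries, which is why the cost of a single LCE query dominates the running time of this style of search.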
Recommendations
Strongly k-Abelian Repetitions
Proceedings of the 9th International Conference on Combinatorics on Words - Volume 8079
We consider with a new point of view the notion of nth powers in connection with the k-abelian equivalence of words. For a fixed natural number k, words u and v are k-abelian equivalent if every factor of length at most k occurs in u as many times as in ...
Factorizing Strings into Repetitions
A factorization f1,…,fm of a string w is called a repetition factorization of w if each factor fi is a repetition, namely, fi = x^k x' for some non-empty string x, an integer k ≥ 2, and x' being a proper prefix of x. Dumitran et al. (Proc. SPIRE 2015) ...
Maximal repetitions in strings
The cornerstone of any algorithm computing all repetitions in strings of length n in O(n) time is the fact that the number of maximal repetitions (runs) is linear. Therefore, the most important part of the analysis of the running time of such algorithms ...