review-article

40 years of suffix trees

Authors:
Alberto Apostolico
Maxime Crochemore, King's College London and Université Paris-Est, France
Martin Farach-Colton, Rutgers University, Piscataway, NJ
Zvi Galil, Georgia Institute of Technology, Atlanta, GA
S. Muthukrishnan, Rutgers University, Piscataway, NJ

Communications of the ACM, Volume 59, Issue 4, April 2016, pp. 66–73, https://doi.org/10.1145/2810036

Published: 23 March 2016

Abstract

Tracing the first four decades in the life of suffix trees, their many incarnations, and their applications.

Supplemental Material

suffixtreessupplemental.pdf (28.5 KB): A list of resources and references to learn more about suffix trees

Published in
Communications of the ACM, Volume 59, Issue 4
April 2016
87 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2907055
Editor: Moshe Y. Vardi
Publisher: Association for Computing Machinery, New York, NY
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Qualifiers
- review-article
- Popular
- Refereed