skip to main content
10.1145/2807591.2807609acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article
Free Access

Parallel distributed memory construction of suffix and longest common prefix arrays

Published:15 November 2015Publication History

ABSTRACT

Suffix arrays and trees are fundamental string data structures of importance to many applications in computational biology. Consequently, their parallel construction is an actively studied problem. To date, algorithms with best practical performance lack efficient worst-case run-time guarantees, and vice versa. In addition, much of the recent work targeted low core count, shared memory parallelization. In this paper, we present parallel algorithms for distributed memory construction of suffix arrays and longest common prefix (LCP) arrays that simultaneously achieve good worst-case run-time bounds and superior practical performance. Our algorithms run in O(Tsort(n, p) · log n) worst-case time where Tsort(n, p) is the run-time of parallel sorting. We present several algorithm engineering techniques that improve performance in practice. We demonstrate the construction of suffix and LCP arrays of the human genome in less than 8 seconds on 1,024 Intel Xeon cores, reaching speedups of over 110X compared to the best sequential suffix array construction implementation divsufsort.

References

  1. A. Abdelhadi, A. Kandil, and M. Abouelhoda. Cloud-based parallel suffix array construction based on mpi. In Biomedical Engineering (MECBME), 2014 Middle East Conference on, pages 334--337. IEEE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  2. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2(1):53--86, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In SODA, volume 97, pages 360--369, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J. Smith, and M. Zagha. A comparison of sorting algorithms for the connection machine cm-2. In Proceedings of the third annual ACM symposium on Parallel algorithms and architectures, pages 3--16. ACM, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Comin and M. Farreras. Efficient parallel construction of suffix trees for genomes larger than main memory. In Proceedings of the 20th European MPI Users' Group Meeting, pages 211--216. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. P. Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56--65, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  7. M. Deo and S. Keely. Parallel suffix array and least common prefix for the gpu. In ACM SIGPLAN Notices, volume 48, pages 197--206. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Fischer and V. Heun. A new succinct representation of rmq-information and improvements in the enhanced suffix array. In Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, pages 459--470. Springer, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Futamura, S. Aluru, and S. Kurtz. Parallel suffix sorting. pages 76--81, 2001.Google ScholarGoogle Scholar
  10. A. Ghoting and K. Makarychev. Indexing genomic sequences on the ibm blue gene. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, page 61. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Ghoting and K. Makarychev. Serial and parallel methods for i/o efficient suffix tree construction. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 827--840. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Hariharan. Optimal parallel suffix tree construction. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 290--299. ACM, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Homann, D. Fleer, R. Giegerich, and M. Rehmsmeier. mkesa: enhanced suffix array construction tool. Bioinformatics, 25(8):1084--1085, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. JaJa and K. W. Ryu. Optimal algorithms on the pipelined hypercube and related networks. Parallel and Distributed Systems, IEEE Transactions on, 4(5):582--591, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Automata, Languages and Programming, pages 943--955. Springer, 2003. Google ScholarGoogle ScholarCross RefCross Ref
  16. R. M. Karp, R. E. Miller, and A. L. Rosenberg. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the fourth annual ACM symposium on Theory of computing, pages 125--136. ACM, 1972. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Combinatorial pattern matching, pages 181--192. Springer, 2001. Google ScholarGoogle ScholarCross RefCross Ref
  18. P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In Combinatorial Pattern Matching, pages 200--210. Springer, 2003. Google ScholarGoogle ScholarCross RefCross Ref
  19. F. Kulla and P. Sanders. Scalable parallel suffix array construction. Parallel Computing, 33(9):605--612, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. N. J. Larsson and K. Sadakane. Faster suffix sorting. Theoretical Computer Science, 387(3):258--272, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. siam Journal on Computing, 22(5):935--948, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. Mansour, A. Allam, S. Skiadopoulos, and P. Kalnis. Era: efficient serial and parallel suffix tree construction for very long strings. Proceedings of the VLDB Endowment, 5(1):49--60, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. G. Manzini and P. Ferragina. Engineering a lightweight suffix array construction algorithm. Algorithmica, 40(1):33--50, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Mohamed and M. Abouelhoda. Parallel suffix sorting based on bucket pointer refinement. In Biomedical Engineering Conference (CIBEC), 2010 5th Cairo International, pages 98--102. IEEE, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  25. Y. Mori. libdivsufsort. https://github.com/y-256/libdivsufsort.Google ScholarGoogle Scholar
  26. B. Nystedt, N. R. Street, A. Wetterbom, A. Zuccolo, Y.-C. Lin, D. G. Scofield, F. Vezzi, N. Delhomme, S. Giacomello, A. Alexeyenko, et al. The norway spruce genome sequence and conifer genome evolution. Nature, 497(7451):579--584, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  27. V. Osipov. Parallel suffix array construction for shared memory architectures. In String Processing and Information Retrieval, pages 379--384. Springer, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. J. Puglisi, W. F. Smyth, and A. H. Turpin. A taxonomy of suffix array construction algorithms. ACM Computing Surveys (CSUR), 39(2):4, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. K.-B. Schürmann and J. Stoye. An incomplex algorithm for fast suffix array construction. Software: Practice and Experience, 37(3):309--329, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Shun. Fast parallel computation of longest common prefixes. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 387--398. IEEE Press, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Parallel distributed memory construction of suffix and longest common prefix arrays

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
          November 2015
          985 pages
          ISBN:9781450337236
          DOI:10.1145/2807591
          • General Chair:
          • Jackie Kern,
          • Program Chair:
          • Jeffrey S. Vetter

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 15 November 2015

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          SC '15 Paper Acceptance Rate79of358submissions,22%Overall Acceptance Rate1,516of6,373submissions,24%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader