
Low-Rank Methods for Parallelizing Dynamic Programming Algorithms

Published: 24 February 2016

Abstract

This article proposes efficient parallel methods for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages or wavefronts. The methods presented in this article provide additional parallelism, allowing multiple stages to be computed in parallel despite the dependencies among them. The correctness and performance of the algorithm rely on rank convergence properties of matrix multiplication in the tropical semiring, which is formed with plus as the multiplicative operation and max as the additive operation.

This article demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24× faster (with 64 processors) than a highly optimized commercial baseline.
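The tropical semiring and the rank convergence property mentioned above can be illustrated with a small sketch. It is not the article's implementation: the 2×2 stage matrix `A`, the start vectors, and all function names are hypothetical, chosen only to show that a max-plus product of stage matrices can become rank 1, after which any two starting vectors yield results that differ only by a constant offset.

```python
# Max-plus ("tropical") algebra: "addition" is max, "multiplication" is +.
# All names and values below are illustrative assumptions, not from the article.

def trop_matmul(a, b):
    """Tropical matrix product: c[i][j] = max over k of (a[i][k] + b[k][j])."""
    return [[max(a[i][k] + b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def trop_matvec(a, v):
    """One dynamic programming stage: out[i] = max over k of (a[i][k] + v[k])."""
    return [max(a_ik + v_k for a_ik, v_k in zip(row, v)) for row in a]

def is_rank_one(m):
    """True if every column equals the first column plus a constant.
    Assumes all entries are finite."""
    for j in range(1, len(m[0])):
        offsets = {row[j] - row[0] for row in m}
        if len(offsets) != 1:
            return False
    return True

def parallel_up_to_offset(u, v):
    """True if u and v differ by the same constant in every entry."""
    return len({x - y for x, y in zip(u, v)}) == 1

# Hypothetical stage matrix: A itself is not rank 1, but A times A is.
A = [[2, 0],
     [1, 0]]
A2 = trop_matmul(A, A)  # [[4, 2], [3, 1]]

# Two unrelated start vectors pushed through the same two stages.
out_a = trop_matvec(A, trop_matvec(A, [0, 5]))   # [7, 6]
out_b = trop_matvec(A, trop_matvec(A, [10, 0]))  # [14, 13]

print(is_rank_one(A), is_rank_one(A2))      # False True
print(parallel_up_to_offset(out_a, out_b))  # True
```

Because the two outputs differ only by a constant, they select the same maximizing predecessors in later stages. This suggests why a processor could start a later block of stages from a guessed vector and still recover the right decisions once the partial product has converged to rank 1; how the article detects convergence and repairs blocks that have not converged is not shown here.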

      • Published in

        ACM Transactions on Parallel Computing, Volume 2, Issue 4
        Special Issue on PPOPP 2014
        March 2016
        202 pages
        ISSN: 2329-4949
        EISSN: 2329-4957
        DOI: 10.1145/2888415

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 February 2016
        • Accepted: 1 December 2015
        • Revised: 1 November 2015
        • Received: 1 January 2015
        Qualifiers

        • research-article
        • Research
        • Refereed
