Efficient parallelization using rank convergence in dynamic programming algorithms

Published: 22 September 2016
Skip Abstract Section

Abstract

This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages, or wavefronts. The algorithm presented in this paper provides additional parallelism, allowing multiple stages to be computed in parallel despite dependences among them. The correctness and the performance of the algorithm rely on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation.
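
As a concrete illustration of the primitive the abstract refers to, the sketch below (not taken from the paper; the function name and the small example matrices are invented for illustration) multiplies matrices in the tropical semiring, where "addition" is max and "multiplication" is ordinary plus. A dynamic programming stage update can be expressed as a matrix-vector product in this semiring, and composing per-stage matrices with this product is what lets separate processors work on different stages; it is the rank-convergence behavior of such products that, per the abstract, the algorithm's correctness and performance rest on.

    // Minimal C++ sketch of a max-plus ("tropical") matrix product:
    // C[i][j] = max over k of (A[i][k] + B[k][j]).
    #include <vector>
    #include <algorithm>
    #include <limits>
    #include <cstdio>

    using Matrix = std::vector<std::vector<double>>;
    const double NEG_INF = -std::numeric_limits<double>::infinity();  // semiring "zero"

    Matrix maxPlusMultiply(const Matrix& A, const Matrix& B) {
        size_t n = A.size(), p = B.size(), m = B[0].size();
        Matrix C(n, std::vector<double>(m, NEG_INF));
        for (size_t i = 0; i < n; ++i)
            for (size_t k = 0; k < p; ++k)
                for (size_t j = 0; j < m; ++j)
                    C[i][j] = std::max(C[i][j], A[i][k] + B[k][j]);
        return C;
    }

    int main() {
        // Two hypothetical 2x2 stage matrices; their tropical product gives the
        // combined transition across both stages in a single matrix.
        Matrix A = {{0.0, -1.0}, {-2.0, 0.5}};
        Matrix B = {{0.3, NEG_INF}, {-0.7, 1.0}};
        Matrix AB = maxPlusMultiply(A, B);
        for (const auto& row : AB) {
            for (double v : row) std::printf("%8.2f", v);
            std::printf("\n");
        }
        return 0;
    }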

This paper demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24× faster (with 64 processors) than a highly optimized commercial baseline.

Published in

Communications of the ACM, Volume 59, Issue 10 (October 2016), 85 pages
ISSN: 0001-0782; EISSN: 1557-7317
DOI: 10.1145/3001840
Editor: Moshe Y. Vardi
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States