skip to main content
10.1145/2259016.2259027acmconferencesArticle/Chapter ViewAbstractPublication PagescgoConference Proceedingsconference-collections
research-article

Scan detection and parallelization in "inherently sequential" nested loop programs

Published:31 March 2012Publication History

ABSTRACT

Most automatic parallelizers are based on detection of independent computations, and most of them cannot do anything if there is a true dependence between computations. However, this can be surmounted for programs that perform prefix computations (scans). We present a method for automatically parallelizing such "inherently sequential" programs. Our method, which handles arbitrarily nested loops, identifies situations where the computation performed by the loop body is equivalent to a matrix vector product over a semi-ring. We also deal with mutually dependent variables in the loop. Our method is implemented in a polyhedral program transformation and code generation system and generates OpenMP code. We also present strategies to improve the performance of the generated code, an analytical performance model for the expected speedup, as well as a method to choose the parallelization parameters optimally. We show experimentally that the scan parallelizations performed by our system are effective, yielding linear (iso-efficient) speedup in situations where no other parallelism is available.

References

  1. U. Bondhugula and J. Ramanujam, "Pluto: A practical and fully automatic polyhedral program optimization system," in Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. "Par4all," http://www.par4all.org/.Google ScholarGoogle Scholar
  3. L.-N. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam, and P. Sadayappan, Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework. New Orleans, LA: IEEE Computer Society Press, 2010.Google ScholarGoogle Scholar
  4. U. of Illinois at Urbana-Champaign, "Polaris: Automatic parallelization of conventional fortran programs," http://polaris.cs.uiuc.edu/polaris/polaris-old.html.Google ScholarGoogle Scholar
  5. P. University, "Cetus: A source-to-source compiler infrastructure for c programs," http://cetus.ecn.purdue.edu/.Google ScholarGoogle Scholar
  6. G. E. Blelloch, "Scans as primitive parallel operations," IEEE Trans. Comput., vol. 38(11), pp. 1526--1538, November 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. X. Redon and P. Feautrier, "Scheduling reductions," in Proceedings of the 8th international conference on Supercomputing, ser. ICS '94. New York, NY, USA: ACM, 1994, pp. 117--125. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Comput., vol. 22(8), pp. 786--793, August 1973. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. K. Matsuzaki, Z. Hu, and M. Takeichi, "Towards automatic parallelization of tree reductions in dynamic programming," in Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, ser. SPAA '06. New York, NY, USA: ACM, 2006, pp. 39--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S.-C. K. D. N. Xu and Z. Hu, "Ptype system: A featherweight parallelizability detector," in proceedings of 2d asian symposium on programming languages and systems (APLAS 2004), LNCS 3302. Springer, LNCS, 2004, pp. 197--212.Google ScholarGoogle Scholar
  11. A. L. Fisher and A. M. Ghuloum, "Parallelizing complex scans and reductions," in Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, ser. PLDI '94. New York, NY, USA: ACM, 1994, pp. 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Sato and H. Iwasaki, "Automatic parallelization via matrix multiplication," in Proc. 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI 2011), to appear, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. X. Redon and P. Feautrier, "Detection of recurrences in sequential programs with loops," in Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe, ser. PARLE '93. London, UK: Springer-Verlag, 1993, pp. 132--145. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Feautrier, "Automatic parallelization in the polytope model," The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications, pp. 79--103, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. Bastoul, "Code generation in the polyhedral model is easier than you think," in Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 7--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. S. University, "Alphaz," https://www.cs.colostate.edu/AlphaZ/wiki/doku.php.Google ScholarGoogle Scholar
  17. P. Feautrier, "Dataflow analysis of array and scalar references," International Journal of Parallel Programming, vol. 20, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  18. J. Bentley, Programming pearls, ser. ACM Press Series. Addison-Wesley, 2000.Google ScholarGoogle Scholar
  19. S. Sato, "Automatic parallelization via matrix multiplication," Master's thesis, The University of Electro-Communications, Japan, 2011.Google ScholarGoogle Scholar
  20. D. Merrill and A. Grimshaw, "Parallel scan for stream architectures," Technical Report CS2009-14, pp. 373--381, 2009.Google ScholarGoogle Scholar
  21. S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens, "Scan primitives for gpu computing," Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, pp. 97--106, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Clauss and V. Loechner, "Parametric analysis of polyhedral iteration spaces," J. VLSI Signal Process. Syst., vol. 19, pp. 179--194, July 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. J. Ypma, "Historical development of the newton-raphson method," SIAM Rev., vol. 37, pp. 531--551, December 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. M. Karp, R. E. Miller, and S. Winograd, "The organization of computations for uniform recurrence equations," J. ACM, vol. 14(3), pp. 563--590, July 1967. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Fettweis, L. Thiele, and G. Meyr, "Algorithm transformations for unlimited parallelism," in Circuits and Systems, 1990., IEEE International Symposium on, may 1990, pp. 1756--1759 vol.3.Google ScholarGoogle Scholar
  26. G. Fettweis and H. Meyr, "High-speed parallel viterbi decoding: algorithm and vlsi-architecture," Communications Magazine, IEEE, vol. 29, no. 5, pp. 46--55, may 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27(4), pp. 831--838, October 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. L. Fisher and A. M. Ghuloum, "Parallelizing complex scans and reductions," in Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, ser. PLDI '94. New York, NY, USA: ACM, 1994, pp. 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. W. L. L. Han and J. M. Tuck, "Speculative parallelization of partial reduction variables," pp. 141--150, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scan detection and parallelization in "inherently sequential" nested loop programs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
      March 2012
      285 pages
      ISBN:9781450312066
      DOI:10.1145/2259016

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 31 March 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      CGO '12 Paper Acceptance Rate26of90submissions,29%Overall Acceptance Rate312of1,061submissions,29%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader