research-article

Scan detection and parallelization in "inherently sequential" nested loop programs

Authors:
Yun Zou

Colorado State University

Colorado State University
View Profile

,
Sanjay Rajopadhye

Colorado State University

Colorado State University
View Profile

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and OptimizationMarch 2012Pages 74–83https://doi.org/10.1145/2259016.2259027

Published:31 March 2012Publication History

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization

Pages 74–83

ABSTRACT

Most automatic parallelizers are based on detection of independent computations, and most of them cannot do anything if there is a true dependence between computations. However, this can be surmounted for programs that perform prefix computations (scans). We present a method for automatically parallelizing such "inherently sequential" programs. Our method, which handles arbitrarily nested loops, identifies situations where the computation performed by the loop body is equivalent to a matrix vector product over a semi-ring. We also deal with mutually dependent variables in the loop. Our method is implemented in a polyhedral program transformation and code generation system and generates OpenMP code. We also present strategies to improve the performance of the generated code, an analytical performance model for the expected speedup, as well as a method to choose the parallelization parameters optimally. We show experimentally that the scan parallelizations performed by our system are effective, yielding linear (iso-efficient) speedup in situations where no other parallelism is available.

References

U. Bondhugula and J. Ramanujam, "Pluto: A practical and fully automatic polyhedral program optimization system," in Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), 2008. Google ScholarDigital Library
"Par4all," http://www.par4all.org/.Google Scholar
L.-N. Pouchet, U. Bondhugula, C. Bastoul, A. Cohen, J. Ramanujam, and P. Sadayappan, Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework. New Orleans, LA: IEEE Computer Society Press, 2010.Google Scholar
U. of Illinois at Urbana-Champaign, "Polaris: Automatic parallelization of conventional fortran programs," http://polaris.cs.uiuc.edu/polaris/polaris-old.html.Google Scholar
P. University, "Cetus: A source-to-source compiler infrastructure for c programs," http://cetus.ecn.purdue.edu/.Google Scholar
G. E. Blelloch, "Scans as primitive parallel operations," IEEE Trans. Comput., vol. 38(11), pp. 1526--1538, November 1989. Google ScholarDigital Library
X. Redon and P. Feautrier, "Scheduling reductions," in Proceedings of the 8th international conference on Supercomputing, ser. ICS '94. New York, NY, USA: ACM, 1994, pp. 117--125. Google ScholarDigital Library
P. M. Kogge and H. S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations," IEEE Trans. Comput., vol. 22(8), pp. 786--793, August 1973. Google ScholarDigital Library
K. Matsuzaki, Z. Hu, and M. Takeichi, "Towards automatic parallelization of tree reductions in dynamic programming," in Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, ser. SPAA '06. New York, NY, USA: ACM, 2006, pp. 39--48. Google ScholarDigital Library
S.-C. K. D. N. Xu and Z. Hu, "Ptype system: A featherweight parallelizability detector," in proceedings of 2d asian symposium on programming languages and systems (APLAS 2004), LNCS 3302. Springer, LNCS, 2004, pp. 197--212.Google Scholar
A. L. Fisher and A. M. Ghuloum, "Parallelizing complex scans and reductions," in Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, ser. PLDI '94. New York, NY, USA: ACM, 1994, pp. 135--146. Google ScholarDigital Library
S. Sato and H. Iwasaki, "Automatic parallelization via matrix multiplication," in Proc. 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI 2011), to appear, 2011. Google ScholarDigital Library
X. Redon and P. Feautrier, "Detection of recurrences in sequential programs with loops," in Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe, ser. PARLE '93. London, UK: Springer-Verlag, 1993, pp. 132--145. Google ScholarDigital Library
P. Feautrier, "Automatic parallelization in the polytope model," The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications, pp. 79--103, 1996. Google ScholarDigital Library
C. Bastoul, "Code generation in the polyhedral model is easier than you think," in Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT '04. Washington, DC, USA: IEEE Computer Society, 2004, pp. 7--16. Google ScholarDigital Library
C. S. University, "Alphaz," https://www.cs.colostate.edu/AlphaZ/wiki/doku.php.Google Scholar
P. Feautrier, "Dataflow analysis of array and scalar references," International Journal of Parallel Programming, vol. 20, 1991.Google ScholarCross Ref
J. Bentley, Programming pearls, ser. ACM Press Series. Addison-Wesley, 2000.Google Scholar
S. Sato, "Automatic parallelization via matrix multiplication," Master's thesis, The University of Electro-Communications, Japan, 2011.Google Scholar
D. Merrill and A. Grimshaw, "Parallel scan for stream architectures," Technical Report CS2009-14, pp. 373--381, 2009.Google Scholar
S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens, "Scan primitives for gpu computing," Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, pp. 97--106, 2007. Google ScholarDigital Library
P. Clauss and V. Loechner, "Parametric analysis of polyhedral iteration spaces," J. VLSI Signal Process. Syst., vol. 19, pp. 179--194, July 1998. Google ScholarDigital Library
T. J. Ypma, "Historical development of the newton-raphson method," SIAM Rev., vol. 37, pp. 531--551, December 1995. Google ScholarDigital Library
R. M. Karp, R. E. Miller, and S. Winograd, "The organization of computations for uniform recurrence equations," J. ACM, vol. 14(3), pp. 563--590, July 1967. Google ScholarDigital Library
G. Fettweis, L. Thiele, and G. Meyr, "Algorithm transformations for unlimited parallelism," in Circuits and Systems, 1990., IEEE International Symposium on, may 1990, pp. 1756--1759 vol.3.Google Scholar
G. Fettweis and H. Meyr, "High-speed parallel viterbi decoding: algorithm and vlsi-architecture," Communications Magazine, IEEE, vol. 29, no. 5, pp. 46--55, may 1991. Google ScholarDigital Library
R. E. Ladner and M. J. Fischer, "Parallel prefix computation," J. ACM, vol. 27(4), pp. 831--838, October 1980. Google ScholarDigital Library
A. L. Fisher and A. M. Ghuloum, "Parallelizing complex scans and reductions," in Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, ser. PLDI '94. New York, NY, USA: ACM, 1994, pp. 135--146. Google ScholarDigital Library
W. L. L. Han and J. M. Tuck, "Speculative parallelization of partial reduction variables," pp. 141--150, 2010. Google ScholarDigital Library

Index Terms

Scan detection and parallelization in "inherently sequential" nested loop programs
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
March 2012
285 pages
ISBN:9781450312066
DOI:10.1145/2259016
General Chairs:
Carol Eidt
Microsoft
,
Anne Holler
VMware
,
Program Chairs:
Uma Srinivasan
Intel
,
Saman Amarasinghe
MIT
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 March 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
CGO '12 Paper Acceptance Rate26of90submissions,29%Overall Acceptance Rate312of1,061submissions,29%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 6
  Total Citations
  View Citations
- 361
  Total Downloads
- Downloads (Last 12 months)9
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scan detection and parallelization in "inherently sequential" nested loop programs

CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization

ABSTRACT

References

Cited By

Index Terms

Recommendations

Optimal Loop Parallelization

Advanced loop parallelization: dependence uniformization and trapezoid self-scheduling

Advanced loop parallelization theory and techniques