ABSTRACT
Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis in cases when relevant program behavior is input sensitive. In this paper we show how SA can extract all the input-dependent, statically unavailable conditions under which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. For example, SA can first attempt to validate parallelization by checking simple conditions related to loop bounds. If such simple conditions cannot be met, then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of a loop, thus decreasing the benefits of parallel execution.
We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance using 22 industry-standard benchmark codes running on two multicore systems. In most cases we have obtained speedups superior to the Intel Ifort compiler because with SA we could complement static analysis with minimum-cost dynamic analysis and extract most of the available coarse-grained parallelism.