ABSTRACT
Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis in cases when relevant program behavior is input sensitive. In this paper we show how SA can extract all the input-dependent, statically unavailable conditions under which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. For example, SA can first attempt to validate parallelization by checking simple conditions related to loop bounds. If such simple conditions cannot be met, then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of a loop, thus decreasing the benefits of parallel execution.
We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance using 22 industry-standard benchmark codes running on two multicore systems. In most cases we have obtained speedups superior to the Intel Ifort compiler because with SA we could complement static analysis with minimum-cost dynamic analysis and extract most of the available coarse-grained parallelism.