skip to main content
10.1145/1274971.1275008acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
Article

Sensitivity analysis for automatic parallelization on multi-cores

Published:17 June 2007Publication History

ABSTRACT

Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis for the cases when relevant program behavior is input sensitive. In this paper we show how SA can extract all the input dependent, statically unavailable, conditions for which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. For example, SA can first attempt to validate parallelization by checking simple conditions related to loop bounds. If such simple conditions cannot be met, then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of a loop, thus decreasing the benefits of parallel execution.

We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance using 22 industry standard benchmark codes running on two multicore systems. In most cases we have obtained speedups superior to the Intel Ifort compiler because with SA we could complement static analysis with minimum cost dynamic analysis and extract most of the available coarse grained parallelism.

References

  1. G. Agrawal, J. H. Saltz, and R. Das. Interprocedural partial redundancy elimination and its application to distributed memory compilation. In Proc. of Prog. Lang. Design and Impl., pp. 258--269, La Jolla, CA, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. U. Banerjee. Dependence Analysis for Supercomputing. Norwell, Mass.: Kluwer Academic Publishers, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. W. Blume, et. al. Advanced Program Restructuring for High-Performance Computers with Polaris. IEEE Computer, 29(12):78--82, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. W. Blume and R. Eigenmann. The Range Test: A Dependence Test for Symbolic, Non-linear Expressions. In Proc. of Supercomputing '94, Washington DC, pp. 528--537, Nov. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. Creusillet and F. Irigoin. Exact vs. approximate array region analyses. In Proc. of Workshop on Lang. and Comp. for Par. Computing, LNCS, vol. 1239, pp. 86--100, Springer 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Feautrier. Parametric integer programming. Operations Research, 22(3):243--268, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  7. P. Feautrier. Dataflow analysis of array and scalar references. Int. Journal of Par. Prog., 20(1):23--54, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  8. G. Goff, K. Kennedy, and C.-W. Tseng. Practical dependence testing. In Proc. of Prog. Lang. Design and Impl., pp. 15--29, Toronto, ON, June 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Gupta, S. Mukhopadhyay, and N. Sinha. Automatic parallelization of recursive procedures. Int. Journal of Par. Prog., 28(6):537--562, 2000. Google ScholarGoogle ScholarCross RefCross Ref
  10. M. R. Haghighat and C. D. Polychronopoulos. Symbolic analysis for parallelizing compilers. ACM TOPLAS, 18(4):477--518, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, and M. S. Lam. Interprocedural parallelization analysis in SUIF. ACM TOPLAS., 27(4):662--731, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Hoeflinger. Interprocedural Parallelization Using Memory Classification Analysis. PhD thesis, Univ. of Illinois, Urbana-Champaign, Aug. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Kai Chen, J. Torrellas, and P.-C. Yew. An Efficient Algorithm for the Run-time Parallelization of DOACROSS Loops. In Proc. of Supercomputing '94, Washington DC, Nov. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. X. Kong, D. Klappholz, and K. Psarris. The I test: An improved dependence test for automatic parallelization and vectorization. IEEE TPDS, 2(3):342--349, July 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Leung and J. Zahorjan. Improving the performance of runtime parallelization. In Proc. of Principles and Practice of Par. Prog., pp. 83--91, ACM Press, May 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Lin and D. Padua. Analysis of irregular single-indexed array accesses and its application in compiler optimizations. In Proc. of Int. Conf. on Compiler Constr., pp. 202--218, LNCS vol. 1781, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  17. D. E. Maydan, J. L. Hennessy, and M. S. Lam. Efficient and exact data dependence analysis. In Proc. of Prog. Lang. Design and Impl., pp. 1--14, Toronto, ON, June 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Moon and M. W. Hall. Evaluation of predicated array data-flow analysis for automatic parallelization. In Proc. of Principles and Practice of Par. Prog., pp. 84--99, ACM Press, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Moon, M. W. Hall, and B. R. Murphy. Predicated array data-flow analysis for run-time parallelization. in Proc. of Int. Conf. on Supercomputing, Melbourne, Australia, pp. 212--219, July 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Moon, B. So, M. W. Hall, and B. R. Murphy. A case for combining compile-time and run-time parallelization. In Proc. of Workshop on Lang., Comp, and Run-time Support for Scalable Systems, pp. 91--106, London, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  21. Y. Paek, J. Hoeflinger, and D. Padua. Efficient and precise array access analysis. ACM TOPLAS, 24(1):65--109, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. W. Pugh. The Omega test: A fast and practical integer programming algorithm for dependence analysis. In Proc. of Supercomputing '91, Albuquerque, NM, pp. 4--13, Nov. 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. W. Pugh and D. Wonnacott. An exact method for analysis of value-based array data dependences. In Proc. of Workshop on Lang. and Comp. for Par. Computing, in LNCS 768, pp. 546--566, Portland, OR, Aug. 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. W. Pugh and D. Wonnacott. Nonlinear array dependence analysis. In Proc. of Workshop on Lang., Comp. and Run-Time Support for Scalable Systems, Kluwer, Boston 1995.Google ScholarGoogle Scholar
  25. C. G. Quiñones, et. al. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In Proc. of Prog. Lang. Design and Impl., pp. 269--279, New York, NY, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. L. Rauchwerger and D. A. Padua. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In Proc. of Prog. Lang. Design and Impl., pp. 218--232, La Jolla, CA, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Rus, J. Hoeflinger, and L. Rauchwerger. Hybrid analysis: static & dynamic memory reference analysis. Int. Journal of Par. Prog., 31(3):251--283, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Rus, D. Zhang, and L. Rauchwerger. The value evolution graph and its use in memory reference analysis. In Proc. of Par. Arch. and Comp. Techniques, pp. 243--254, Antibes, France, Oct. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. H. Saltz, R. Mirchandaney, and K. Crowley. Run-time parallelization and scheduling of loops. IEEE Trans. on Computers, 40(5):603--612, May 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. van Engelen, J. Birch, Y. Shou, B. Walsh, and K. Gallivan. A unified framework for nonlinear dependence testing and symbolic analysis. In Proc. of Int. Conf. on Supercomp., pp. 106--115, ACM Press, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Wolfe and C.-W. Tseng. The Power test for data dependence. IEEE TPDS, 3(5):591--601, Sept. 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. J. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Longman Publishing, Inc., Boston, MA, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. P. Wu, A. Cohen, and D. Padua. Induction variable analysis without idiom recognition: Beyond monotonicity. In Proc. of Workshop on Lang. and Comp. for Par. Computing, pp. 427--441, Cumberland Falls, KY, 2001, LNCS 2624. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. H. Yu and L. Rauchwerger. Run-time parallelization overhead reduction techniques. In Proc. of Int. Conf. on Compiler Construction, Berlin, Germany, pp. 232--248, Springer LNCS 1781, March 2000.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Sensitivity analysis for automatic parallelization on multi-cores

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      ICS '07: Proceedings of the 21st annual international conference on Supercomputing
      June 2007
      315 pages
      ISBN:9781595937681
      DOI:10.1145/1274971

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 June 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate584of2,055submissions,28%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader