skip to main content
article
Free Access

Single-dimension software pipelining for multidimensional loops

Published:01 March 2007Publication History
Skip Abstract Section

Abstract

Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. This paper proposes a three-step approach, called single-dimension software pipelining (SSP), to software pipeline a loop nest at an arbitrary loop level that has a rectangular iteration space and contains no sibling inner loops in it. The first step identifies the most profitable loop level for software pipelining in terms of initiation rate, data reuse potential, or any other optimization criteria. The second step simplifies the multidimensional data-dependence graph (DDG) of the selected loop level into a one-dimensional DDG and constructs a one-dimensional (1D) schedule. Based on the one-dimensional schedule, the third step derives a simple mapping function that specifies the schedule time for the operation instances in the multidimensional loop. The classical modulo scheduling is subsumed by SSP as a special case. SSP is also closely related to hyperplane scheduling, and, in fact, extends it to be resource constrained. We prove that SSP schedules are correct and at least as efficient as those schedules generated by traditional modulo scheduling methods. We extend SSP to schedule imperfect loop nests, which are most common at the instruction level. Multiple initiation intervals are naturally allowed to improve execution efficiency. Feasibility and correctness of our approach are verified by a prototype implementation in the ORC compiler for the IA-64 architecture, tested with loop nests from Livermore and SPEC2000 floating-point benchmarks. Preliminary experimental results reveal that, compared to modulo scheduling, software pipelining at an appropriate loop level results in significant performance improvement. Software pipelining is beneficial even with prior loop transformations.

References

  1. Aiken, A. and Nicolau, A. 1990. Fine-grain parallelization and the wavefront method. In Selected Papers of the 2nd Workshop on Languages and Compilers for Parallel Computing. Pitman Publishing, London. 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Allan, V. H., Jones, R. B., Lee, R. M., and Allan, S. J. 1995. Software pipelining. ACM Computing Surveys 27, 3 (Sept.), 367--432. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Allen, J. R., Kennedy, K., Porterfield, C., and Warren, J. 1983. Conversion of control dependence to data dependence. In POPL '83: Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. ACM Press, New York. 177--189. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Banerjee, U. K. 1993. Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publ., Norwell, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Carr, S. and Kennedy, K. 1994. Improving the ratio of memory operations to floating-point operations in loops. ACM Trans. on Prog. Lang. and Systems 16, 6 (Nov.), 1768--1810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Carr, S., McKinley, K. S., and Tseng, C.-W. 1994. Compiler optimizations for improving data locality. In ASPLOS-VI: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM Press, New York. 252--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Carr, S., Ding, C., and Sweany, P. 1996. Improving software pipelining with unroll-and-jam. In HICSS'96: Proceedings of the 29th Hawaii International Conference on System Sciences (HICSS'96) Volume 1: Software Technology and Architecture. IEEE Computer Society, Washington, D.C. 183. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Darte, A. and Robert, Y. 1994. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems 5, 8 (Aug.), 814--822. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Darte, A., Schreiber, R., Rau, B. R., and Vivien, F. 2002. Constructing and exploiting linear schedules with prescribed parallelism. ACM Trans. Des. Autom. Electron. Syst. 7, 1, 159--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Feautrier, P. 1996. Automatic parallelization in the polytope model. Lecture Notes in Computer Science 1132, 79--103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gao, G. R., Ning, Q., and Van Dongen, V. 1993. Software pipelining for nested loops. ACAPS Tech Memo 53, School of Computer Science, McGill Univ., Montréal, Québec.Google ScholarGoogle Scholar
  12. Ghosh, S., Martonosi, M., and Malik, S. 1999. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Prog. Lang. and Syst. 21, 4, 703--746. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Govindarajan, R., Altman, E. R., and Gao, G. R. 1996. A framework for resource-constrained rate-optimal software pipelining. IEEE Trans. on Parallel and Distributed Systems 7, 11 (Nov.), 1133--1149. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Huff, R. A. 1993. Lifetime-sensitive modulo scheduling. In PLDI'93: Proc. of the ACM SIGPLAN 1993 Conf. on Programming Language Design and Implementation. ACM Press, New York. 258--267. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Intel. 2001. Intel IA-64 Architecture Software Developer's Manual, Vol. 1: IA-64 Application Architecture. Intel Corporation, Santa Clara, CA.Google ScholarGoogle Scholar
  16. Kennedy, K. and McKinley, K. S. 1992. Optimizing for parallelism and data locality. In ICS'92: Proceedings of the 6th International Conference on Supercomputing. ACM Press, New York. 323--334. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Lam, M. 1988. Software pipelining: an effective scheduling technique for vliw machines. In PLDI '88: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation. ACM Press, New York. 318--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Lamport, L. 1974. The parallel execution of DO loops. Communications of the ACM 17, 2 (Feb.), 83--93. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Moon, S.-M. and Ebcioğlu, K. 1997. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems 19, 6 (Nov.), 853--898. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Muthukumar, K. and Doshi, G. 2001. Software pipelining of nested loops. Lecture Notes in Computer Science 2027, 165--181. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Passos, N. L. and Sha, E. H.-M. 1996. Achieving full parallelism using multidimensional retiming. IEEE Trans. Parallel Distrib. Syst. 7, 11, 1150--1163. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Petkov, D., Harr, R., and Amarasinghe, S. 2002. Efficient pipelining of nested loops: unroll-and-squash. In 16th Intl. Parallel and Distributed Processing Symposium (IPDPS '02). IEEE, Washigton, D.C. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ramanujam, J. 1994. Optimal software pipelining of nested loops. In Proceedings of the 8th International Symposium on Parallel Processing. IEEE Computer Society, Washington, D.C. 335--342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Rau, B. R. 1994. Iterative modulo scheduling: an algorithm for software pipelining loops. In Proc. of the 27th Annual International Symposium on Microarchitecture. ACM Press, New York. 63--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Rau, B. R. and Fisher, J. A. 1993. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing 7, 9--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Rong, H., Douillet, A., Govindarajan, R., and Gao, G. R. 2004a. Code generation for single-dimension software pipelining of multi-dimensional loops. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, Washington, D.C. 175--186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2004b. Single-dimension software pipelining for multi-dimensional loops. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, Washington, D.C. 163--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Rong, H., Douillet, A., and R.Gao, G. 2005. Register allocation for software pipelined multi-dimensional loops. In PLDI'05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation. ACM Press, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2007. Single-dimension software pipelining for multi-dimensional loops. CAPSL technical memo, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware. January. In ftp://ftp.capsl.udel.edu/pub/doc/memos/memo049.ps.gz.Google ScholarGoogle Scholar
  30. Wang, J. and Gao, G. R. 1996. Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In CC '96: Proceedings of the 6th International Conference on Compiler Construction. Springer-Verlag, New York. 1--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Wolf, M. E. and Lam, M. S. 1991. A data locality optimizing algorithm. In PLDI'91: Proc. of the ACM SIGPLAN 1991 Conf. on Prog. Lang. Design and Implementation. ACM Press, New York. 30--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Wolf, M. E., Maydan, D. E., and Chen, D.-K. 1996. Combining loop transformations considering caches and scheduling. In MICRO 29: Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, Washington, D.C. 274--286. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Single-dimension software pipelining for multidimensional loops

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Architecture and Code Optimization
      ACM Transactions on Architecture and Code Optimization  Volume 4, Issue 1
      March 2007
      206 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/1216544
      Issue’s Table of Contents

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 March 2007
      Published in taco Volume 4, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader