article

Free Access

Single-dimension software pipelining for multidimensional loops

Authors:
Hongbo Rong

Microsoft Corporation, Redmond, Washington

Microsoft Corporation, Redmond, Washington
View Profile

,
Zhizhong Tang

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
R. Govindarajan

Indian Institute of Science, Bangalore, India

Indian Institute of Science, Bangalore, India
View Profile

,
Alban Douillet

Hewlett-Packard Company, Palo Alto, California

Hewlett-Packard Company, Palo Alto, California
View Profile

,
Guang R. Gao

University of Delaware, Newark, Delaware

University of Delaware, Newark, Delaware
View Profile

Authors Info & Claims

ACM Transactions on Architecture and Code Optimization Volume 4 Issue 1pp 7–eshttps://doi.org/10.1145/1216544.1216550

Published:01 March 2007Publication History

ACM Transactions on Architecture and Code Optimization

Abstract

Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. This paper proposes a three-step approach, called single-dimension software pipelining (SSP), to software pipeline a loop nest at an arbitrary loop level that has a rectangular iteration space and contains no sibling inner loops in it. The first step identifies the most profitable loop level for software pipelining in terms of initiation rate, data reuse potential, or any other optimization criteria. The second step simplifies the multidimensional data-dependence graph (DDG) of the selected loop level into a one-dimensional DDG and constructs a one-dimensional (1D) schedule. Based on the one-dimensional schedule, the third step derives a simple mapping function that specifies the schedule time for the operation instances in the multidimensional loop. The classical modulo scheduling is subsumed by SSP as a special case. SSP is also closely related to hyperplane scheduling, and, in fact, extends it to be resource constrained. We prove that SSP schedules are correct and at least as efficient as those schedules generated by traditional modulo scheduling methods. We extend SSP to schedule imperfect loop nests, which are most common at the instruction level. Multiple initiation intervals are naturally allowed to improve execution efficiency. Feasibility and correctness of our approach are verified by a prototype implementation in the ORC compiler for the IA-64 architecture, tested with loop nests from Livermore and SPEC2000 floating-point benchmarks. Preliminary experimental results reveal that, compared to modulo scheduling, software pipelining at an appropriate loop level results in significant performance improvement. Software pipelining is beneficial even with prior loop transformations.

References

Aiken, A. and Nicolau, A. 1990. Fine-grain parallelization and the wavefront method. In Selected Papers of the 2nd Workshop on Languages and Compilers for Parallel Computing. Pitman Publishing, London. 1--16. Google ScholarDigital Library
Allan, V. H., Jones, R. B., Lee, R. M., and Allan, S. J. 1995. Software pipelining. ACM Computing Surveys 27, 3 (Sept.), 367--432. Google ScholarDigital Library
Allen, J. R., Kennedy, K., Porterfield, C., and Warren, J. 1983. Conversion of control dependence to data dependence. In POPL '83: Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. ACM Press, New York. 177--189. Google ScholarDigital Library
Banerjee, U. K. 1993. Loop Transformations for Restructuring Compilers: The Foundations. Kluwer Academic Publ., Norwell, MA. Google ScholarDigital Library
Carr, S. and Kennedy, K. 1994. Improving the ratio of memory operations to floating-point operations in loops. ACM Trans. on Prog. Lang. and Systems 16, 6 (Nov.), 1768--1810. Google ScholarDigital Library
Carr, S., McKinley, K. S., and Tseng, C.-W. 1994. Compiler optimizations for improving data locality. In ASPLOS-VI: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems. ACM Press, New York. 252--262. Google ScholarDigital Library
Carr, S., Ding, C., and Sweany, P. 1996. Improving software pipelining with unroll-and-jam. In HICSS'96: Proceedings of the 29th Hawaii International Conference on System Sciences (HICSS'96) Volume 1: Software Technology and Architecture. IEEE Computer Society, Washington, D.C. 183. Google ScholarDigital Library
Darte, A. and Robert, Y. 1994. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems 5, 8 (Aug.), 814--822. Google ScholarDigital Library
Darte, A., Schreiber, R., Rau, B. R., and Vivien, F. 2002. Constructing and exploiting linear schedules with prescribed parallelism. ACM Trans. Des. Autom. Electron. Syst. 7, 1, 159--172. Google ScholarDigital Library
Feautrier, P. 1996. Automatic parallelization in the polytope model. Lecture Notes in Computer Science 1132, 79--103. Google ScholarDigital Library
Gao, G. R., Ning, Q., and Van Dongen, V. 1993. Software pipelining for nested loops. ACAPS Tech Memo 53, School of Computer Science, McGill Univ., Montréal, Québec.Google Scholar
Ghosh, S., Martonosi, M., and Malik, S. 1999. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Prog. Lang. and Syst. 21, 4, 703--746. Google ScholarDigital Library
Govindarajan, R., Altman, E. R., and Gao, G. R. 1996. A framework for resource-constrained rate-optimal software pipelining. IEEE Trans. on Parallel and Distributed Systems 7, 11 (Nov.), 1133--1149. Google ScholarDigital Library
Huff, R. A. 1993. Lifetime-sensitive modulo scheduling. In PLDI'93: Proc. of the ACM SIGPLAN 1993 Conf. on Programming Language Design and Implementation. ACM Press, New York. 258--267. Google ScholarDigital Library
Intel. 2001. Intel IA-64 Architecture Software Developer's Manual, Vol. 1: IA-64 Application Architecture. Intel Corporation, Santa Clara, CA.Google Scholar
Kennedy, K. and McKinley, K. S. 1992. Optimizing for parallelism and data locality. In ICS'92: Proceedings of the 6th International Conference on Supercomputing. ACM Press, New York. 323--334. Google ScholarDigital Library
Lam, M. 1988. Software pipelining: an effective scheduling technique for vliw machines. In PLDI '88: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation. ACM Press, New York. 318--328. Google ScholarDigital Library
Lamport, L. 1974. The parallel execution of DO loops. Communications of the ACM 17, 2 (Feb.), 83--93. Google ScholarDigital Library
Moon, S.-M. and Ebcioğlu, K. 1997. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems 19, 6 (Nov.), 853--898. Google ScholarDigital Library
Muthukumar, K. and Doshi, G. 2001. Software pipelining of nested loops. Lecture Notes in Computer Science 2027, 165--181. Google ScholarDigital Library
Passos, N. L. and Sha, E. H.-M. 1996. Achieving full parallelism using multidimensional retiming. IEEE Trans. Parallel Distrib. Syst. 7, 11, 1150--1163. Google ScholarDigital Library
Petkov, D., Harr, R., and Amarasinghe, S. 2002. Efficient pipelining of nested loops: unroll-and-squash. In 16th Intl. Parallel and Distributed Processing Symposium (IPDPS '02). IEEE, Washigton, D.C. Google ScholarDigital Library
Ramanujam, J. 1994. Optimal software pipelining of nested loops. In Proceedings of the 8th International Symposium on Parallel Processing. IEEE Computer Society, Washington, D.C. 335--342. Google ScholarDigital Library
Rau, B. R. 1994. Iterative modulo scheduling: an algorithm for software pipelining loops. In Proc. of the 27th Annual International Symposium on Microarchitecture. ACM Press, New York. 63--74. Google ScholarDigital Library
Rau, B. R. and Fisher, J. A. 1993. Instruction-level parallel processing: History, overview and perspective. Journal of Supercomputing 7, 9--50. Google ScholarDigital Library
Rong, H., Douillet, A., Govindarajan, R., and Gao, G. R. 2004a. Code generation for single-dimension software pipelining of multi-dimensional loops. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, Washington, D.C. 175--186. Google ScholarDigital Library
Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2004b. Single-dimension software pipelining for multi-dimensional loops. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, Washington, D.C. 163--174. Google ScholarDigital Library
Rong, H., Douillet, A., and R.Gao, G. 2005. Register allocation for software pipelined multi-dimensional loops. In PLDI'05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation. ACM Press, New York. Google ScholarDigital Library
Rong, H., Tang, Z., Govindarajan, R., Douillet, A., and Gao, G. R. 2007. Single-dimension software pipelining for multi-dimensional loops. CAPSL technical memo, Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware. January. In ftp://ftp.capsl.udel.edu/pub/doc/memos/memo049.ps.gz.Google Scholar
Wang, J. and Gao, G. R. 1996. Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In CC '96: Proceedings of the 6th International Conference on Compiler Construction. Springer-Verlag, New York. 1--17. Google ScholarDigital Library
Wolf, M. E. and Lam, M. S. 1991. A data locality optimizing algorithm. In PLDI'91: Proc. of the ACM SIGPLAN 1991 Conf. on Prog. Lang. Design and Implementation. ACM Press, New York. 30--44. Google ScholarDigital Library
Wolf, M. E., Maydan, D. E., and Chen, D.-K. 1996. Combining loop transformations considering caches and scheduling. In MICRO 29: Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, Washington, D.C. 274--286. Google ScholarDigital Library

Index Terms

Single-dimension software pipelining for multidimensional loops
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Modulo scheduling of loops in control-intensive non-numeric programs
MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Much of the previous work on modulo scheduling has targeted numeric programs, in which, often, the majority of the loops are well-behaved loop-counter-based loops without early exits. In control-intensive non-numeric programs, the loops frequently have ...
Read More
Single-Dimension Software Pipelining for Multi-Dimensional Loops
CGO '04: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Traditionally, software pipelining is applied either to theinnermost loop of a given loop nest or from the innermostloop to outer loops. In this paper, we propose a three-stepapproach, called Single-dimension Software Pipelining(SSP), to software ...
Read More
Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture
LCTES '01: Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Architecture and Code Optimization Volume 4, Issue 1
March 2007
206 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/1216544
Issue’s Table of Contents

Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2007
Published in taco Volume 4, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Software pipelining
loop transformation
modulo scheduling
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 21
  Total Citations
  View Citations
- 1,107
  Total Downloads
- Downloads (Last 12 months)56
- Downloads (Last 6 weeks)22
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Single-dimension software pipelining for multidimensional loops

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Modulo scheduling of loops in control-intensive non-numeric programs

Single-Dimension Software Pipelining for Multi-Dimensional Loops

Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Single-dimension software pipelining for multidimensional loops

ACM Transactions on Architecture and Code Optimization

Abstract

References

Cited By

Index Terms

Recommendations

Modulo scheduling of loops in control-intensive non-numeric programs

Single-Dimension Software Pipelining for Multi-Dimensional Loops

Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media