Schedule-independent storage mapping for loops

Abstract
This paper studies the relationship between storage requirements and performance. Storage-related dependences inhibit optimizations for locality and parallelism. Techniques such as renaming and array expansion can eliminate all storage-related dependences, but do so at the expense of increased storage. This paper introduces the universal occupancy vector (UOV) for loops with a regular stencil of dependences. The UOV provides a schedule-independent storage reuse pattern that introduces no further dependences (other than those implied by true flow dependences). UOV-mapped code requires less storage than full array expansion and only slightly more storage than schedule-dependent minimal storage. We show that determining whether a vector is a UOV is NP-complete. However, an easily constructed but possibly non-minimal UOV can be used. We also present a branch-and-bound algorithm that finds the minimal UOV while maintaining a legal UOV at all times. Our experimental results show that the use of UOV-mapped storage, coupled with tiling for locality, achieves better performance than tiling after array expansion, and accommodates larger problem sizes than untilable, storage-optimized code. Furthermore, storage mapping based on the UOV introduces negligible runtime overhead.
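To make the idea concrete, the following sketch (our illustration, not code from the paper) contrasts full array expansion with a UOV-style mapping for a one-dimensional three-point stencil, whose flow dependences are (1,-1), (1,0), and (1,1). The vector (1,0) is not universal for this stencil, since writing iteration (t,i) over the value of (t-1,i) destroys a value that (t,i+1) may still need under some legal schedule, but (2,0) is: iterations (t,i) and (t+2,i) may share a location under any legal schedule, so two rows of storage suffice. The sizes `T` and `N` and the buffer names are illustrative assumptions.

```python
# Hypothetical illustration of UOV-mapped storage for a three-point
# stencil with flow dependences (1,-1), (1,0), (1,1).
T, N = 8, 10

# Full array expansion: one row of storage per time step (T rows).
full = [[0] * N for _ in range(T)]
full[0] = list(range(N))
for t in range(1, T):
    for i in range(N):
        left = full[t - 1][i - 1] if i > 0 else full[t - 1][i]
        right = full[t - 1][i + 1] if i < N - 1 else full[t - 1][i]
        full[t][i] = left + full[t - 1][i] + right

# UOV-mapped storage: with occupancy vector (2, 0), iterations (t, i)
# and (t+2, i) map to the same location, so two rows suffice no matter
# which legal schedule is chosen later.
uov = [[0] * N for _ in range(2)]
uov[0] = list(range(N))
for t in range(1, T):
    cur, prev = uov[t % 2], uov[(t - 1) % 2]
    for i in range(N):
        left = prev[i - 1] if i > 0 else prev[i]
        right = prev[i + 1] if i < N - 1 else prev[i]
        cur[i] = left + prev[i] + right

# Same final values, with 2 rows of storage instead of T.
assert uov[(T - 1) % 2] == full[T - 1]
```

Note that mapping along (1,0) instead (a single row, updated in place) would be a schedule-dependent choice: it is safe for a strict left-to-right sweep only if old neighbor values are carried in temporaries, which is exactly the kind of commitment the UOV avoids.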
Published in ASPLOS VIII: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems.