Schedule-independent storage mapping for loops

Abstract
This paper studies the relationship between storage requirements and performance. Storage-related dependences inhibit optimizations for locality and parallelism. Techniques such as renaming and array expansion can eliminate all storage-related dependences, but do so at the expense of increased storage. This paper introduces the universal occupancy vector (UOV) for loops with a regular stencil of dependences. The UOV provides a schedule-independent storage reuse pattern that introduces no further dependences (other than those implied by true flow dependences). UOV-mapped code requires less storage than full array expansion and only slightly more storage than schedule-dependent minimal storage. We show that determining whether a vector is a UOV is NP-complete. However, an easily constructed but possibly non-minimal UOV can be used. We also present a branch-and-bound algorithm that finds the minimal UOV while maintaining a legal UOV at all times. Our experimental results show that the use of UOV-mapped storage, coupled with tiling for locality, achieves better performance than tiling after array expansion, and accommodates larger problem sizes than untilable, storage-optimized code. Furthermore, storage mapping based on the UOV introduces negligible runtime overhead.
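To make the idea concrete, the following sketch (our illustration, not code from the paper) contrasts full array expansion with a UOV-style mapping for a one-dimensional three-point stencil, whose flow dependences are (1,-1), (1,0), and (1,1). The vector (1,0) is not universal for this stencil, since writing iteration (t,i) over the value of (t-1,i) destroys a value that (t,i+1) may still need under some legal schedule, but (2,0) is: iterations (t,i) and (t+2,i) may share a location under any legal schedule, so two rows of storage suffice. The sizes `T` and `N` and the buffer names are illustrative assumptions.

```python
# Hypothetical illustration of UOV-mapped storage for a three-point
# stencil with flow dependences (1,-1), (1,0), (1,1).
T, N = 8, 10

# Full array expansion: one row of storage per time step (T rows).
full = [[0] * N for _ in range(T)]
full[0] = list(range(N))
for t in range(1, T):
    for i in range(N):
        left = full[t - 1][i - 1] if i > 0 else full[t - 1][i]
        right = full[t - 1][i + 1] if i < N - 1 else full[t - 1][i]
        full[t][i] = left + full[t - 1][i] + right

# UOV-mapped storage: with occupancy vector (2, 0), iterations (t, i)
# and (t+2, i) map to the same location, so two rows suffice no matter
# which legal schedule is chosen later.
uov = [[0] * N for _ in range(2)]
uov[0] = list(range(N))
for t in range(1, T):
    cur, prev = uov[t % 2], uov[(t - 1) % 2]
    for i in range(N):
        left = prev[i - 1] if i > 0 else prev[i]
        right = prev[i + 1] if i < N - 1 else prev[i]
        cur[i] = left + prev[i] + right

# Same final values, with 2 rows of storage instead of T.
assert uov[(T - 1) % 2] == full[T - 1]
```

Note that mapping along (1,0) instead (a single row, updated in place) would be a schedule-dependent choice: it is safe for a strict left-to-right sweep only if old neighbor values are carried in temporaries, which is exactly the kind of commitment the UOV avoids.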
Published in ASPLOS VIII: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems.