This dissertation presents new techniques that speed up the execution of computer programs by improving their memory locality. Locality is an important property on today's machines because it hides the relatively high latency of computer memories.
Our techniques change the layout of multidimensional arrays by applying data transformations. We unify data transformations with code transformations, which change the order in which loop nests execute. We also solve related problems that would otherwise have been obstacles to the practical use of our techniques: we show how to detect and reduce array overlapping and how to recover structure from linearized arrays. Our optimizations reduce the execution times of sequential scientific benchmarks by up to 50% over what is possible with previous techniques. Parallel programs are improved by as much as a factor of four.
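The difference between a data transformation and a code transformation can be illustrated with a small sketch. It is not the dissertation's algorithm, only a toy model under stated assumptions: a 4x6 array, two hypothetical address functions (`row_major`, `col_major`), and a hypothetical metric (`unit_stride_fraction`) that counts how many consecutive accesses touch adjacent memory locations.

```python
# Toy model (illustrative only): compare a data transformation (changing the
# array layout) with a code transformation (loop interchange) by looking at
# the address stride between consecutive accesses.

def row_major(i, j, rows, cols):
    """Linear address of element (i, j) when rows are stored contiguously."""
    return i * cols + j

def col_major(i, j, rows, cols):
    """Linear address of element (i, j) when columns are stored contiguously."""
    return j * rows + i

def unit_stride_fraction(order, layout, rows, cols):
    """Fraction of consecutive accesses that land on the next address."""
    addrs = [layout(i, j, rows, cols) for (i, j) in order]
    steps = [b - a for a, b in zip(addrs, addrs[1:])]
    return sum(1 for d in steps if d == 1) / len(steps)

ROWS, COLS = 4, 6  # assumed array shape

# Original loop nest: j outer, i inner (walks down each column).
column_wise = [(i, j) for j in range(COLS) for i in range(ROWS)]
# Loop interchange: i outer, j inner (walks along each row).
row_wise = [(i, j) for i in range(ROWS) for j in range(COLS)]

# Column-wise traversal of a row-major array: poor spatial locality.
original = unit_stride_fraction(column_wise, row_major, ROWS, COLS)
# Data transformation: keep the loops, switch the layout to column-major.
after_data = unit_stride_fraction(column_wise, col_major, ROWS, COLS)
# Code transformation: keep the layout, interchange the loops.
after_code = unit_stride_fraction(row_wise, row_major, ROWS, COLS)

print(original, after_data, after_code)
```

In this toy case either transformation alone yields a fully sequential access pattern. The two become complementary when one is not applicable, for example when data dependences forbid the loop interchange or when other loop nests access the same array in a different order.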
In addition to implementing our techniques in a standard off-line compiler, we adapt our optimizations to Just-In-Time (JIT) compilation. JIT translation is becoming very important with the increasing popularity of mobile technologies such as Java. We argue that new, faster algorithms are needed in that context. We propose a collection of fast, approximate compiler techniques for data transformations and show that they are effective for Java programs.
Index Terms
- Optimizing programs by data and control transformations