Abstract
As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-time prediction. By profiling a training input, it identifies locality phases by sifting through all accesses to all data elements using variable-distance sampling, wavelet filtering, and optimal phase partitioning. It then constructs a phase hierarchy through grammar compression. Finally, it inserts phase markers into the program using binary rewriting. When the instrumented program runs, it uses the first few executions of a phase to predict all its later executions.Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code. The second half of the paper presents a comprehensive evaluation. It measures the accuracy and the coverage of the new technique and compares it with best known run-time methods. It measures its benefit in adaptive cache resizing and memory remapping. Finally, it compares the automatic analysis with manual phase marking. The results show that locality phase prediction is well suited for identifying large, recurring phases in complex programs.
- F. Allen and J. Cocke. A proram data flow analysis procedure. Communications of the ACM, 19:137--147, 1976.]] Google ScholarDigital Library
- R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas. Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures. In Proceedings of the 33rd International Symposium on Microarchitecture, Monterey, California, December 2000.]] Google ScholarDigital Library
- R. Balasubramonian, S. Dwarkadas, and D. H. Albonesi. Dynamically managing the communication-parallelism trade-off in future clustered processors. In Proceedings of International Symposium on Computer Architecture, San Diego, CA, June 2003.]] Google ScholarDigital Library
- A. P. Batson and A. W. Madison. Measurements of major locality phases in symbolic reference strings. In Proceedings of the ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, Cambridge, MA, March 1976.]] Google ScholarDigital Library
- T. M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Snowbird, Utah, June 2001.]] Google ScholarDigital Library
- R. Das, M. Uysal, J. Saltz, and Y.-S. Hwang. Communication optimizations for irregular scientific computations on distributed memory architectures. Journal of Parallel and Distributed Computing, 22(3):462--479, September 1994.]] Google ScholarDigital Library
- I. Daubechies. Ten Lectures on Wavelets. Capital City Press, Montpelier,Vermont, 1992.]] Google ScholarDigital Library
- P.J. Denning. Working sets past and present. IEEE Transactions on Software Engineering, SE-6(1), January 1980.]]Google ScholarDigital Library
- A. S. Dhodapkar and J. E. Smith. Managing multi-configuration hardware via dynamic working-set analysis. In Proceedings of International Symposium on Computer Architecture, Anchorage, Alaska, June 2002.]] Google ScholarDigital Library
- A. S. Dhodapkar and J. E. Smith. Comparing program phase detection techniques. In Proceedings of International Symposium on Microarchitecture, December 2003.]] Google ScholarDigital Library
- C. Ding and K. Kennedy. Improving cache performance in dynamic applications through data and computation reorganization at run time. In Proceedings of the SIGPLAN '99 Conference on Programming Language Design and Implementation, Atlanta, GA, May 1999.]] Google ScholarDigital Library
- C. Ding and Y. Zhong. Predicting whole-program locality with reuse distance analysis. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.]] Google ScholarDigital Library
- E. Duesterwald, C. Cascaval, and S. Dwarkadas. Characterizing and predicting program behavior and its variability. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, New Orleans, Louisiana, September 2003.]] Google ScholarDigital Library
- C. Fang, S. Carr, S. Onder, and Z. Wang. Reuse-distance-based miss-rate prediction on a per instruction basis. In Proceedings of the first ACM SIGPLAN Workshop on Memory System Performance, Washington DC, June 2004.]] Google ScholarDigital Library
- H. Han and C. W. Tseng. Improving locality for adaptive irregular scientific codes. In Proceedings of Workshop on Languages and Compilers for High-Performance Computing (LCPC'00), White Plains, NY, August 2000.]] Google ScholarDigital Library
- J. E. Hopcroft and J. D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley, 1979.]] Google ScholarDigital Library
- C.-H. Hsu and U. Kermer. The design, implementation and evaluation of a compiler algorithm for CPU energy reduction. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.]] Google ScholarDigital Library
- R. Joseph, Z. Hu, and M. Martonosi. Wavelet analysis for microprocessor design: Experiences with wavelet-based di/dt characterization. In Proceedings of International Symposium on High Performance Computer Architecture, February 2004.]] Google ScholarDigital Library
- J. R. Larus. Whole program paths. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Atlanta, Georgia, May 1999.]] Google ScholarDigital Library
- C. Luk and T. C. Mowry. Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation. In Proceedings of International Symposium on Computer Architecture, Atlanta, GA, May 1999.]] Google ScholarDigital Library
- M. Huang and J. Renau and J. Torrellas. Positional adaptation of processors: application to energy reduction. In Proceedings of the International Symposium on Computer Architecture, San Diego, CA, June 2003.]] Google ScholarDigital Library
- G. Magklis, M. L. Scott, G. Semeraro, D. H. Albonesi,, and S. Dropsho. Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor. In Proceedings of the International Symposium on Computer Architecture, San Diego, CA, June 2003.]] Google ScholarDigital Library
- G. Marin and J. Mellor-Crummey. Cross architecture performance predictions for scientific applications using parameterized models. In Proceedings of Joint International Conference on Measurement and Modeling of Computer Systems, New York City, NY, June 2004.]] Google ScholarDigital Library
- R. L. Mattson, J. Gecsei, D. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM System Journal, 9(2):78--117, 1970.]]Google ScholarDigital Library
- J. Mellor-Crummey, D. Whalley, and K. Kennedy. Improving memory hierarchy performance for irregular applications. International Journal of Parallel Programming, 29(3), June 2001.]] Google ScholarDigital Library
- C. G. Nevill-Manning and I. H. Witten. Identifying hierarchical structure in sequences: a linear-time algorithm. Journal of Artificial Intelligence Research, 7:67--82, 1997.]] Google ScholarDigital Library
- V. S. Pingali, S. A. McKee, W. C. Hsieh, and J. B. Carter. Restructuring computations for temporal data cache locality. International Journal of Parallel Programming, 31(4), August 2003.]] Google ScholarDigital Library
- X. Shen, Y. Zhong, and C. Ding. Regression-based multi-model prediction of data reuse signature. In Proceedings of the 4th Annual Symposium of the Las Alamos Computer Science Institute, Sante Fe, New Mexico, November 2003.]]Google Scholar
- T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings of International Conference on Parallel Architectures and Compilation Techniques, Barcelona, Spain, September 2001.]] Google ScholarDigital Library
- T. Sherwood, S. Sair, and B. Calder. Phase tracking and prediction. In Proceedings of International Symposium on Computer Architecture, San Diego, CA, June 2003.]] Google ScholarDigital Library
- A. Srivastava and A. Eustace. ATOM: A system for building customized program analysis tools. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Orlando, Florida, June 1994.]] Google ScholarDigital Library
- M. M. Strout, L. Carter, and J. Ferrante. Compile-time composition of run-time data and iteration reorderings. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, CA, June 2003.]] Google ScholarDigital Library
- R. A. Sugumar and S. G. Abraham. Efficient simulation of caches under optimal replacement with applications to miss characterization. In Proceedings of the ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, Santa Clara, CA, May 1993.]] Google ScholarDigital Library
- L. Zhang. Efficient Remapping Mechanism for an Adaptive Memory System. PhD thesis, Department of Computer Science, University of Utah, August 2000.]] Google ScholarDigital Library
- L. Zhang, Z. Fang, M. Parker, B. K. Mathew, L. Schaelicke, J. B. Carter, W. C. Hsieh, and S. A. McKee. The Impulse memory controller. IEEE Transactions on Computers, 50(11), November 2001.]] Google ScholarDigital Library
- Y. Zhong, M. Orlovich, X. Shen, and C. Ding. Array regrouping and structure splitting using whole-program reference affinity. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2004.]] Google ScholarDigital Library
Index Terms
- Locality phase prediction
Recommendations
Locality phase prediction
ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systemsAs computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-...
Locality phase prediction
ASPLOS '04As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-...
Locality phase prediction
ASPLOS '04As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-...
Comments