
Locality phase prediction

Published: 07 October 2004

Abstract

As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting the dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-time prediction. By profiling a training input, it identifies locality phases by sifting through all accesses to all data elements using variable-distance sampling, wavelet filtering, and optimal phase partitioning. It then constructs a phase hierarchy through grammar compression. Finally, it inserts phase markers into the program using binary rewriting. When the instrumented program runs, it uses the first few executions of a phase to predict all its later executions.

Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code. The second half of the paper presents a comprehensive evaluation. It measures the accuracy and the coverage of the new technique and compares it with the best known run-time methods. It measures its benefit in adaptive cache resizing and memory remapping. Finally, it compares the automatic analysis with manual phase marking. The results show that locality phase prediction is well suited for identifying large, recurring phases in complex programs.
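The run-time half of the scheme is easy to picture: once phase markers are in place, the first few executions of a marked phase are profiled, and their observed behavior becomes the prediction for every later execution of that phase. The following is a minimal Python sketch of that idea only; the `PhasePredictor` class, the profiled metric (a working-set size), and the training-run count are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (not the authors' implementation) of the run-time side of
# locality phase prediction: profile the first few executions of each marked
# phase, then reuse their averaged behavior as the prediction for all later
# executions of that phase. All names here are hypothetical.

from collections import defaultdict

TRAINING_RUNS = 2  # how many executions of a phase to profile before predicting


class PhasePredictor:
    def __init__(self):
        self.samples = defaultdict(list)   # phase id -> profiled working-set sizes
        self.prediction = {}               # phase id -> predicted working-set size

    def on_phase_begin(self, phase_id):
        """Called at a phase marker. Returns a prediction once one is ready."""
        return self.prediction.get(phase_id)  # None during the training executions

    def on_phase_end(self, phase_id, observed_working_set):
        """Record the behavior observed during this execution of the phase."""
        if phase_id in self.prediction:
            return  # already predicting; later executions are assumed to recur
        self.samples[phase_id].append(observed_working_set)
        if len(self.samples[phase_id]) >= TRAINING_RUNS:
            runs = self.samples[phase_id]
            self.prediction[phase_id] = sum(runs) / len(runs)


if __name__ == "__main__":
    predictor = PhasePredictor()
    # Simulated executions of one recurring phase with a stable working set (KB).
    for working_set_kb in [512, 520, 516, 518]:
        hint = predictor.on_phase_begin(phase_id=7)
        print("predicted:", hint, "observed:", working_set_kb)
        predictor.on_phase_end(phase_id=7, observed_working_set=working_set_kb)
```

In the paper's setting, such a prediction would be consumed at the phase marker to drive a hardware adaptation, for example choosing a cache configuration or a data remapping before the phase body runs.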


    Reviews

    Peter C. Patton

The principles of task and data locality are very important in computer architecture, especially in the application of cache memory to bridge the speed mismatch between a central processing unit (CPU) and its primary memory. This paper goes well beyond qualitative consideration of the principles of locality to describe a detailed, quantitative evaluation of their effects in actual memory hierarchies, for an actual benchmark program. As memory hierarchies become adaptive, performance becomes dependent on forecasting dynamic locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and runtime prediction. Previous studies on the manual adaptation of programs by dynamic data reorganization, at both the hardware and application program levels, have shown improvement in cache locality and prefetching efficiency. Unfortunately, such manual analysis is difficult and depends on the user's intimate knowledge of the program and its data sets.

Most large-scale scientific and engineering computations require substantial computer time and also have recurring locality phases. These computations are good candidates for adaptation using the method described in this paper. Application of the three-step method to the familiar tomcatv benchmark predicts the locality phases of this program with near-perfect accuracy, yielding a cache size reduction of 40 percent without additional misses and a program speedup of 35 percent. This is an excellent paper on quantitative computer architecture; even Hennessy and Patterson would be impressed.

Online Computing Reviews Service
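As a rough illustration of the cache-resizing use the reviewer highlights, the sketch below picks, at each phase boundary, the smallest cache configuration that still covers the phase's predicted working set, so capacity shrinks without adding capacity misses. The size table and the fitting rule are assumptions for illustration, not the policy evaluated in the paper.

```python
# A hedged sketch of phase-driven cache resizing: at a phase marker, choose the
# smallest available cache configuration that holds the predicted working set.
# The configurable sizes below are hypothetical.

CACHE_SIZES_KB = [128, 256, 512, 1024, 2048]


def choose_cache_size(predicted_working_set_kb, sizes=CACHE_SIZES_KB):
    """Return the smallest configuration that covers the predicted working set."""
    for size in sorted(sizes):
        if size >= predicted_working_set_kb:
            return size
    return max(sizes)  # working set exceeds every configuration; use the largest


if __name__ == "__main__":
    for ws in [100, 700, 3000]:
        print(f"working set {ws} KB -> cache {choose_cache_size(ws)} KB")
```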


• Published in

  ACM SIGARCH Computer Architecture News, Volume 32, Issue 5 (ASPLOS 2004)
  December 2004, 283 pages
  ISSN: 0163-5964
  DOI: 10.1145/1037947

  • ASPLOS XI: Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems
    October 2004, 296 pages
    ISBN: 1581138040
    DOI: 10.1145/1024393

      Copyright © 2004 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States
