skip to main content
10.1145/1944862.1944887acmotherconferencesArticle/Chapter ViewAbstractPublication PageshipeacConference Proceedingsconference-collections
research-article

Directly characterizing cross core interference through contention synthesis

Published:24 January 2011Publication History

ABSTRACT

In this paper, we present a direct methodology and framework for the measurement and characterization of an application's cross-core interference sensitivity on multicore microarchitectures. While prior works use indirect indicators, such as last level cache miss rate, to infer an application's cross-core interference sensitivity, our approach is direct, in that it characterizes the application's cross-core interference sensitivity using the performance impact due to actual contention. Our methodology and framework, the Cross-core interference Profiling Environment, or CiPE, is composed of a lightweight runtime environment on which a host application runs, along with a carefully designed contention synthesis engine that executes on a neighboring core. CiPE manipulates the co-running contention synthesis engine, while monitoring and analyzing the resulting dynamic impact on the host application.

CiPE is able to characterize the cross-core interference sensitivity of the entire application, its individual phases, or source level code regions. To demonstrate the effectiveness of CiPE, we use CiPE characterizations to address two pressing problems. First, we use CiPE characterizations to perform contention conscious batch scheduling that minimizes cross-core interference, resulting in a 12% performance improvment on average when applied to the SPEC2006 benchmark suite, and beyond 20% in the case of mcf and omnetpp. Second, we use CiPE to design a performance analysis tool that is capable identifying contentious bottlenecks in application code.

References

  1. M. Banikazemi, D. Poff, and B. Abali. Pam: a novel performance/power aware meta-scheduler for multi-core systems. In SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pages 1--12, Piscataway, NJ, USA, 2008. IEEE Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pages 340--351, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Cho and L. Jin. Managing distributed, shared l2 caches through os-level page allocation. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 455--468, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Eranian. What can performance counters do for memory subsystem analysis? In MSPC '08: Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness, pages 26--30, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Fedorova, S. Blagodurov, and S. Zhuravlev. Managing contention for shared resources on multicore processors. Communications of the ACM, 53(2), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Fedorova, M. Seltzer, and M. D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 25--38, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Intel Corporation. IA-32 Application Developer's Architecture Guide. Intel Corporation, Santa Clara, CA, USA, 2009.Google ScholarGoogle Scholar
  9. M. Itzkowitz, B. J. N. Wylie, C. Aoki, and N. Kosche. Memory profiling using hardware counters. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, page 17, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Iyer, L. Zhao, F. Guo, R. Illikkal, S. Makineni, D. Newell, Y. Solihin, L. Hsu, and S. Reinhardt. Qos policies and architecture for cache/memory in cmp platforms. In SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 25--36, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pages 220--229, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT '04: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 111--122, Washington, DC, USA, 2004. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn. Using OS Observations to Improve Performance in Multicore Systems. IEEE Micro, 28(3), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Lohr. Demand for data puts engineers in spotlight. The New York Times, 2008. Published June 17th.Google ScholarGoogle Scholar
  15. J. Mars and R. Hundt. Scenario based optimization: A framework for statically enabling online optimizations. In CGO '09: Proceedings of the 2009 International Symposium on Code Generation and Optimization, pages 169--179, Washington, DC, USA, 2009. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Mars and M. Soffa. Synthesizing contention. WBIA '09: Proceedings of the Workshop on Binary Instrumentation and Applications, Dec 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Mars, N. Vachharajani, R. Hundt, and M. Soffa. Contention aware execution: online contention detection and response. CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, Apr 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Merkel, J. Stoess, and F. Bellosa. Resource-conscious scheduling for energy efficiency on multicore processors. EuroSys '10: Proceedings of the 5th European conference on Computer systems, Apr 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. K. J. Nesbit, N. Aggarwal, J. Laudon, and J. E. Smith. Fair queuing memory systems. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 208--222, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. K. J. Nesbit, J. Laudon, and J. E. Smith. Virtual private caches. In ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture, pages 57--68, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. A. Pesterev, N. Zeldovich, and R. Morris. Locating cache performance bottlenecks using data profiling. EuroSys '10: Proceedings of the 5th European conference on Computer systems, Apr 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423--432, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Rafique, W.-T. Lim, and M. Thottethodi. Architectural support for operating system-driven cmp cache management. In PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 2--12, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. L. Soares, D. Tam, and M. Stumm. Reducing the harmful effects of last-level cache polluters with an os-level, software-only pollute buffer. In MICRO '08: Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, pages 258--269, Washington, DC, USA, 2008. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA '02: Proceedings of the 8th International Symposium on High-Performance Computer Architecture, page 117, Washington, DC, USA, 2002. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Q. Wu, A. Pyatakov, A. Spiridonov, E. Raman, D. W. Clark, and D. I. August. Exposing memory access regularities using object-relative memory profiling. In CGO '04: Proceedings of the international symposium on Code generation and optimization, page 315, Washington, DC, USA, 2004. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Xie and G. H. Loh. Dynamic Classification of Program Memory Behaviors in CMPs. In The 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.Google ScholarGoogle Scholar
  28. L. Zhao, R. Iyer, R. Illikkal, J. Moses, S. Makineni, and D. Newell. Cachescouts: Fine-grain monitoring of shared caches in cmp platforms. In PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 339--352, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Directly characterizing cross core interference through contention synthesis

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Other conferences
                HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
                January 2011
                226 pages
                ISBN:9781450302418
                DOI:10.1145/1944862

                Copyright © 2011 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 24 January 2011

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • research-article

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader