research-article

Directly characterizing cross core interference through contention synthesis

Authors:
Jason Mars

University of Virginia

University of Virginia
View Profile

,
Lingjia Tang

University of Virginia

University of Virginia
View Profile

,
Mary Lou Soffa

University of Virginia

University of Virginia
View Profile

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and CompilersJanuary 2011Pages 167–176https://doi.org/10.1145/1944862.1944887

Published:24 January 2011Publication History

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

Pages 167–176

ABSTRACT

In this paper, we present a direct methodology and framework for the measurement and characterization of an application's cross-core interference sensitivity on multicore microarchitectures. While prior works use indirect indicators, such as last level cache miss rate, to infer an application's cross-core interference sensitivity, our approach is direct, in that it characterizes the application's cross-core interference sensitivity using the performance impact due to actual contention. Our methodology and framework, the Cross-core interference Profiling Environment, or CiPE, is composed of a lightweight runtime environment on which a host application runs, along with a carefully designed contention synthesis engine that executes on a neighboring core. CiPE manipulates the co-running contention synthesis engine, while monitoring and analyzing the resulting dynamic impact on the host application.

CiPE is able to characterize the cross-core interference sensitivity of the entire application, its individual phases, or source level code regions. To demonstrate the effectiveness of CiPE, we use CiPE characterizations to address two pressing problems. First, we use CiPE characterizations to perform contention conscious batch scheduling that minimizes cross-core interference, resulting in a 12% performance improvment on average when applied to the SPEC2006 benchmark suite, and beyond 20% in the case of mcf and omnetpp. Second, we use CiPE to design a performance analysis tool that is capable identifying contentious bottlenecks in application code.

References

M. Banikazemi, D. Poff, and B. Abali. Pam: a novel performance/power aware meta-scheduler for multi-core systems. In SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, pages 1--12, Piscataway, NJ, USA, 2008. IEEE Press. Google ScholarDigital Library
D. Chandra, F. Guo, S. Kim, and Y. Solihin. Predicting inter-thread cache contention on a chip multi-processor architecture. In HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture, pages 340--351, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarDigital Library
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation, 2006. Google ScholarDigital Library
S. Cho and L. Jin. Managing distributed, shared l2 caches through os-level page allocation. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 455--468, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarDigital Library
S. Eranian. What can performance counters do for memory subsystem analysis? In MSPC '08: Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness, pages 26--30, New York, NY, USA, 2008. ACM. Google ScholarDigital Library
A. Fedorova, S. Blagodurov, and S. Zhuravlev. Managing contention for shared resources on multicore processors. Communications of the ACM, 53(2), 2010. Google ScholarDigital Library
A. Fedorova, M. Seltzer, and M. D. Smith. Improving performance isolation on chip multiprocessors via an operating system scheduler. In PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 25--38, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarDigital Library
Intel Corporation. IA-32 Application Developer's Architecture Guide. Intel Corporation, Santa Clara, CA, USA, 2009.Google Scholar
M. Itzkowitz, B. J. N. Wylie, C. Aoki, and N. Kosche. Memory profiling using hardware counters. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, page 17, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarDigital Library
R. Iyer, L. Zhao, F. Guo, R. Illikkal, S. Makineni, D. Newell, Y. Solihin, L. Hsu, and S. Reinhardt. Qos policies and architecture for cache/memory in cmp platforms. In SIGMETRICS '07: Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 25--36, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
Y. Jiang, X. Shen, J. Chen, and R. Tripathi. Analysis and approximation of optimal co-scheduling on chip multiprocessors. In PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pages 220--229, New York, NY, USA, 2008. ACM. Google ScholarDigital Library
S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In PACT '04: Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 111--122, Washington, DC, USA, 2004. IEEE Computer Society. Google ScholarDigital Library
R. Knauerhase, P. Brett, B. Hohlt, T. Li, and S. Hahn. Using OS Observations to Improve Performance in Multicore Systems. IEEE Micro, 28(3), 2008. Google ScholarDigital Library
S. Lohr. Demand for data puts engineers in spotlight. The New York Times, 2008. Published June 17th.Google Scholar
J. Mars and R. Hundt. Scenario based optimization: A framework for statically enabling online optimizations. In CGO '09: Proceedings of the 2009 International Symposium on Code Generation and Optimization, pages 169--179, Washington, DC, USA, 2009. IEEE Computer Society. Google ScholarDigital Library
J. Mars and M. Soffa. Synthesizing contention. WBIA '09: Proceedings of the Workshop on Binary Instrumentation and Applications, Dec 2009. Google ScholarDigital Library
J. Mars, N. Vachharajani, R. Hundt, and M. Soffa. Contention aware execution: online contention detection and response. CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, Apr 2010. Google ScholarDigital Library
A. Merkel, J. Stoess, and F. Bellosa. Resource-conscious scheduling for energy efficiency on multicore processors. EuroSys '10: Proceedings of the 5th European conference on Computer systems, Apr 2010. Google ScholarDigital Library
K. J. Nesbit, N. Aggarwal, J. Laudon, and J. E. Smith. Fair queuing memory systems. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 208--222, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarDigital Library
K. J. Nesbit, J. Laudon, and J. E. Smith. Virtual private caches. In ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture, pages 57--68, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
A. Pesterev, N. Zeldovich, and R. Morris. Locating cache performance bottlenecks using data profiling. EuroSys '10: Proceedings of the 5th European conference on Computer systems, Apr 2010. Google ScholarDigital Library
M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423--432, Washington, DC, USA, 2006. IEEE Computer Society. Google ScholarDigital Library
N. Rafique, W.-T. Lim, and M. Thottethodi. Architectural support for operating system-driven cmp cache management. In PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniques, pages 2--12, New York, NY, USA, 2006. ACM. Google ScholarDigital Library
L. Soares, D. Tam, and M. Stumm. Reducing the harmful effects of last-level cache polluters with an os-level, software-only pollute buffer. In MICRO '08: Proceedings of the 2008 41st IEEE/ACM International Symposium on Microarchitecture, pages 258--269, Washington, DC, USA, 2008. IEEE Computer Society. Google ScholarDigital Library
G. E. Suh, S. Devadas, and L. Rudolph. A new memory monitoring scheme for memory-aware scheduling and partitioning. In HPCA '02: Proceedings of the 8th International Symposium on High-Performance Computer Architecture, page 117, Washington, DC, USA, 2002. IEEE Computer Society. Google ScholarDigital Library
Q. Wu, A. Pyatakov, A. Spiridonov, E. Raman, D. W. Clark, and D. I. August. Exposing memory access regularities using object-relative memory profiling. In CGO '04: Proceedings of the international symposium on Code generation and optimization, page 315, Washington, DC, USA, 2004. IEEE Computer Society. Google ScholarDigital Library
Y. Xie and G. H. Loh. Dynamic Classification of Program Memory Behaviors in CMPs. In The 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2008.Google Scholar
L. Zhao, R. Iyer, R. Illikkal, J. Moses, S. Makineni, and D. Newell. Cachescouts: Fine-grain monitoring of shared caches in cmp platforms. In PACT '07: Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, pages 339--352, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarDigital Library
S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010. Google ScholarDigital Library

Index Terms

Recommendations

Contention aware execution: online contention detection and response
CGO '10: Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Cross-core application interference due to contention for shared on-chip and off-chip resources pose a significant challenge to providing application level quality of service (QoS) guarantees on commodity multicore micro-architectures. Unexpected cross-...
Read More
Synthesizing contention
WBIA '09: Proceedings of the Workshop on Binary Instrumentation and Applications

Multicore microarchitecture designs have become ubiquitous in today's computing environment enabling multiple processes to execute simultaneously on a single chip. With these new parallel processing capabilities comes a need to better understand how co-...
Read More
ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers
ASPLOS '13

As multicore processors with expanding core counts continue to dominate the server market, the overall utilization of the class of datacenters known as warehouse scale computers (WSCs) depends heavily on colocation of multiple workloads on each server ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
January 2011
226 pages
ISBN:9781450302418
DOI:10.1145/1944862
General Chairs:
Manolis Katevenis
FORTH-ICS and U.Crete, Greece
,
Margaret Martonosi
Princeton University
,
Program Chairs:
Christos Kozyrakis
Stanford University
,
Olivier Temam
INRIA, France
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 January 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
contention
cross-core interference
dynamic analysis
execution runtimes
multicore
profiling framework
program understanding
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 50
  Total Citations
  View Citations
- 344
  Total Downloads
- Downloads (Last 12 months)10
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Directly characterizing cross core interference through contention synthesis

HiPEAC '11: Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers

ABSTRACT

References

Cited By

Index Terms

Recommendations

Contention aware execution: online contention detection and response

Synthesizing contention

ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers