research-article

FlexDCP: a QoS framework for CMP architectures

Authors:
Miquel Moreto, Universitat Politècnica de Catalunya (UPC), Barcelona, Spain
Francisco J. Cazorla, Barcelona Supercomputing Center (BSC), Barcelona, Spain
Alex Ramirez, UPC, BSC, Barcelona, Spain
Rizos Sakellariou, University of Manchester, United Kingdom
Mateo Valero, UPC, BSC, Barcelona, Spain

ACM SIGOPS Operating Systems Review, Volume 43, Issue 2, April 2009, pp. 86–96. https://doi.org/10.1145/1531793.1531806

Published: 21 April 2009

Abstract

Current multicore architectures offer high throughput by increasing hardware resource utilization. As the number of cores in a multicore system increases, providing Quality of Service (QoS) to applications in addition to throughput is becoming an important problem.

In this work, we present FlexDCP, a framework that allows the Operating System (OS) to guarantee QoS to each application running on a chip multiprocessor (CMP). FlexDCP directly estimates the performance of applications for different cache configurations, instead of relying on indirect measures of performance such as the number of cache misses. This information allows the OS to convert QoS requirements into resource assignments. Consequently, FlexDCP gives the OS more flexibility, since it can optimize different QoS metrics, from per-application performance to global metrics such as fairness, weighted speedup, or throughput.
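The abstract does not give formulas for these global metrics. As an illustration only, the sketch below uses definitions that are common in the multicore literature, computed from each application's IPC in the shared run (IPC_shared) and its IPC when running alone (IPC_alone): throughput is the sum of shared IPCs, weighted speedup is the sum of relative IPCs (IPC_shared / IPC_alone), and fairness is the ratio of the smallest to the largest relative IPC. These definitions, the names `qos_metrics` and `QoSMetrics`, and the input values are assumptions, not necessarily what FlexDCP uses.

```python
from dataclasses import dataclass


@dataclass
class QoSMetrics:
    throughput: float        # sum of IPCs while sharing the chip
    weighted_speedup: float  # sum of relative IPCs (shared / alone)
    fairness: float          # min relative IPC / max relative IPC, in (0, 1]


def qos_metrics(ipc_shared: list, ipc_alone: list) -> QoSMetrics:
    """Compute global QoS metrics using the assumed definitions above."""
    if len(ipc_shared) != len(ipc_alone) or not ipc_shared:
        raise ValueError("need one shared and one standalone IPC per application")
    # Relative IPC: fraction of its standalone performance each app retains.
    relative = [s / a for s, a in zip(ipc_shared, ipc_alone)]
    return QoSMetrics(
        throughput=sum(ipc_shared),
        weighted_speedup=sum(relative),
        fairness=min(relative) / max(relative),
    )


# Hypothetical two-application workload (illustrative numbers only).
m = qos_metrics(ipc_shared=[0.8, 1.5], ipc_alone=[1.0, 2.0])
print(m.throughput, m.weighted_speedup, m.fairness)
```

A fairness value of 1.0 means every application keeps the same fraction of its standalone performance; lower values indicate that some applications are slowed down more than others.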

Our results show that FlexDCP is able to force applications in a workload to run at a given percentage of their maximum performance in 94% of the cases considered; in the remaining cases, it falls on average 1.48% short of the objective. When optimizing a global QoS metric such as fairness, FlexDCP consistently outperforms traditional eviction policies such as LRU and pseudo-LRU, as well as previous dynamic cache partitioning proposals, for two-, four- and eight-core configurations. In an eight-core architecture, FlexDCP improves fairness by 10.1% over Fair, the best policy in the literature for optimizing fairness.
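The abstract does not spell out how FlexDCP turns its performance estimates into cache assignments. The sketch below only illustrates the idea of converting a "run at X% of maximum performance" requirement into a resource assignment, under explicit assumptions: the shared cache is partitioned by ways, each application comes with a hypothetical curve of estimated IPC for 1 to W ways, its maximum performance is the best value on that curve, and leftover ways are handed out greedily by marginal gain. `min_ways_for_target`, `assign_ways`, and the curve values are illustrative, not the authors' algorithm or data.

```python
from typing import List, Optional


def min_ways_for_target(ipc_curve: List[float], target: float) -> Optional[int]:
    """Smallest number of ways whose estimated IPC reaches target * max IPC.

    ipc_curve[w - 1] is the (hypothetical) estimated IPC with w ways; the
    maximum performance is assumed to be the best value on the curve.
    """
    goal = target * max(ipc_curve)
    for ways, ipc in enumerate(ipc_curve, start=1):
        if ipc >= goal:
            return ways
    return None  # only possible when target > 1


def assign_ways(curves: List[List[float]], target: float,
                total_ways: int) -> Optional[List[int]]:
    """Greedy sketch: meet every QoS target first, then spend spare ways.

    Returns per-application way counts, or None if the targets do not fit.
    """
    if any(len(c) != total_ways for c in curves):
        raise ValueError("each curve needs one IPC estimate per way count")
    alloc = []
    for curve in curves:
        ways = min_ways_for_target(curve, target)
        if ways is None:
            return None
        alloc.append(ways)
    if sum(alloc) > total_ways:
        return None  # QoS targets cannot all be met with this cache
    # Give each spare way to the application with the largest estimated
    # IPC gain from receiving one more way.
    for _ in range(total_ways - sum(alloc)):
        gains = [c[w] - c[w - 1] if w < total_ways else float("-inf")
                 for c, w in zip(curves, alloc)]
        best = max(range(len(curves)), key=lambda i: gains[i])
        alloc[best] += 1
    return alloc


# Hypothetical IPC estimates for two applications on an 8-way cache.
app_a = [0.5, 0.7, 0.85, 0.95, 1.0, 1.0, 1.0, 1.0]   # cache-sensitive
app_b = [1.2, 1.5, 1.6, 1.62, 1.66, 1.68, 1.7, 1.7]
print(assign_ways([app_a, app_b], target=0.9, total_ways=8))   # [5, 3]
print(assign_ways([app_a, app_b], target=0.99, total_ways=8))  # None
```

Meeting every target first and only then distributing spare ways is one simple design choice; the spare ways could instead be assigned to optimize a global metric such as throughput or fairness.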



Published in

ACM SIGOPS Operating Systems Review, Volume 43, Issue 2
April 2009
119 pages
ISSN: 0163-5980
DOI: 10.1145/1531793

Copyright © 2009 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache partitioning
multicore systems
operating systems
performance predictability
quality of service
Article Metrics
- 55 total citations
- 442 total downloads
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 0
