
FlexDCP: a QoS framework for CMP architectures

Published: 21 April 2009

Abstract

Current multicore architectures offer high throughput by increasing hardware resource utilization. As the number of cores in a multicore system increases, providing Quality of Service (QoS) to applications in addition to throughput is becoming an important problem.

In this work, we present FlexDCP, a framework that allows the Operating System (OS) to guarantee a QoS level for each application running on a chip multiprocessor. FlexDCP directly estimates the performance of applications under different cache configurations instead of relying on indirect measures of performance such as the number of misses. This information allows the OS to convert QoS requirements into resource assignments. Consequently, FlexDCP gives the OS more flexibility, since it can optimize different QoS metrics: per-application performance, or global performance metrics such as fairness, weighted speedup or throughput.
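As an illustration of how per-configuration performance estimates could be turned into resource assignments, the following Python sketch picks, for each application, the smallest number of cache ways whose predicted IPC reaches a requested fraction of that application's best predicted IPC. This is a minimal sketch under stated assumptions, not the mechanism described in the paper: the names `min_ways_for_target` and `assign_ways`, the way-granularity partitioning, and the IPC curves are all hypothetical.

```python
from typing import Dict, List, Optional


def min_ways_for_target(ipc_by_ways: List[float], target_fraction: float) -> int:
    """Return the smallest way count whose predicted IPC meets the target.

    ipc_by_ways[k] is the (hypothetical) predicted IPC with k + 1 ways.
    target_fraction is the QoS requirement, e.g. 0.9 for 90% of the
    best predicted IPC.
    """
    best = max(ipc_by_ways)
    for ways, ipc in enumerate(ipc_by_ways, start=1):
        if ipc >= target_fraction * best:
            return ways
    # Reached only if target_fraction > 1; fall back to all ways.
    return len(ipc_by_ways)


def assign_ways(
    curves: Dict[str, List[float]],
    targets: Dict[str, float],
    total_ways: int,
) -> Optional[Dict[str, int]]:
    """Map per-application QoS targets to a way assignment.

    Returns None when the combined demand exceeds the cache associativity,
    i.e. the requested QoS levels cannot all be met at once.
    """
    demand = {app: min_ways_for_target(curves[app], targets[app]) for app in curves}
    if sum(demand.values()) > total_ways:
        return None
    return demand


# Illustrative (made-up) predicted IPC curves for a 4-way shared cache.
curves = {"a": [0.5, 0.8, 0.95, 1.0], "b": [0.9, 1.0, 1.0, 1.0]}
targets = {"a": 0.9, "b": 0.8}
allocation = assign_ways(curves, targets, total_ways=4)
```

Any ways left over after the per-application targets are satisfied could then be distributed to optimize a global metric; that step is omitted here.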

Our results show that FlexDCP is able to force applications in a workload to run at a certain percentage of their maximum performance in 94% of the cases considered, falling on average 1.48% short of the objective in the remaining cases. When optimizing a global QoS metric like fairness, FlexDCP consistently outperforms traditional eviction policies like LRU and pseudo-LRU, as well as previous dynamic cache partitioning proposals, for two-, four- and eight-core configurations. In an eight-core architecture, FlexDCP obtains a fairness improvement of 10.1% over Fair, the best policy in the literature optimizing fairness.
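The abstract names the global metrics without giving formulas. The sketch below uses definitions that are common in the multicore literature: throughput as the sum of IPCs, weighted speedup as the sum of relative IPCs, and fairness as the harmonic mean of relative IPCs. Whether the paper uses exactly these definitions is an assumption, and the function names and sample IPC values are hypothetical.

```python
from typing import Sequence


def throughput(ipc_shared: Sequence[float]) -> float:
    """Sum of the IPCs observed when the applications run together."""
    return sum(ipc_shared)


def weighted_speedup(ipc_shared: Sequence[float], ipc_alone: Sequence[float]) -> float:
    """Sum of each application's IPC relative to its IPC when run alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def harmonic_mean_fairness(ipc_shared: Sequence[float], ipc_alone: Sequence[float]) -> float:
    """Harmonic mean of relative IPCs; penalizes slowing any one application."""
    return len(ipc_shared) / sum(a / s for s, a in zip(ipc_shared, ipc_alone))


# Made-up example: two applications, each running at half its isolated IPC.
shared = [1.0, 0.5]
alone = [2.0, 1.0]
```

With these sample values, both applications run at half of their isolated IPC, so the weighted speedup is 1.0 and the harmonic-mean fairness is 0.5.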
