research-article

HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs

Published: 05 April 2017
Abstract

Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the effective capacity of a compressed memory system without physically increasing the memory size. However, data compression techniques incur non-negligible compression and decompression overhead, and this overhead becomes more severe when compression is used in the cache. In this article, we aim to minimize the read-hit decompression penalty in compressed Last-Level Caches (LLCs) by speculatively decompressing frequently used cachelines. To this end, we propose a Hot-cacheline Prediction and Early decompression (HoPE) mechanism that consists of three synergistic techniques: Hot-cacheline Prediction (HP), Early Decompression (ED), and Hit-history-based Insertion (HBI). HP and HBI efficiently identify the hot compressed cachelines, while ED selectively decompresses hot cachelines based on their size information. Unlike previous approaches, the HoPE framework considers the performance tradeoff between the increased effective cache capacity and the decompression penalty. To evaluate the effectiveness of the proposed HoPE mechanism, we run extensive simulations on memory traces obtained from multi-threaded benchmarks running on a full-system simulation framework. We observe significant performance improvements over compressed cache schemes employing the conventional Least-Recently Used (LRU) replacement policy, the Dynamic Re-Reference Interval Prediction (DRRIP) scheme, and the Effective Capacity Maximizer (ECM) compressed cache management mechanism. Specifically, over a wide range of applications, HoPE improves system performance by approximately 11%, on average, over LRU, 8% over DRRIP, and 7% over ECM by reducing the read-hit decompression penalty by around 65%.
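The tradeoff named in the abstract, between effective capacity and decompression penalty, can be illustrated with a toy model. The sketch below is not the HoPE design: the hit-count threshold, the 5-cycle penalty, the 64-byte line size, and the names `ToySet`, `Line`, and `read_hit` are all assumptions made for illustration, and replacement and insertion policies are not modeled. It only shows the general idea that a line predicted hot can be kept uncompressed so that later read hits skip decompression, at the cost of space in the set.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the article.
DECOMPRESSION_CYCLES = 5   # assumed penalty paid on a read hit to a compressed line
HOT_THRESHOLD = 2          # assumed number of hits before a line is treated as hot
LINE_BYTES = 64            # assumed uncompressed cacheline size


@dataclass
class Line:
    tag: int
    compressed_size: int   # bytes occupied while the line stays compressed
    hits: int = 0
    decompressed: bool = False

    @property
    def footprint(self) -> int:
        """Bytes the line currently occupies in the set."""
        return LINE_BYTES if self.decompressed else self.compressed_size


class ToySet:
    """One cache set with a fixed byte budget shared by its lines."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.lines = {}  # tag -> Line

    def used(self) -> int:
        return sum(line.footprint for line in self.lines.values())

    def insert(self, tag: int, compressed_size: int) -> bool:
        """Insert a compressed line if it fits; eviction is not modeled."""
        if self.used() + compressed_size > self.capacity_bytes:
            return False
        self.lines[tag] = Line(tag, compressed_size)
        return True

    def read_hit(self, tag: int) -> int:
        """Return the decompression penalty (in cycles) paid by this read hit."""
        line = self.lines[tag]
        penalty = 0 if line.decompressed else DECOMPRESSION_CYCLES
        line.hits += 1
        # Early decompression: once the line looks hot, keep it uncompressed,
        # but only if the extra bytes still fit in the set.
        extra = LINE_BYTES - line.compressed_size
        if (not line.decompressed
                and line.hits >= HOT_THRESHOLD
                and self.used() + extra <= self.capacity_bytes):
            line.decompressed = True
        return penalty


if __name__ == "__main__":
    s = ToySet(capacity_bytes=128)
    s.insert(tag=1, compressed_size=16)
    s.insert(tag=2, compressed_size=32)
    print([s.read_hit(1) for _ in range(4)])  # penalties for four hits to line 1
    print(s.used())                           # bytes used after line 1 is expanded
```

In this toy model the first hits to a line pay the penalty and later hits do not, while the expanded line takes up a full 64 bytes, so fewer additional compressed lines fit in the set. That is the capacity side of the tradeoff.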

      • Published in

        ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 3
        July 2017
        440 pages
        ISSN: 1084-4309
        EISSN: 1557-7309
        DOI: 10.1145/3062395
        Editor: Naehyuck Chang

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 5 April 2017
        • Accepted: 1 September 2016
        • Revised: 1 August 2016
        • Received: 1 December 2015

        Qualifiers

        • research-article
        • Research
        • Refereed
