HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs

Abstract
Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the effective capacity of a compressed memory system without physically increasing the memory size. However, data compression comes at a cost: non-negligible compression and decompression overhead. This overhead becomes more severe when compression is applied to caches, where access latency is critical. In this article, we aim to minimize the read-hit decompression penalty in compressed Last-Level Caches (LLCs) by speculatively decompressing frequently used cachelines. To this end, we propose a Hot-cacheline Prediction and Early decompression (HoPE) mechanism that consists of three synergistic techniques: Hot-cacheline Prediction (HP), Early Decompression (ED), and Hit-history-based Insertion (HBI). HP and HBI efficiently identify hot compressed cachelines, while ED selectively decompresses hot cachelines based on their size information. Unlike previous approaches, the HoPE framework considers the tradeoff between increased effective cache capacity and the decompression penalty. To evaluate the effectiveness of the proposed HoPE mechanism, we run extensive simulations on memory traces obtained from multi-threaded benchmarks running on a full-system simulation framework. We observe significant performance improvements over compressed cache schemes employing the conventional Least-Recently Used (LRU) replacement policy, the Dynamic Re-Reference Interval Prediction (DRRIP) scheme, and the Effective Capacity Maximizer (ECM) compressed cache management mechanism. Specifically, HoPE improves system performance by approximately 11%, on average, over LRU, 8% over DRRIP, and 7% over ECM, by reducing the read-hit decompression penalty by around 65% over a wide range of applications.
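The core idea in the abstract — track per-line hit history and speculatively store frequently hit ("hot") lines in decompressed form so later read hits skip the decompression penalty, subject to a size-aware space check — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the hit threshold, decompression latency, and the free-space test are assumed values chosen for the example.

```python
# Illustrative sketch of the HoPE idea: hit-history tracking (HP/HBI input)
# plus size-aware early decompression (ED). All parameters are assumptions.

DECOMPRESS_LATENCY = 5  # assumed decompression penalty, in cycles
HOT_HIT_THRESHOLD = 4   # assumed: hits before a line is considered "hot"

class CacheLine:
    def __init__(self, tag, compressed_size, full_size=64):
        self.tag = tag
        self.compressed_size = compressed_size  # bytes after compression
        self.full_size = full_size              # uncompressed line size
        self.hits = 0                           # per-line hit history
        self.decompressed = False               # stored in expanded form?

def on_read_hit(line, set_free_bytes):
    """Return the decompression penalty paid by this read hit, then
    update the hit history and, if the line has become hot and the set
    has room for its expanded form, store it decompressed so future
    hits pay no penalty (early decompression)."""
    line.hits += 1
    penalty = 0 if line.decompressed else DECOMPRESS_LATENCY
    if (not line.decompressed
            and line.hits >= HOT_HIT_THRESHOLD
            and set_free_bytes >= line.full_size - line.compressed_size):
        line.decompressed = True  # ED: trade capacity for latency
    return penalty

# Usage: a line compressed to 24 B is hit repeatedly; once it crosses the
# hot threshold (and the set has 40 B of slack), later hits are penalty-free.
line = CacheLine(tag=0x1A, compressed_size=24)
penalties = [on_read_hit(line, set_free_bytes=64) for _ in range(5)]
```

The sketch captures the tradeoff the abstract emphasizes: decompressing a hot line consumes some of the effective capacity gained by compression, so the decision is gated on both hit history and available space.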