Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy

Authors:
Po-An Tsai

Massachusetts Institute of Technology, Cambridge, MA, USA

Massachusetts Institute of Technology, Cambridge, MA, USA
View Profile

,
Daniel Sanchez

Massachusetts Institute of Technology, Cambridge, MA, USA

Massachusetts Institute of Technology, Cambridge, MA, USA
View Profile

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating SystemsApril 2019Pages 229–242https://doi.org/10.1145/3297858.3304006

Published:04 April 2019Publication History

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 229–242

ABSTRACT

Existing cache and main memory compression techniques compress data in small fixed-size blocks, typically cache lines. Moreover, they use simple compression algorithms that focus on exploiting redundancy within a block. These techniques work well for scientific programs that are dominated by arrays. However, they are ineffective on object-based programs because objects do not fall neatly into fixed-size blocks and have a more irregular layout. We present the first compressed memory hierarchy designed for object-based applications. We observe that (i) objects, not cache lines, are the natural unit of compression for these programs, as they traverse and operate on object pointers; and (ii) though redundancy within each object is limited, redundancy across objects of the same type is plentiful. We exploit these insights through Zippads, an object-based compressed memory hierarchy, and COCO, a cross-object-compression algorithm. Building on a recent object-based memory hierarchy, Zippads transparently compresses variable-sized objects and stores them compactly. As a result, Zippads consistently outperforms a state-of-the-art compressed memory hierarchy: on a mix of array- and object-dominated workloads, Zippads achieves 1.63x higher compression ratio and improves performance by 17%.

References

Alaa R Alameldeen and David A Wood. 2004 a. Adaptive cache compression for high-performance processors. In Proc. ISCA-31 . Google ScholarDigital Library
Alaa R Alameldeen and David A Wood. 2004 b. Frequent pattern compression: A significance-based compression scheme for L2 caches . Technical Report 1500. Dept. Comp. Sci., Univ. Wisconsin-Madison.Google Scholar
Bowen Alpern, Steve Augart, Stephen M Blackburn, Maria Butrico, Anthony Cocchi, Perry Cheng, Julian Dolby, Stephen Fink, David Grove, Michael Hind, et almbox. 2005. The Jikes research virtual machine project: Building an open-source research community. IBM Systems Journal , Vol. 44, 2 (2005). Google ScholarDigital Library
Angelos Arelakis, Fredrik Dahlgren, and Per Stenstrom. 2015. HyComp: A hybrid cache compression method for selection of data-type-specific compression methods. In Proc. MICRO-48 . Google ScholarDigital Library
Angelos Arelakis and Per Stenstrom. 2014. SC2: A statistical compression cache scheme. In Proc. ISCA-41 . Google ScholarDigital Library
Nathan Beckmann and Daniel Sanchez. 2015. Bridging Theory and Practice in Cache Replacement . Technical Report MIT-CSAIL-TR-2015-034. Massachusetts Institute of Technology.Google Scholar
Nathan Beckmann and Daniel Sanchez. 2017. Maximizing Cache Performance Under Uncertainty . In Proc. HPCA-23 .Google ScholarCross Ref
Stephen M Blackburn, Robin Garner, Chris Hoffmann, Asjad M Khang, Kathryn S McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J Eliot B Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo benchmarks: Java benchmarking development and analysis. In Proc. OOPSLA . Google ScholarDigital Library
Stephen M Blackburn and Kathryn S McKinley. 2008. Immix: A mark-region garbage collector with space efficiency, fast collection, and mutator performance. In Proc. PLDI . Google ScholarDigital Library
Hans-J Boehm. 2002. An artificial garbage collection benchmark . http://hboehm.info/gc/gc_bench.html, archived at https://perma.cc/Y4BY-7RN4 .Google Scholar
Guangyu Chen, Mahmut Kandemir, and Mary J Irwin. 2005. Exploiting frequent field values in Java objects for reducing heap memory requirements. In VEE . Google ScholarDigital Library
Guangyu Chen, M Kandemir, Narayanan Vijaykrishnan, Mary Jane Irwin, Bernd Mathiske, and Mario Wolczko. 2003. Heap compression for memory-constrained Java environments. In Proc. OOPSLA . Google ScholarDigital Library
Xi Chen, Lei Yang, Robert P Dick, Li Shang, and Haris Lekatsas. 2010. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE transactions on very large scale integration (VLSI) systems , Vol. 18, 8 (2010). Google ScholarDigital Library
David Cheriton, Amin Firoozshahian, Alex Solomatnikov, John P Stevenson, and Omid Azizi. 2012. HICAMP: Architectural support for efficient concurrency-safe shared structured data access. In Proc. ASPLOS-XVII . Google ScholarDigital Library
Esha Choukse, Mattan Erez, and Alaa Alameldeen. 2018a. Compresso: Pragmatic main memory compression. In Proc. MICRO-51 .Google ScholarDigital Library
Esha Choukse, Mattan Erez, and Alaa Alameldeen. 2018b. CompressPoints: An evaluation methodology for compressed memory systems. Computer Architecture Letters , Vol. 17, 2 (2018).Google ScholarCross Ref
Magnus Ekman and Per Stenstrom. 2005. A robust main-memory compression scheme. In Proc. ISCA-32 . Google ScholarDigital Library
Jason Evans. 2005. jemalloc http://jemalloc.net/.Google Scholar
Yee Ling Gan. 2018. Redesigning the memory hierarchy for memory-safe programming languages. Master's thesis. Massachusetts Institute of Technology.Google Scholar
Sanjay Ghemawat and Paul Menage. 2005. TCMalloc: Thread-Caching Malloc http://goog-perftools.sourceforge.net/doc/tcmalloc.html .Google Scholar
Google. 2004. Guava: Google Core Libraries for Java . https://github.com/google/guava .Google Scholar
Erik G Hallnor and Steven K Reinhardt. 2005. A unified compressed memory hierarchy. In Proc. HPCA-11 . Google ScholarDigital Library
Nangate Inc. 2008. The NanGate 45nm Open Cell Library . http://www.nangate.com/?page_id=2325 .Google Scholar
Jungrae Kim, Michael Sullivan, Esha Choukse, and Mattan Erez. 2016. Bit-plane compression: Transforming data for better compression in many-core architectures. In Proc. ISCA-43 . Google ScholarDigital Library
Seikwon Kim, Seonyoung Lee, Taehoon Kim, and Jaehyuk Huh. 2017. Transparent dual memory compression architecture. In Proc. PACT-26 .Google ScholarCross Ref
Jan Kotek. 2012. JDBM: A simple transactional persistent engine for Java . http://jdbm.sourceforge.net/.Google Scholar
Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building customized program analysis tools with dynamic instrumentation. In Proc. PLDI . Google ScholarDigital Library
Barak Naveh. 2003. JGraphT . http://jgrapht.org .Google Scholar
Tri M Nguyen and David Wentzlaff. 2015. MORC: A manycore-oriented compressed cache. In Proc. MICRO-48 . Google ScholarDigital Library
Biswabandan Panda and André Seznec. 2016. Dictionary sharing: An efficient cache compression scheme for compressed caches. In Proc. MICRO-49 . Google ScholarDigital Library
Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C Mowry, and Stephen W Keckler. 2016. A case for toggle-aware compression for GPU systems. In Proc. HPCA-22 .Google ScholarCross Ref
Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2015. Exploiting compressed block size as an indicator of future reuse. In Proc. HPCA-21 .Google ScholarCross Ref
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2012a. Linearly compressed pages: A low-complexity, low-latency main memory compression framework . Technical Report SAFARI 2012-002. Carnegie Mellon University.Google Scholar
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2013. Linearly compressed pages: a low-complexity, low-latency main memory compression framework. In Proc. MICRO-46 . Google ScholarDigital Library
Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2012b. Base-delta-immediate compression: Practical data compression for on-chip caches. In Proc. PACT-21 . Google ScholarDigital Library
Roldan Pozo and Bruce Miller. 2004. SciMark 2.0 . https://math.nist.gov/scimark2/.Google Scholar
Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. 2005. The V-Way cache: Demand based associativity via global replacement. In Proc. ISCA-32 . Google ScholarDigital Library
Andrey Rodchenko, Christos Kotselidis, Andy Nisbet, Antoniu Pop, and Mikel Luján. 2017. MaxSim: A simulation platform for managed applications. In Proc. ISPASS .Google ScholarCross Ref
Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In Proc. ISCA-40 . Google ScholarDigital Library
Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A Wood. 2015. A primer on compression in the memory hierarchy. Synthesis Lectures on Computer Architecture , Vol. 10, 5 (2015). Google ScholarDigital Library
Somayeh Sardashti, André Seznec, and David A Wood. 2014. Skewed compressed caches. In Proc. MICRO-47 . Google ScholarDigital Library
Somayeh Sardashti and David A Wood. 2013. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching. In Proc. MICRO-46 . Google ScholarDigital Library
Jennifer B Sartor, Stephen M Blackburn, Daniel Frampton, Martin Hirzel, and Kathryn S McKinley. 2010. Z-rays: divide arrays and conquer speed and flexibility. In Proc. PLDI . Google ScholarDigital Library
Jennifer B Sartor, Wim Heirman, Stephen M Blackburn, Lieven Eeckhout, and Kathryn S McKinley. 2014. Cooperative cache scrubbing. In Proc. PACT-23 . Google ScholarDigital Library
Jennifer B Sartor, Martin Hirzel, and Kathryn S McKinley. 2008. No bit left behind: The limits of heap data compression. In Proc. ISMM . Google ScholarDigital Library
Ali Shafiee, Meysam Taassori, Rajeev Balasubramonian, and Al Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In Proc. HPCA-20 .Google ScholarCross Ref
Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas. 2017. PageForge: A near-memory content-aware page-merging architecture. In Proc. MICRO-50 . Google ScholarDigital Library
Standard Performance Evaluation Corporation. 2006. SPECjbb2005 (Java Server Benchmark) . https://www.spec.org/jbb2005/.Google Scholar
Sun Microsystems. 2006. Memory management in the Java HotSpot virtual machine . http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf .Google Scholar
Yingying Tian, Samira M Khan, Daniel A Jiménez, and Gabriel H Loh. 2016. Last-level cache deduplication. In Proc. ICS'16 . Google ScholarDigital Library
R Brett Tremaine, Peter A Franaszek, John T Robinson, Charles O Schulz, T Basil Smith, Michael E Wazlowski, and P Maurice Bland. 2001. IBM memory expansion technology (MXT). IBM Journal of Research and Development (2001). Google ScholarDigital Library
Po-An Tsai, Yee Ling Gan, and Daniel Sanchez. 2018. Rethinking the memory hierarchy for modern languages. In Proc. MICRO-51 .Google ScholarDigital Library
Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy transactions in multicore in-memory databases. In Proc. SOSP-24 . Google ScholarDigital Library
Christian Wimmer, Michael Haupt, Michael L Van De Vanter, Mick Jordan, Laurent Daynès, and Douglas Simon. 2013. Maxine: An approachable virtual machine for, and in, Java. ACM Transactions on Architecture and Code Optimization (TACO) , Vol. 9, 4 (2013). Google ScholarDigital Library
Clifford Wolf. 2012. Yosys Open Synthesis Suite . http://www.clifford.at/yosys/.Google Scholar
Yi Zhao, Jin Shi, Kai Zheng, Haichuan Wang, Haibo Lin, and Ling Shao. 2009. Allocation wall: A limiting factor of Java applications on emerging multi-core platforms. In Proc. OOPSLA . Google ScholarDigital Library

Index Terms

Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Processors and memory architectures

Recommendations

Size-Aware Cache Management for Compressed Cache Architectures
A practical way to increase the effective capacity of a microprocessor's cache, without physically increasing the cache size, is to employ data compression. Last-Level Caches (LLC) are particularly amenable to such compression schemes, since the primary ...
Read More
Coordinating DRAM and Last-Level-Cache Policies with the Virtual Write Queue

To alleviate bottlenecks in this era of many-core architectures, the authors propose a virtual write queue to expand the memory controller's scheduling window through visibility of cache behavior. Awareness of the physical main memory layout and a focus ...
Read More
HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs

Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the logical effective capacity of a compressed memory system without physically increasing the memory size. However, data ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
April 2019
1126 pages
ISBN:9781450362405
DOI:10.1145/3297858
General Chairs:
Iris Bahar
Brown University
,
Maurice Herlihy
Brown University
,
Program Chairs:
Emmett Witchel
University of Texas, Austin
,
Alvin Lebeck
Duke University
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 April 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
cache
compression
memory
object-based
Qualifiers
- research-article
Conference

Acceptance Rates
ASPLOS '19 Paper Acceptance Rate74of351submissions,21%Overall Acceptance Rate535of2,713submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 14
  Total Citations
  View Citations
- 2,308
  Total Downloads
- Downloads (Last 12 months)211
- Downloads (Last 6 weeks)29
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Size-Aware Cache Management for Compressed Cache Architectures

Coordinating DRAM and Last-Level-Cache Policies with the Virtual Write Queue

HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs