ABSTRACT
Existing cache and main memory compression techniques compress data in small fixed-size blocks, typically cache lines. Moreover, they use simple compression algorithms that focus on exploiting redundancy within a block. These techniques work well for scientific programs that are dominated by arrays. However, they are ineffective on object-based programs because objects do not fall neatly into fixed-size blocks and have a more irregular layout. We present the first compressed memory hierarchy designed for object-based applications. We observe that (i) objects, not cache lines, are the natural unit of compression for these programs, as they traverse and operate on object pointers; and (ii) though redundancy within each object is limited, redundancy across objects of the same type is plentiful. We exploit these insights through Zippads, an object-based compressed memory hierarchy, and COCO, a cross-object-compression algorithm. Building on a recent object-based memory hierarchy, Zippads transparently compresses variable-sized objects and stores them compactly. As a result, Zippads consistently outperforms a state-of-the-art compressed memory hierarchy: on a mix of array- and object-dominated workloads, Zippads achieves 1.63x higher compression ratio and improves performance by 17%.
- Alaa R Alameldeen and David A Wood. 2004 a. Adaptive cache compression for high-performance processors. In Proc. ISCA-31 . Google ScholarDigital Library
- Alaa R Alameldeen and David A Wood. 2004 b. Frequent pattern compression: A significance-based compression scheme for L2 caches . Technical Report 1500. Dept. Comp. Sci., Univ. Wisconsin-Madison.Google Scholar
- Bowen Alpern, Steve Augart, Stephen M Blackburn, Maria Butrico, Anthony Cocchi, Perry Cheng, Julian Dolby, Stephen Fink, David Grove, Michael Hind, et almbox. 2005. The Jikes research virtual machine project: Building an open-source research community. IBM Systems Journal , Vol. 44, 2 (2005). Google ScholarDigital Library
- Angelos Arelakis, Fredrik Dahlgren, and Per Stenstrom. 2015. HyComp: A hybrid cache compression method for selection of data-type-specific compression methods. In Proc. MICRO-48 . Google ScholarDigital Library
- Angelos Arelakis and Per Stenstrom. 2014. SC2: A statistical compression cache scheme. In Proc. ISCA-41 . Google ScholarDigital Library
- Nathan Beckmann and Daniel Sanchez. 2015. Bridging Theory and Practice in Cache Replacement . Technical Report MIT-CSAIL-TR-2015-034. Massachusetts Institute of Technology.Google Scholar
- Nathan Beckmann and Daniel Sanchez. 2017. Maximizing Cache Performance Under Uncertainty . In Proc. HPCA-23 .Google ScholarCross Ref
- Stephen M Blackburn, Robin Garner, Chris Hoffmann, Asjad M Khang, Kathryn S McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, Martin Hirzel, Antony Hosking, Maria Jump, Han Lee, J Eliot B Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, and Ben Wiedermann. 2006. The DaCapo benchmarks: Java benchmarking development and analysis. In Proc. OOPSLA . Google ScholarDigital Library
- Stephen M Blackburn and Kathryn S McKinley. 2008. Immix: A mark-region garbage collector with space efficiency, fast collection, and mutator performance. In Proc. PLDI . Google ScholarDigital Library
- Hans-J Boehm. 2002. An artificial garbage collection benchmark . http://hboehm.info/gc/gc_bench.html, archived at https://perma.cc/Y4BY-7RN4 .Google Scholar
- Guangyu Chen, Mahmut Kandemir, and Mary J Irwin. 2005. Exploiting frequent field values in Java objects for reducing heap memory requirements. In VEE . Google ScholarDigital Library
- Guangyu Chen, M Kandemir, Narayanan Vijaykrishnan, Mary Jane Irwin, Bernd Mathiske, and Mario Wolczko. 2003. Heap compression for memory-constrained Java environments. In Proc. OOPSLA . Google ScholarDigital Library
- Xi Chen, Lei Yang, Robert P Dick, Li Shang, and Haris Lekatsas. 2010. C-Pack: A high-performance microprocessor cache compression algorithm. IEEE transactions on very large scale integration (VLSI) systems , Vol. 18, 8 (2010). Google ScholarDigital Library
- David Cheriton, Amin Firoozshahian, Alex Solomatnikov, John P Stevenson, and Omid Azizi. 2012. HICAMP: Architectural support for efficient concurrency-safe shared structured data access. In Proc. ASPLOS-XVII . Google ScholarDigital Library
- Esha Choukse, Mattan Erez, and Alaa Alameldeen. 2018a. Compresso: Pragmatic main memory compression. In Proc. MICRO-51 .Google ScholarDigital Library
- Esha Choukse, Mattan Erez, and Alaa Alameldeen. 2018b. CompressPoints: An evaluation methodology for compressed memory systems. Computer Architecture Letters , Vol. 17, 2 (2018).Google ScholarCross Ref
- Magnus Ekman and Per Stenstrom. 2005. A robust main-memory compression scheme. In Proc. ISCA-32 . Google ScholarDigital Library
- Jason Evans. 2005. jemalloc http://jemalloc.net/.Google Scholar
- Yee Ling Gan. 2018. Redesigning the memory hierarchy for memory-safe programming languages. Master's thesis. Massachusetts Institute of Technology.Google Scholar
- Sanjay Ghemawat and Paul Menage. 2005. TCMalloc: Thread-Caching Malloc http://goog-perftools.sourceforge.net/doc/tcmalloc.html .Google Scholar
- Google. 2004. Guava: Google Core Libraries for Java . https://github.com/google/guava .Google Scholar
- Erik G Hallnor and Steven K Reinhardt. 2005. A unified compressed memory hierarchy. In Proc. HPCA-11 . Google ScholarDigital Library
- Nangate Inc. 2008. The NanGate 45nm Open Cell Library . http://www.nangate.com/?page_id=2325 .Google Scholar
- Jungrae Kim, Michael Sullivan, Esha Choukse, and Mattan Erez. 2016. Bit-plane compression: Transforming data for better compression in many-core architectures. In Proc. ISCA-43 . Google ScholarDigital Library
- Seikwon Kim, Seonyoung Lee, Taehoon Kim, and Jaehyuk Huh. 2017. Transparent dual memory compression architecture. In Proc. PACT-26 .Google ScholarCross Ref
- Jan Kotek. 2012. JDBM: A simple transactional persistent engine for Java . http://jdbm.sourceforge.net/.Google Scholar
- Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building customized program analysis tools with dynamic instrumentation. In Proc. PLDI . Google ScholarDigital Library
- Barak Naveh. 2003. JGraphT . http://jgrapht.org .Google Scholar
- Tri M Nguyen and David Wentzlaff. 2015. MORC: A manycore-oriented compressed cache. In Proc. MICRO-48 . Google ScholarDigital Library
- Biswabandan Panda and André Seznec. 2016. Dictionary sharing: An efficient cache compression scheme for compressed caches. In Proc. MICRO-49 . Google ScholarDigital Library
- Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C Mowry, and Stephen W Keckler. 2016. A case for toggle-aware compression for GPU systems. In Proc. HPCA-22 .Google ScholarCross Ref
- Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2015. Exploiting compressed block size as an indicator of future reuse. In Proc. HPCA-21 .Google ScholarCross Ref
- Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2012a. Linearly compressed pages: A low-complexity, low-latency main memory compression framework . Technical Report SAFARI 2012-002. Carnegie Mellon University.Google Scholar
- Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2013. Linearly compressed pages: a low-complexity, low-latency main memory compression framework. In Proc. MICRO-46 . Google ScholarDigital Library
- Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu, Phillip B Gibbons, Michael A Kozuch, and Todd C Mowry. 2012b. Base-delta-immediate compression: Practical data compression for on-chip caches. In Proc. PACT-21 . Google ScholarDigital Library
- Roldan Pozo and Bruce Miller. 2004. SciMark 2.0 . https://math.nist.gov/scimark2/.Google Scholar
- Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. 2005. The V-Way cache: Demand based associativity via global replacement. In Proc. ISCA-32 . Google ScholarDigital Library
- Andrey Rodchenko, Christos Kotselidis, Andy Nisbet, Antoniu Pop, and Mikel Luján. 2017. MaxSim: A simulation platform for managed applications. In Proc. ISPASS .Google ScholarCross Ref
- Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In Proc. ISCA-40 . Google ScholarDigital Library
- Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A Wood. 2015. A primer on compression in the memory hierarchy. Synthesis Lectures on Computer Architecture , Vol. 10, 5 (2015). Google ScholarDigital Library
- Somayeh Sardashti, André Seznec, and David A Wood. 2014. Skewed compressed caches. In Proc. MICRO-47 . Google ScholarDigital Library
- Somayeh Sardashti and David A Wood. 2013. Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching. In Proc. MICRO-46 . Google ScholarDigital Library
- Jennifer B Sartor, Stephen M Blackburn, Daniel Frampton, Martin Hirzel, and Kathryn S McKinley. 2010. Z-rays: divide arrays and conquer speed and flexibility. In Proc. PLDI . Google ScholarDigital Library
- Jennifer B Sartor, Wim Heirman, Stephen M Blackburn, Lieven Eeckhout, and Kathryn S McKinley. 2014. Cooperative cache scrubbing. In Proc. PACT-23 . Google ScholarDigital Library
- Jennifer B Sartor, Martin Hirzel, and Kathryn S McKinley. 2008. No bit left behind: The limits of heap data compression. In Proc. ISMM . Google ScholarDigital Library
- Ali Shafiee, Meysam Taassori, Rajeev Balasubramonian, and Al Davis. 2014. MemZip: Exploring unconventional benefits from memory compression. In Proc. HPCA-20 .Google ScholarCross Ref
- Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas. 2017. PageForge: A near-memory content-aware page-merging architecture. In Proc. MICRO-50 . Google ScholarDigital Library
- Standard Performance Evaluation Corporation. 2006. SPECjbb2005 (Java Server Benchmark) . https://www.spec.org/jbb2005/.Google Scholar
- Sun Microsystems. 2006. Memory management in the Java HotSpot virtual machine . http://www.oracle.com/technetwork/java/javase/memorymanagement-whitepaper-150215.pdf .Google Scholar
- Yingying Tian, Samira M Khan, Daniel A Jiménez, and Gabriel H Loh. 2016. Last-level cache deduplication. In Proc. ICS'16 . Google ScholarDigital Library
- R Brett Tremaine, Peter A Franaszek, John T Robinson, Charles O Schulz, T Basil Smith, Michael E Wazlowski, and P Maurice Bland. 2001. IBM memory expansion technology (MXT). IBM Journal of Research and Development (2001). Google ScholarDigital Library
- Po-An Tsai, Yee Ling Gan, and Daniel Sanchez. 2018. Rethinking the memory hierarchy for modern languages. In Proc. MICRO-51 .Google ScholarDigital Library
- Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. 2013. Speedy transactions in multicore in-memory databases. In Proc. SOSP-24 . Google ScholarDigital Library
- Christian Wimmer, Michael Haupt, Michael L Van De Vanter, Mick Jordan, Laurent Daynès, and Douglas Simon. 2013. Maxine: An approachable virtual machine for, and in, Java. ACM Transactions on Architecture and Code Optimization (TACO) , Vol. 9, 4 (2013). Google ScholarDigital Library
- Clifford Wolf. 2012. Yosys Open Synthesis Suite . http://www.clifford.at/yosys/.Google Scholar
- Yi Zhao, Jin Shi, Kai Zheng, Haichuan Wang, Haibo Lin, and Ling Shao. 2009. Allocation wall: A limiting factor of Java applications on emerging multi-core platforms. In Proc. OOPSLA . Google ScholarDigital Library
Index Terms
- Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy
Recommendations
Size-Aware Cache Management for Compressed Cache Architectures
A practical way to increase the effective capacity of a microprocessor's cache, without physically increasing the cache size, is to employ data compression. Last-Level Caches (LLC) are particularly amenable to such compression schemes, since the primary ...
Coordinating DRAM and Last-Level-Cache Policies with the Virtual Write Queue
To alleviate bottlenecks in this era of many-core architectures, the authors propose a virtual write queue to expand the memory controller's scheduling window through visibility of cache behavior. Awareness of the physical main memory layout and a focus ...
HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs
Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the logical effective capacity of a compressed memory system without physically increasing the memory size. However, data ...
Comments