
A collaborative memory system for high-performance and cost-effective clustered architectures

Authors:
Ahmad Samih (System Architecture Lab, Intel Research Labs, Hillsboro, OR, and North Carolina State University, Raleigh, NC)
Ren Wang (System Architecture Lab, Intel Research Labs, Hillsboro, OR)
Christian Maciocco (System Architecture Lab, Intel Research Labs, Hillsboro, OR)
Tsung-Yuan Charlie Tai (System Architecture Lab, Intel Research Labs, Hillsboro, OR)
Yan Solihin (North Carolina State University, Raleigh, NC)

ASBD '11: Proceedings of the 1st Workshop on Architectures and Systems for Big Data, October 2011, Pages 4–12. https://doi.org/10.1145/2377978.2377979

Published: 10 October 2011

ABSTRACT

With the fast development of highly integrated distributed systems (cluster systems), especially those encapsulated within a single platform [28, 9], designers face memory-hierarchy design choices aimed at avoiding swapping to disk storage, because disk swapping slows application execution drastically. Leveraging remote free memory through Memory Collaboration has been shown to be more cost-effective than overprovisioning each node for its peak load. Recent studies propose several ways of accessing under-utilized remote memory in static system configurations, but do not explore dynamic memory collaboration in detail. Dynamic collaboration is important because memory usage in clustered systems fluctuates at run time.
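As a simplified first-order illustration (our own model, not one given in the paper), let $p$ be the fraction of memory accesses that miss in local DRAM and must be serviced from the swap backing store, $t_{\text{local}}$ the local DRAM access time, and $t_{\text{backing}}$ the service time of that backing store. The average access time and the saving from replacing disk with remote memory are then:

```latex
\bar{t} = (1 - p)\, t_{\text{local}} + p\, t_{\text{backing}},
\qquad t_{\text{backing}} \in \{\, t_{\text{disk}},\; t_{\text{remote}} \,\}

\bar{t}_{\text{disk}} - \bar{t}_{\text{remote}} = p \,\bigl( t_{\text{disk}} - t_{\text{remote}} \bigr)
```

Under this model the saving grows with both the miss fraction $p$ and the latency gap between disk and remote memory, so workloads whose working sets overflow local DRAM are the ones expected to gain most from collaboration.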

In this paper, we propose an Autonomous Collaborative Memory System (ACMS) that manages memory resources dynamically at run time to optimize performance and to provide QoS guarantees for the nodes participating in the system. We implement a prototype of ACMS, experiment with a wide range of real-world applications, and show up to 3x speedup compared to a non-collaborative memory system, with no perceivable performance impact on the nodes that provide memory. Based on our experiments, we analyze the remote memory access overhead in detail and provide insights for future optimizations.
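The abstract does not spell out the ACMS policy, so the following is only a hypothetical sketch of how a dynamic collaborative memory manager might periodically re-classify nodes as donors or receivers and match memory-starved nodes with donors while protecting each donor's own performance. The `Node` type, the threshold constants, and the greedy matching in `allocate` are assumptions made for illustration, not the authors' algorithm.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DONOR = "donor"        # has spare memory it can lend
    RECEIVER = "receiver"  # under memory pressure, wants remote memory
    NEUTRAL = "neutral"    # neither lends nor borrows this round


@dataclass
class Node:
    name: str
    total_mb: int
    used_mb: int

    @property
    def free_mb(self) -> int:
        return self.total_mb - self.used_mb


# Hypothetical policy knobs (assumptions, not values from the paper).
RECEIVER_FREE_FRACTION = 0.10  # below 10% free -> node borrows
DONOR_FREE_FRACTION = 0.40     # above 40% free -> node may lend
QOS_RESERVE_FRACTION = 0.25    # a donor always keeps 25% of its memory free


def classify(node: Node) -> Role:
    free_fraction = node.free_mb / node.total_mb
    if free_fraction < RECEIVER_FREE_FRACTION:
        return Role.RECEIVER
    if free_fraction > DONOR_FREE_FRACTION:
        return Role.DONOR
    return Role.NEUTRAL


def lendable_mb(node: Node) -> int:
    """Memory a donor can lend without dipping below its QoS reserve."""
    reserve = int(node.total_mb * QOS_RESERVE_FRACTION)
    return max(0, node.free_mb - reserve)


def demand_mb(node: Node) -> int:
    """Memory a receiver needs to climb back to the receiver threshold."""
    target_free = int(node.total_mb * RECEIVER_FREE_FRACTION)
    return max(0, target_free - node.free_mb)


def allocate(nodes: list[Node]) -> list[tuple[str, str, int]]:
    """Greedily match receivers to donors; returns (receiver, donor, MB) grants."""
    donors = {n.name: lendable_mb(n) for n in nodes if classify(n) is Role.DONOR}
    grants = []
    for node in nodes:
        if classify(node) is not Role.RECEIVER:
            continue
        need = demand_mb(node)
        # Try donors with the most lendable memory first.
        for donor in sorted(donors, key=donors.get, reverse=True):
            if need == 0:
                break
            give = min(need, donors[donor])
            if give > 0:
                grants.append((node.name, donor, give))
                donors[donor] -= give
                need -= give
    return grants


if __name__ == "__main__":
    cluster = [Node("a", 8192, 8000), Node("b", 8192, 2048), Node("c", 8192, 5000)]
    print(allocate(cluster))  # node "a" borrows 627 MB from donor "b"
```

Calling `allocate` again after each monitoring interval lets node roles change as usage fluctuates; the `QOS_RESERVE_FRACTION` floor is one simple way to keep lending from degrading a donor node.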

References

[1] A. Agarwal. Facebook: Science and the social graph. http://www.infoq.com/presentations/Facebook-Software-Stack, 2009. Presented at QCon San Francisco.
[2] Apache. Hadoop. http://hadoop.apache.org/, 2011.
[3] M. Awasthi, K. Sudan, R. Balasubramonian, and J. Carter. Dynamic Hardware-Assisted Software-Controlled Page Placement to Manage Capacity Allocation and Sharing within Large Caches. In HPCA '09: 2009 IEEE 15th Intl. Symp. on High Performance Computer Architecture, 2009.
[4] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schuepbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP '09: 22nd ACM Symposium on Operating Systems Principles, New York, NY, USA, 2009. ACM Press.
[5] B. M. Beckmann, M. R. Marty, and D. A. Wood. ASR: Adaptive Selective Replication for CMP Caches. In MICRO 39: 39th IEEE/ACM Intl. Symp. on Microarchitecture, 2006.
[6] J. Chang and G. S. Sohi. Cooperative Caching for Chip Multiprocessors. In ISCA '06: 33rd Intl. Symp. on Computer Architecture, 2006.
[7] H. Chen, Y. Luo, X. Wang, B. Zhang, Y. Sun, and Z. Wang. A transparent remote paging model for virtual machines, 2008.
[8] Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Optimizing Replication, Communication and Capacity Allocation in CMPs. In the 32nd ISCA, June 2005.
[9] Intel Corp. Chip shot: Intel outlines low-power micro server strategy, 2011.
[10] G. Dhiman, R. Ayoub, and T. Rosing. PDRAM: a hybrid PRAM and DRAM main memory system. In 46th Design Automation Conf., DAC '09, pages 664--669, New York, NY, USA, 2009. ACM.
[11] Fedora Project. Intel Core i7-800 Processor Series. http://fedoraproject.org/, 2010.
[12] Intel Corp. Thunderbolt Technology. http://www.intel.com/technology/io/thunderbolt/index.htm, 2011.
[13] Intel Microarchitecture. Intel Core i7-800 Processor Series. http://download.intel.com/products/processor/corei7/319724.pdf, 2010.
[14] S. Liang, R. Noronha, and D. Panda. Swapping to remote memory over InfiniBand: An approach using a high performance network block device. In Cluster Computing, 2005. IEEE Intl., pages 1--10, 2005.
[15] K. Lim, J. Chang, T. Mudge, P. Ranganathan, S. K. Reinhardt, and T. F. Wenisch. Disaggregated memory for expansion and sharing in blade servers. In 36th Annual International Symposium on Computer Architecture, ISCA '09, pages 267--278, New York, NY, USA, 2009. ACM.
[16] E. P. Markatos and G. Dramitinos. Implementation of a reliable remote memory pager. In USENIX Technical Conf., pages 177--190, 1996.
[17] E. P. Markatos and G. Dramitinos. Adding flexibility to a remote memory pager, 1996.
[18] M. L. Massie, B. N. Chun, and D. E. Culler. The Ganglia distributed monitoring system: Design, implementation and experience, 2004.
[19] C. R. R. Maule. iWARP Ethernet: key to driving Ethernet into high performance environments. In 2006 ACM/IEEE Conference on Supercomputing, SC '06, New York, NY, USA, 2006. ACM.
[20] H. Midorikawa, M. Kurokawa, R. Himeno, and M. Sato. DLM: A distributed large memory system using remote memory swapping over cluster nodes. In Cluster Computing, 2008 IEEE Intl. Conf. on, pages 268--273, 2008.
[21] Network Block Device TCP version. NBD. http://nbd.sourceforge.net/, 2011.
[22] T. Newhall, S. Finney, K. Ganchev, and M. Spiegel. Nswap: A network swapping module for Linux clusters, 2003.
[23] J. K. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, M. Rosenblum, S. M. Rumble, E. Stratmann, and R. Stutsman. The case for RAMClouds: Scalable high-performance storage entirely in DRAM. In SIGOPS OSR, 2009.
[24] M. Qureshi. Adaptive Spill-Receive for Robust High-Performance Caching in CMPs. In HPCA 2009: IEEE 15th Intl. Symp. on High Performance Computer Architecture, 2009.
[25] M. K. Qureshi, V. Srinivasan, and J. A. Rivers. Scalable high performance main memory system using phase-change memory technology. In 36th Annual International Symposium on Computer Architecture, ISCA '09, pages 24--33, New York, NY, USA, 2009. ACM.
[26] N. Rafique, W.-T. Lim, and M. Thottethodi. Architectural support for operating system-driven CMP cache management. In PACT '06: 15th International Conference on Parallel Architectures and Compilation Techniques, 2006.
[27] L. E. Ramos, E. Gorbatov, and R. Bianchini. Page placement in hybrid memory systems. In International Conference on Supercomputing, ICS '11, pages 85--95, New York, NY, USA, 2011. ACM.
[28] A. Rao. SeaMicro technology overview, 2010.
[29] A. Romanow and S. Bailey. An overview of RDMA over IP. In 1st Intl. Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2003.
[30] A. Samih, A. Krishna, and Y. Solihin. Understanding the limits of capacity sharing in CMP Private Caches. In CMP-MSI, 2009.
[31] A. Samih, A. Krishna, and Y. Solihin. Evaluating Placement Policies for Managing Capacity Sharing in CMP Architectures with Private Caches. ACM Trans. on Architecture and Code Optimization (TACO), 8(3), 2011.
[32] M. Schlansker, N. Chitlur, E. Oertli, P. M. Stillwell, Jr, L. Rankin, D. Bradford, R. J. Carter, J. Mudigonda, N. Binkert, and N. P. Jouppi. High-performance ethernet-based communications for future multi-core processors. In 2007 ACM/IEEE Conference on Supercomputing, SC '07, pages 37:1--37:12, New York, NY, USA, 2007. ACM.
[33] Standard Performance Evaluation Corporation. http://www.specbench.org, 2006.
[34] D. K. Tam, R. Azimi, L. B. Soares, and M. Stumm. RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations. SIGPLAN Not., 44(3), 2009.
[35] A. S. Tanenbaum and R. Van Renesse. Distributed operating systems. ACM Comput. Surv., 17:419--470, 1985.
[36] Transaction Processing Performance Council. TPC-H 2.14.2. http://www.tpc.org/tpch/, 2011.
[37] VMware. Experience game-changing virtual machine mobility. http://www.vmware.com/products/vmotion/overview.html, 2011.
[38] N. Wang, X. Liu, J. He, J. Han, L. Zhang, and Z. Xu. Collaborative memory pool in cluster system. In ICPP 2007: Intl. Conf. on Parallel Processing, page 17, 2007.
[39] M. Zhang and K. Asanovic. Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. In ISCA '05: 32nd Annual International Symposium on Computer Architecture, 2005.

Published in: ASBD '11: Proceedings of the 1st Workshop on Architectures and Systems for Big Data, October 2011, 40 pages. ISBN: 9781450314398. DOI: 10.1145/2377978.

Copyright © 2011 ACM
Publisher: Association for Computing Machinery, New York, NY, United States