DOI: 10.1145/2377978.2377979
research-article

A collaborative memory system for high-performance and cost-effective clustered architectures

Published: 10 October 2011

ABSTRACT

With the rapid development of highly integrated distributed systems (cluster systems), especially those encapsulated within a single platform [28, 9], designers face memory hierarchy design choices aimed at avoiding swapping to disk, which slows application execution drastically. Leveraging free remote memory through Memory Collaboration has proven cost-effective compared to overprovisioning for peak load requirements. Recent studies propose several ways of accessing under-utilized remote memory in static system configurations, but do not explore dynamic memory collaboration in detail. Dynamic collaboration matters because memory usage fluctuates at run time in clustered systems.

In this paper, we propose an Autonomous Collaborative Memory System (ACMS) that manages memory resources dynamically at run time to optimize performance and provide QoS measures for nodes participating in the system. We implement a prototype of ACMS, experiment with a wide range of real-world applications, and show up to 3x speedup compared to a non-collaborative memory system, with no perceivable performance impact on the nodes that provide memory. Based on our experiments, we analyze the remote memory access overhead in detail and provide insights for future optimizations.

References

  1. A. Agarwal. Facebook: Science and the social graph. http://www.infoq.com/presentations/Facebook-Software-Stack, 2009. Presented at QCon San Francisco.
  2. Apache. Hadoop. http://hadoop.apache.org/, 2011.
  3. M. Awasthi, K. Sudan, R. Balasubramonian, and J. Carter. Dynamic Hardware-Assisted Software-Controlled Page Placement to Manage Capacity Allocation and Sharing within Large Caches. In HPCA '09: 2009 IEEE 15th Intl. Symp. on High Performance Computer Architecture, 2009.
  4. A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schuepbach, and A. Singhania. The multikernel: a new OS architecture for scalable multicore systems. In SOSP '09: 22nd ACM symposium on Operating systems principles, New York, NY, USA, 2009. ACM Press.
  5. B. M. Beckmann, M. R. Marty, and D. A. Wood. ASR: Adaptive Selective Replication for CMP Caches. In MICRO 39: 39th IEEE/ACM Intl. Symp. on Microarchitecture, 2006.
  6. J. Chang and G. S. Sohi. Cooperative Caching for Chip Multiprocessors. In Computer Architecture, 2006. ISCA '06. 33rd Intl. Symp. on, 2006.
  7. H. Chen, Y. Luo, X. Wang, B. Zhang, Y. Sun, and Z. Wang. A transparent remote paging model for virtual machines, 2008.
  8. Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Optimizing Replication, Communication and Capacity Allocation in CMPs. In 32nd ISCA, June 2005.
  9. Intel Corp. Chip shot: Intel outlines low-power micro server strategy, 2011.
  10. G. Dhiman, R. Ayoub, and T. Rosing. PDRAM: a hybrid PRAM and DRAM main memory system. In 46th Design Automation Conf., DAC '09, pages 664--669, New York, NY, USA, 2009. ACM.
  11. Fedora Project. Intel Core i7-800 Processor Series. http://fedoraproject.org/, 2010.
  12. Intel Corp. Thunderbolt Technology. http://www.intel.com/technology/io/thunderbolt/index.htm, 2011.
  13. Intel Microarchitecture. Intel Core i7-800 Processor Series. http://download.intel.com/products/processor/corei7/319724.pdf, 2010.
  14. S. Liang, R. Noronha, and D. Panda. Swapping to remote memory over InfiniBand: An approach using a high performance network block device. In Cluster Computing, 2005. IEEE Intl., pages 1--10, 2005.
  15. K. Lim, J. Chang, T. Mudge, P. Ranganathan, S. K. Reinhardt, and T. F. Wenisch. Disaggregated memory for expansion and sharing in blade servers. In 36th annual international symposium on Computer architecture, ISCA '09, pages 267--278, New York, NY, USA, 2009. ACM.
  16. E. P. Markatos and G. Dramitinos. Implementation of a reliable remote memory pager. In USENIX Technical Conf., pages 177--190, 1996.
  17. E. P. Markatos and G. Dramitinos. Adding flexibility to a remote memory pager, 1996.
  18. M. L. Massie, B. N. Chun, and D. E. Culler. The ganglia distributed monitoring system: Design, implementation and experience, 2004.
  19. C. R. R. Maule. iwarp ethernet: key to driving ethernet into high performance environments. In 2006 ACM/IEEE conference on Supercomputing, SC '06, New York, NY, USA, 2006. ACM.
  20. H. Midorikawa, M. Kurokawa, R. Himeno, and M. Sato. DLM: A distributed large memory system using remote memory swapping over cluster nodes. In Cluster Computing, 2008 IEEE Intl. Conf. on, pages 268--273, 2008.
  21. Network Block Device TCP version. NBD. http://nbd.sourceforge.net/, 2011.
  22. T. Newhall, S. Finney, K. Ganchev, and M. Spiegel. Nswap: A network swapping module for linux clusters, 2003.
  23. J. K. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, M. Rosenblum, S. M. Rumble, E. Stratmann, and R. Stutsman. The case for ramclouds: Scalable high-performance storage entirely in DRAM. In SIGOPS OSR, 2009.
  24. M. Qureshi. Adaptive Spill-Receive for Robust High-Performance Caching in CMPs. In High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th Intl. Symp. on, 2009.
  25. M. K. Qureshi, V. Srinivasan, and J. A. Rivers. Scalable high performance main memory system using phase-change memory technology. In 36th annual international symposium on Computer architecture, ISCA '09, pages 24--33, New York, NY, USA, 2009. ACM.
  26. N. Rafique, W.-T. Lim, and M. Thottethodi. Architectural support for operating system-driven CMP cache management. In PACT '06: 15th international conference on Parallel architectures and compilation techniques, 2006.
  27. L. E. Ramos, E. Gorbatov, and R. Bianchini. Page placement in hybrid memory systems. In international conference on Supercomputing, ICS '11, pages 85--95, New York, NY, USA, 2011. ACM.
  28. A. Rao. Seamicro technology overview, 2010.
  29. A. Romanow and S. Bailey. An overview of RDMA over IP. In 1st Intl. Workshop on Protocols for Fast Long-Distance Networks (PFLDnet), 2003.
  30. A. Samih, A. Krishna, and Y. Solihin. Understanding the limits of capacity sharing in CMP Private Caches. In CMP-MSI, 2009.
  31. A. Samih, A. Krishna, and Y. Solihin. Evaluating Placement Policies for Managing Capacity Sharing in CMP Architectures with Private Caches. ACM Trans. on Architecture and Code Optimization (TACO), 8(3), 2011.
  32. M. Schlansker, N. Chitlur, E. Oertli, P. M. Stillwell, Jr, L. Rankin, D. Bradford, R. J. Carter, J. Mudigonda, N. Binkert, and N. P. Jouppi. High-performance ethernet-based communications for future multi-core processors. In 2007 ACM/IEEE conference on Supercomputing, SC '07, pages 37:1--37:12, New York, NY, USA, 2007. ACM.
  33. Standard Performance Evaluation Corporation. http://www.specbench.org, 2006.
  34. D. K. Tam, R. Azimi, L. B. Soares, and M. Stumm. RapidMRC: Approximating L2 Miss Rate Curves on Commodity Systems for Online Optimizations. SIGPLAN Not., 44(3), 2009.
  35. A. S. Tanenbaum and R. Van Renesse. Distributed operating systems. ACM Comput. Surv., 17:419--470, 1985.
  36. Transaction Processing Performance Council. TPC-H 2.14.2. http://www.tpc.org/tpch/, 2011.
  37. VMware. Experience game-changing virtual machine mobility. http://www.vmware.com/products/vmotion/overview.html, 2011.
  38. N. Wang, X. Liu, J. He, J. Han, L. Zhang, and Z. Xu. Collaborative memory pool in cluster system. In Parallel Processing, 2007. ICPP 2007. Intl. Conf. on, page 17, 2007.
  39. M. Zhang and K. Asanovic. Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. In ISCA '05: 32nd annual international symposium on Computer Architecture, 2005.
Published in: ASBD '11: Proceedings of the 1st Workshop on Architectures and Systems for Big Data, October 2011, 40 pages
ISBN: 9781450314398
DOI: 10.1145/2377978

Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
