The RAMCloud Storage System

Published: 31 August 2015

Abstract

RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1 PB or more), it aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-structured mechanism to manage both DRAM and secondary storage, which results in high performance and efficient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel to communicate directly with NICs; with this approach, client applications can read small objects from any RAMCloud storage server in less than 5 μs, and durable writes of small objects take about 13.5 μs. RAMCloud does not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very quickly (1 to 2 seconds). RAMCloud’s crash recovery mechanism harnesses the resources of the entire cluster working concurrently so that recovery performance scales with cluster size.
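
As a rough illustration of the log-structured idea named in the abstract, the sketch below keeps every object in an append-only log and finds the live copy through a hash index. It is a toy model under stated assumptions, not RAMCloud's implementation: the class name `LogStructuredStore` and its methods are hypothetical, and segments, backups on secondary storage, concurrency, and networking are all omitted.

```python
class LogStructuredStore:
    """Toy append-only log with a hash index (hypothetical, illustrative only)."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) entries
        self.index = {}  # key -> position of the live entry in self.log

    def write(self, key, value):
        # New versions are always appended; any older entry becomes dead space.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        # One hash lookup, then one access into the log.
        return self.log[self.index[key]][1]

    def delete(self, key):
        # Dropping the index entry makes the logged entry dead.
        del self.index[key]

    def clean(self):
        # Copy live entries into a fresh log and discard the dead ones.
        live = [(k, v) for pos, (k, v) in enumerate(self.log)
                if self.index.get(k) == pos]
        self.log = live
        self.index = {k: pos for pos, (k, _) in enumerate(live)}


store = LogStructuredStore()
store.write("a", 1)
store.write("b", 2)
store.write("a", 3)  # supersedes the first entry for "a"
store.clean()        # reclaims the space held by the dead entry
```

In this sketch, overwrites never modify entries in place, so reclaiming space is left to a separate cleaning pass that copies live entries forward.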

  • Published in

    ACM Transactions on Computer Systems, Volume 33, Issue 3
    September 2015
    140 pages
    ISSN: 0734-2071
    EISSN: 1557-7333
    DOI: 10.1145/2818727

    Copyright © 2015 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 31 August 2015
    • Revised: 1 July 2015
    • Accepted: 1 July 2015
    • Received: 1 October 2014

    Qualifiers

    • research-article
    • Research
    • Refereed
