research-article

Open Access

The RAMCloud Storage System

Authors:
John Ousterhout, Stanford University, Stanford, CA
Arjun Gopalan, Stanford University, Mountain View, CA
Ashish Gupta, Stanford University, Menlo Park, CA
Ankita Kejriwal, Stanford University, Stanford, CA
Collin Lee, Stanford University, Stanford, CA
Behnam Montazeri, Stanford University, Stanford, CA
Diego Ongaro, Stanford University, Stanford, CA
Seo Jin Park, Stanford University, Stanford, CA
Henry Qin, Stanford University, Stanford, CA
Mendel Rosenblum, Stanford University, Stanford, CA
Stephen Rumble, Stanford University, Stanford, CA
Ryan Stutsman, Stanford University, Stanford, CA
Stephen Yang, Stanford University, Stanford, CA

ACM Transactions on Computer Systems, Volume 33, Issue 3, Article No. 7, pp. 1–55. DOI: https://doi.org/10.1145/2806887

Published: 31 August 2015

Abstract

RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1 PB or more), it aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-structured mechanism to manage both DRAM and secondary storage, which results in high performance and efficient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel to communicate directly with NICs; with this approach, client applications can read small objects from any RAMCloud storage server in less than 5 μs, and durable writes of small objects take about 13.5 μs. RAMCloud does not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very quickly (1 to 2 seconds). RAMCloud’s crash recovery mechanism harnesses the resources of the entire cluster working concurrently, so recovery performance scales with cluster size.
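To make the log-structured idea in the abstract concrete, here is a minimal single-server sketch in Python. It is not RAMCloud's code. The class and method names (LogStore, write, delete, clean, replay) are hypothetical, segment capacity is counted in entries rather than bytes, and the tombstone-retention rule is an assumption chosen for this sketch. Writes append new versions to the head segment, a hash table maps each key to its live entry, deletes append tombstones, a cleaner copies live entries out of a closed segment before freeing it, and replay rebuilds the contents from the log alone. That last property is what lets a log-structured format also serve as a recovery format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (segment id, offset within that segment)
Location = Tuple[int, int]


@dataclass
class Entry:
    key: str
    version: int
    value: Optional[bytes]              # None marks a tombstone (a logged delete)
    dead_segment: Optional[int] = None  # tombstones only: segment of the deleted object


class LogStore:
    """Hypothetical sketch: an append-only log of segments plus a hash table index."""

    def __init__(self, segment_capacity: int = 4) -> None:
        self.capacity = segment_capacity        # assumption: entries per segment
        self.segments: Dict[int, List[Entry]] = {0: []}
        self.head = 0                           # segment currently being appended to
        self.next_segment_id = 1                # segment ids are never reused
        self.next_version = 1
        self.index: Dict[str, Location] = {}    # key -> location of its live object

    def _append(self, entry: Entry) -> Location:
        # Open a new head segment when the current one is full.
        if len(self.segments[self.head]) >= self.capacity:
            self.head = self.next_segment_id
            self.next_segment_id += 1
            self.segments[self.head] = []
        self.segments[self.head].append(entry)
        return self.head, len(self.segments[self.head]) - 1

    def write(self, key: str, value: bytes) -> None:
        # Never update in place: append a new version and repoint the index.
        self.index[key] = self._append(Entry(key, self.next_version, value))
        self.next_version += 1

    def read(self, key: str) -> Optional[bytes]:
        loc = self.index.get(key)
        if loc is None:
            return None
        seg_id, offset = loc
        return self.segments[seg_id][offset].value

    def delete(self, key: str) -> None:
        loc = self.index.pop(key, None)
        if loc is not None:
            # The tombstone keeps a replay of the log from resurrecting the key.
            self._append(Entry(key, self.next_version, None, dead_segment=loc[0]))
            self.next_version += 1

    def clean(self, seg_id: int) -> None:
        """Copy live entries out of a closed segment, then free the segment."""
        if seg_id == self.head:
            raise ValueError("the head segment cannot be cleaned")
        for offset, e in enumerate(self.segments.pop(seg_id)):
            if e.value is not None:
                live = self.index.get(e.key) == (seg_id, offset)
            else:
                # Keep a tombstone while the deleted object's segment still exists.
                live = e.dead_segment in self.segments
            if live:
                new_loc = self._append(e)  # the copy keeps its original version
                if e.value is not None:
                    self.index[e.key] = new_loc

    @staticmethod
    def replay(segments: Dict[int, List[Entry]]) -> Dict[str, bytes]:
        """Rebuild key -> value from the segments alone, newest version winning."""
        newest: Dict[str, Entry] = {}
        for entries in segments.values():
            for e in entries:
                if e.key not in newest or e.version > newest[e.key].version:
                    newest[e.key] = e
        return {k: e.value for k, e in newest.items() if e.value is not None}
```

Because old versions are never overwritten in place, space is reclaimed only by cleaning whole segments. The sketch leaves out concurrency, replication of segments to backups, and any policy for choosing which segment to clean next.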

Index Terms

The RAMCloud Storage System
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Distributed memory
        Main memory
        Secondary storage

Published in
ACM Transactions on Computer Systems Volume 33, Issue 3
September 2015
140 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/2818727
Editor:
Todd C. Mowry
Carnegie Mellon University, Pittsburgh, PA
Copyright © 2015 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 31 August 2015
- Revised: 1 July 2015
- Accepted: 1 July 2015
- Received: 1 October 2014
Author Tags
Datacenters
large-scale systems
low latency
storage systems
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 187
- Total Downloads: 9,826
- Downloads (Last 12 months): 821
- Downloads (Last 6 weeks): 104
