Abstract
RAMCloud is a storage system that provides low-latency access to large-scale datasets. To achieve low latency, RAMCloud stores all data in DRAM at all times. To support large capacities (1 PB or more), it aggregates the memories of thousands of servers into a single coherent key-value store. RAMCloud ensures the durability of DRAM-based data by keeping backup copies on secondary storage. It uses a uniform log-structured mechanism to manage both DRAM and secondary storage, which results in high performance and efficient memory usage. RAMCloud uses a polling-based approach to communication, bypassing the kernel to communicate directly with NICs. With this approach, client applications can read small objects from any RAMCloud storage server in less than 5 μs, and durable writes of small objects take about 13.5 μs. RAMCloud does not keep multiple copies of data online; instead, it provides high availability by recovering from crashes very quickly (1 to 2 seconds). RAMCloud's crash recovery mechanism harnesses the resources of the entire cluster working concurrently, so recovery performance scales with cluster size.
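The log-structured idea mentioned above can be illustrated with a toy, single-process sketch. The sketch rests on several assumptions:

- A hash-table index maps each key to the log position of its current version.
- Segments have a fixed capacity counted in entries rather than bytes.
- Deletions are recorded as tombstone entries.

The names `LogStore`, `Segment`, `Entry`, and `clean` are hypothetical. Nothing here models backups, the network path, or crash recovery. For brevity the cleaner compacts the whole log at once, whereas a real cleaner would more likely select individual segments to reclaim.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    key: str
    value: Optional[bytes]  # None marks a tombstone (a deletion record)


@dataclass
class Segment:
    capacity: int  # hypothetical: capacity counted in entries, not bytes
    entries: List[Entry] = field(default_factory=list)

    def full(self) -> bool:
        return len(self.entries) >= self.capacity


class LogStore:
    """Toy log-structured in-memory store (hypothetical; not RAMCloud's code)."""

    def __init__(self, segment_capacity: int = 4) -> None:
        self.segment_capacity = segment_capacity
        self.segments: List[Segment] = [Segment(segment_capacity)]
        # Assumed hash-table index: key -> (segment number, slot) of the live version.
        self.index: Dict[str, Tuple[int, int]] = {}

    def _append(self, entry: Entry) -> Tuple[int, int]:
        # All mutations go to the head segment; open a new one when it fills.
        head = self.segments[-1]
        if head.full():
            head = Segment(self.segment_capacity)
            self.segments.append(head)
        head.entries.append(entry)
        return len(self.segments) - 1, len(head.entries) - 1

    def write(self, key: str, value: bytes) -> None:
        # Never update in place: append a new version; the old one becomes garbage.
        self.index[key] = self._append(Entry(key, value))

    def delete(self, key: str) -> None:
        if key in self.index:
            # The tombstone records the deletion in the log itself.
            self._append(Entry(key, None))
            del self.index[key]

    def read(self, key: str) -> Optional[bytes]:
        loc = self.index.get(key)
        if loc is None:
            return None
        seg_no, slot = loc
        return self.segments[seg_no].entries[slot].value

    def clean(self) -> int:
        """Copy live entries into a fresh log; return the number of entries reclaimed."""
        old = self.segments
        before = sum(len(s.entries) for s in old)
        self.segments = [Segment(self.segment_capacity)]
        new_index: Dict[str, Tuple[int, int]] = {}
        for seg_no, seg in enumerate(old):
            for slot, entry in enumerate(seg.entries):
                # Live only if the index still points at this exact slot. Tombstones
                # are dropped because the whole log is rebuilt, so the versions
                # they cancel are discarded too.
                if entry.value is not None and self.index.get(entry.key) == (seg_no, slot):
                    new_index[entry.key] = self._append(entry)
        self.index = new_index
        return before - sum(len(s.entries) for s in self.segments)


if __name__ == "__main__":
    store = LogStore(segment_capacity=4)
    store.write("a", b"1")
    store.write("b", b"2")
    store.write("a", b"3")
    store.delete("b")
    print(store.read("a"), store.clean())  # b'3' 3
```

In the usage example, overwriting `a` and deleting `b` leaves a stale version of `a`, the old `b`, and a tombstone in the log. `clean()` copies only the live version of `a` forward and reports three reclaimed entries. Appending every change keeps writes sequential, which is the property that lets the same log layout serve both memory and secondary storage.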