ABSTRACT
Walnut is an object store being developed at Yahoo! to serve as a common low-level storage layer for a variety of cloud data management systems, including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). A key performance challenge is therefore to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high availability, across all of Yahoo!'s data clouds. This would enable sharing of hardware resources across hitherto siloed clouds of different types, offer greater potential for intelligent load balancing and efficient elastic operation, and simplify the operational tasks related to data storage.
In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and gives clients one-hop direct access to the data for most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces, showing that Walnut works well over a wide range of workloads and can indeed serve as a common low-level storage layer across a range of cloud systems.
Index Terms
- Walnut: a unified cloud object store