DOI: 10.1145/2213836.2213947
Research article

Walnut: a unified cloud object store

Published: 20 May 2012

ABSTRACT

Walnut is an object store being developed at Yahoo! with the goal of serving as a common low-level storage layer for a variety of cloud data management systems, including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). Thus, a key performance challenge is to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high availability, across all of Yahoo!'s data clouds. This would enable sharing of hardware resources across hitherto siloed clouds of different types, offering greater potential for intelligent load balancing and efficient elastic operation, and would simplify the operational tasks related to data storage.

In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and one-hop direct client access to the data in most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces, showing that Walnut works well over a wide range of workloads, and can indeed serve as a common low-level storage layer across a range of cloud systems.
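
To make the phrase "quorum-based replication" concrete, the following is a minimal sketch of the generic quorum-overlap idea, not Walnut's actual protocol, which this abstract does not specify. All names (`QuorumConfig`, `n`, `w`, `r`, `write_succeeds`, `read_latest`) are hypothetical and chosen only for illustration. The sketch assumes versioned replies and the textbook condition that a read quorum and a write quorum must overlap.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class QuorumConfig:
    """Hypothetical quorum parameters; not taken from the Walnut paper."""
    n: int  # number of replicas holding each object
    w: int  # acknowledgements required before a write is considered done
    r: int  # replies required before a read is answered

    def is_valid(self) -> bool:
        # Quorum sizes must be between 1 and the replica count.
        return 1 <= self.w <= self.n and 1 <= self.r <= self.n

    def reads_see_latest_write(self) -> bool:
        # If r + w > n, every read quorum shares at least one replica
        # with every write quorum.
        return self.is_valid() and self.r + self.w > self.n


def write_succeeds(acks: int, cfg: QuorumConfig) -> bool:
    """A write completes once at least w replicas have acknowledged it."""
    return acks >= cfg.w


def read_latest(replies: List[Tuple[int, str]], cfg: QuorumConfig) -> Optional[str]:
    """Return the value with the highest version among the replies,
    or None if fewer than r replicas answered."""
    if len(replies) < cfg.r:
        return None
    return max(replies, key=lambda reply: reply[0])[1]
```

With three replicas and `w = r = 2`, any two replicas that answer a read include at least one that accepted the most recent completed write, so picking the highest version returns that write.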


Published in

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN: 9781450312479
DOI: 10.1145/2213836

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGMOD '12 paper acceptance rate: 48 of 289 submissions (17%). Overall acceptance rate: 785 of 4,003 submissions (20%).
