ABSTRACT
While the use of MapReduce systems (such as Hadoop) for large-scale data analysis has been widely recognized and studied, we have recently seen an explosion in the number of systems developed for cloud data serving. These newer systems address "cloud OLTP" applications, though they typically do not support ACID transactions. Examples of systems proposed for cloud serving include BigTable, PNUTS, Cassandra, HBase, Azure, CouchDB, SimpleDB, Voldemort, and many others. Further, they are being applied to a diverse range of applications that differ considerably from traditional (e.g., TPC-C-like) serving workloads. The number of emerging cloud serving systems and the wide range of proposed applications, coupled with a lack of apples-to-apples performance comparisons, make it difficult to understand the tradeoffs between systems and the workloads for which they are suited. We present the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework and tool is that it is extensible: it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.
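The extensibility described above can be illustrated with a minimal sketch, assuming a Python harness rather than YCSB's own code: a new system is added by implementing a small storage interface, and a new workload is simply a set of parameters (record count, operation count, operation mix). All names here (`KeyValueStore`, `InMemoryStore`, `Workload`, `run_workload`) are hypothetical and do not reproduce YCSB's actual API. Keys are drawn uniformly for simplicity, whereas YCSB's core workloads also offer skewed request distributions such as Zipfian.

```python
import random
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


class KeyValueStore(ABC):
    """Hypothetical pluggable interface: one subclass per system under test."""

    @abstractmethod
    def insert(self, key: str, fields: Dict[str, str]) -> None: ...

    @abstractmethod
    def read(self, key: str) -> Optional[Dict[str, str]]: ...

    @abstractmethod
    def update(self, key: str, fields: Dict[str, str]) -> None: ...


class InMemoryStore(KeyValueStore):
    """Trivial dict-backed binding, standing in for a real client library."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def insert(self, key: str, fields: Dict[str, str]) -> None:
        self._data[key] = dict(fields)

    def read(self, key: str) -> Optional[Dict[str, str]]:
        return self._data.get(key)

    def update(self, key: str, fields: Dict[str, str]) -> None:
        self._data.setdefault(key, {}).update(fields)


@dataclass
class Workload:
    """A workload is plain data: dataset size, operation count, and operation mix."""
    record_count: int
    operation_count: int
    read_proportion: float  # the remaining operations are updates


def run_workload(store: KeyValueStore, wl: Workload, seed: int = 0) -> Dict[str, List[float]]:
    """Load records, then run a read/update mix; return per-operation latencies in seconds."""
    rng = random.Random(seed)
    # Load phase: populate the store with record_count records.
    for i in range(wl.record_count):
        store.insert(f"user{i}", {"field0": "x" * 10})
    # Transaction phase: pick a key uniformly and an operation by proportion.
    latencies: Dict[str, List[float]] = {"read": [], "update": []}
    for _ in range(wl.operation_count):
        key = f"user{rng.randrange(wl.record_count)}"
        op = "read" if rng.random() < wl.read_proportion else "update"
        start = time.perf_counter()
        if op == "read":
            store.read(key)
        else:
            store.update(key, {"field0": "y" * 10})
        latencies[op].append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # A read-heavy mix (95% reads, 5% updates) against the in-memory binding.
    result = run_workload(InMemoryStore(), Workload(1000, 10000, 0.95))
    print({op: len(samples) for op, samples in result.items()})
```

Keeping the workload as plain data and the system binding behind an interface lets either side vary independently. A realistic harness would also need multiple client threads, throughput targeting, and latency percentiles, none of which this sketch attempts.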
- Amazon SimpleDB. http://aws.amazon.com/simpledb/.
- Apache Cassandra. http://incubator.apache.org/cassandra/.
- Apache CouchDB. http://couchdb.apache.org/.
- Apache HBase. http://hadoop.apache.org/hbase/.
- Dynomite Framework. http://wiki.github.com/cliffmoon/dynomite/dynomite-framework.
- Google App Engine. http://appengine.google.com.
- Hypertable. http://www.hypertable.org/.
- mongodb. http://www.mongodb.org/.
- Project Voldemort. http://project-voldemort.com/.
- Solaris FileBench. http://www.solarisinternals.com/wiki/index.php/FileBench.
- SQL Data Services/Azure Services Platform. http://www.microsoft.com/azure/data.mspx.
- Storage Performance Council. http://www.storageperformance.org/home.
- Yahoo! Query Language. http://developer.yahoo.com/yql/.
- A. Arasu et al. Linear Road: a stream data management benchmark. In VLDB, 2004.
- F. C. Botelho, D. Belazzougui, and M. Dietzfelbinger. Compress, hash and displace. In Proc. of the 17th European Symposium on Algorithms, 2009.
- F. Chang et al. Bigtable: A distributed storage system for structured data. In OSDI, 2006.
- B. F. Cooper et al. PNUTS: Yahoo!'s hosted data serving platform. In VLDB, 2008.
- G. DeCandia et al. Dynamo: Amazon's highly available key-value store. In SOSP, 2007.
- D. J. DeWitt. The Wisconsin Benchmark: Past, present and future. In J. Gray, editor, The Benchmark Handbook. Morgan Kaufmann, 1993.
- I. Eure. Looking to the future with Cassandra. http://blog.digg.com/?p=966.
- S. Gilbert and N. Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 33(2):51-59, 2002.
- J. Gray, editor. The Benchmark Handbook For Database and Transaction Processing Systems. Morgan Kaufmann, 1993.
- J. Gray et al. Quickly generating billion-record synthetic databases. In SIGMOD, 1994.
- A. Lakshman, P. Malik, and K. Ranganathan. Cassandra: A structured storage system on a P2P network. In SIGMOD, 2008.
- B. C. Ooi and S. Parthasarathy. Special issue on data management on cloud computing platforms. IEEE Data Engineering Bulletin, vol. 32, 2009.
- A. Pavlo et al. A comparison of approaches to large-scale data analysis. In SIGMOD, 2009.
- R. Rawson. HBase intro. In NoSQL Oakland, 2009.
- A. Schmidt et al. Xmark: A benchmark for XML data management. In VLDB, 2002.
- R. Sears, M. Callaghan, and E. Brewer. Rose: Compressed, log-structured replication. In VLDB, 2008.
- M. Seltzer, D. Krinsky, K. A. Smith, and X. Zhang. The case for application-specific benchmarking. In Proc. HotOS, 1999.
- P. Shivam et al. Cutting corners: Workbench automation for server benchmarking. In Proc. USENIX Annual Technical Conference, 2008.
- M. Stonebraker et al. C-store: a column-oriented DBMS. In VLDB, 2005.
- B. White et al. An integrated experimental environment for distributed systems and networks. In OSDI, 2002.
- K. Yocum et al. Scalability and accuracy in a large-scale network emulator. In OSDI, 2002.