Abstract
In this new era of "big data", traditional DBMSs are under attack from two sides. At one end of the spectrum, the use of document store NoSQL systems (e.g. MongoDB) threatens to move modern Web 2.0 applications away from traditional RDBMSs. At the other end of the spectrum, big data DSS analytics that used to be the domain of parallel RDBMSs is now under attack by another class of NoSQL data analytics systems, such as Hive on Hadoop. So, are the traditional RDBMSs, aka "big elephants", doomed as they are challenged from both ends of this "big data" spectrum? In this paper, we compare one representative NoSQL system from each end of this spectrum with SQL Server, and analyze the performance and scalability aspects of each of these approaches (NoSQL vs. SQL) on two workloads (decision support analysis and interactive data-serving) that represent the two ends of the application spectrum. We present insights from this evaluation and speculate on potential trends for the future.