Abstract
MapReduce complements DBMSs because databases are not designed for extract-transform-load (ETL) tasks, a MapReduce specialty.
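To make the ETL claim concrete, here is a minimal in-process sketch of a MapReduce-style cleaning and aggregation pass. It is not Hadoop's API or the authors' code. The log format (`user_id,page,bytes`), the sample lines, and the function names `map_phase`, `shuffle`, `reduce_phase` and `run_etl` are all hypothetical. The map step parses raw text and silently drops malformed records, which is the kind of messy extract-and-transform work the abstract has in mind. The shuffle groups values by key, and the reduce emits tidy rows that a DBMS could then bulk-load.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical raw input: "user_id,page,bytes" log lines, some malformed.
RAW_LINES = [
    "u1,/home,512",
    "u2,/cart,1024",
    "garbage line",
    "u1,/cart,256",
    "u3,/home,not-a-number",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Extract + transform: parse one raw record and emit (page, bytes), or nothing."""
    parts = line.split(",")
    if len(parts) != 3:
        return  # drop malformed rows instead of failing the whole job
    _user, page, size = parts
    try:
        yield page, int(size)
    except ValueError:
        return  # non-numeric size: skip the record


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(page: str, sizes: list[int]) -> tuple[str, int, int]:
    """Aggregate one key into a load-ready row: (page, hits, total_bytes)."""
    return page, len(sizes), sum(sizes)


def run_etl(lines: Iterable[str]) -> list[tuple[str, int, int]]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return sorted(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    for row in run_etl(RAW_LINES):
        print(row)
```

With the sample input, the two malformed lines are discarded and the job yields one row per page, `('/cart', 2, 1280)` and `('/home', 1, 512)`. In a real deployment the map and reduce calls would run in parallel across a cluster, and the final "load" step into the database is left out of this sketch.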
Index Terms
- MapReduce and parallel DBMSs: friends or foes?