research-article

MapReduce and parallel DBMSs: friends or foes?

Authors:
Michael Stonebraker, Massachusetts Institute of Technology, Cambridge, MA
Daniel Abadi, Yale University, New Haven, CT
David J. DeWitt, Microsoft Inc., Madison, WI
Sam Madden, Massachusetts Institute of Technology, Cambridge, MA
Erik Paulson, University of Wisconsin-Madison, Madison, WI
Andrew Pavlo, Brown University, Providence, RI
Alexander Rasin, Brown University, Providence, RI

Communications of the ACM, Volume 53, Issue 1, January 2010, pp. 64–71. https://doi.org/10.1145/1629175.1629197

Published: 01 January 2010

Abstract

MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty.

Published in

Communications of the ACM Volume 53, Issue 1
Amir Pnueli: Ahead of His Time
January 2010
142 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/1629175

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Qualifiers
- research-article
- Popular
- Refereed
Article Metrics
- Total Citations: 325
- Total Downloads: 59,970
- Downloads (Last 12 months): 447
- Downloads (Last 6 weeks): 290