Abstract
MapReduce's advantages over parallel databases include storage-system independence and fine-grained fault tolerance for large jobs.
Index Terms
- MapReduce: a flexible data processing tool