research-article

A model of computation for MapReduce

Authors:
Howard Karloff, AT&T Labs---Research
Siddharth Suri, Yahoo! Research
Sergei Vassilvitskii, Yahoo! Research

SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms, January 2010, Pages 938–948

Published: 17 January 2010

ABSTRACT

In recent years the MapReduce framework has emerged as one of the most widely used parallel computing platforms for processing data on terabyte and petabyte scales. Used daily at companies such as Yahoo!, Google, Amazon, and Facebook, and adopted more recently by several universities, it allows for easy parallelization of data-intensive computations over many machines. One key feature of MapReduce that differentiates it from previous models of parallel computation is that it interleaves sequential and parallel computation. We propose a model of efficient computation using the MapReduce paradigm. Since MapReduce is designed for computations over massive data sets, our model limits the number of machines and the memory per machine to be substantially sublinear in the size of the input. On the other hand, we place very loose restrictions on the computational power of any individual machine: our model allows each machine to perform sequential computations in time polynomial in the size of the original input.
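As an illustration of what "interleaving sequential and parallel computation" means, the sketch below simulates a single MapReduce round on one machine: a map step applied independently to each key-value pair, a shuffle that groups values by key, and a reduce step that runs sequentially on each group. The function names (`mapreduce_round`, `count_mapper`, `count_reducer`) and the word-count task are illustrative assumptions and do not come from the paper; the sketch also does not enforce the sublinear limits on machines and memory that the model imposes.

```python
from collections import defaultdict


def mapreduce_round(pairs, mapper, reducer):
    """Simulate one round: map every pair, shuffle by key, reduce each group."""
    groups = defaultdict(list)
    # Map phase: each input pair is processed independently (parallel in a real system).
    for key, value in pairs:
        for out_key, out_value in mapper(key, value):
            # Shuffle: values that share a key are routed to the same reducer.
            groups[out_key].append(out_value)
    output = []
    # Reduce phase: each reducer runs an ordinary sequential computation on its group.
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def count_mapper(_, line):
    # Hypothetical mapper: emit (word, 1) for every word in a line of text.
    for word in line.split():
        yield word, 1


def count_reducer(word, counts):
    # Hypothetical reducer: add up the counts collected for one word.
    yield word, sum(counts)


docs = [(0, "map reduce map"), (1, "reduce")]
result = dict(mapreduce_round(docs, count_mapper, count_reducer))
```

A multi-round algorithm would feed the output pairs of one call to `mapreduce_round` into the next call.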

We compare MapReduce to the PRAM model of computation. We prove a simulation lemma showing that a large class of PRAM algorithms can be efficiently simulated via MapReduce. The strength of MapReduce, however, lies in the fact that it uses both sequential and parallel computation. We demonstrate how algorithms can take advantage of this fact to compute an MST of a dense graph in only two rounds, as opposed to the Ω(log(n)) rounds needed in the standard PRAM model. We show how to evaluate a wide class of functions using the MapReduce framework. We conclude by applying this result to show how to solve some basic algorithmic problems, such as undirected s-t connectivity, in the MapReduce framework.
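The abstract does not spell out the two-round MST procedure, so the following is only a sketch of one way such a computation can be organized, under stated assumptions: vertices are split at random into `k` parts; in round one, a reducer for each pair of parts computes a minimum spanning forest of the subgraph induced by those two parts; in round two, a single reducer computes the MST of the surviving edges. It is correct for distinct edge weights because an edge discarded in round one is the heaviest edge on some cycle and therefore cannot belong to the MST. The names `kruskal` and `two_round_mst`, the `(weight, u, v)` edge format, and the use of Kruskal's algorithm as the sequential subroutine are assumptions of this sketch, and the code runs on one machine without modelling memory limits.

```python
import random
from itertools import combinations


def kruskal(num_vertices, edges):
    """Sequential minimum spanning forest via Kruskal's algorithm with union-find."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def two_round_mst(num_vertices, edges, k, seed=0):
    """Two-round sketch: local forests on pairs of vertex parts, then a final MST."""
    if k < 2:
        raise ValueError("k must be at least 2 so that every edge lies in some pair")
    rng = random.Random(seed)
    part = [rng.randrange(k) for _ in range(num_vertices)]

    # Round 1: one (simulated) reducer per pair of parts keeps only its local forest.
    survivors = set()
    for i, j in combinations(range(k), 2):
        local = [(w, u, v) for (w, u, v) in edges
                 if part[u] in (i, j) and part[v] in (i, j)]
        survivors.update(kruskal(num_vertices, local))

    # Round 2: a single reducer computes the MST of all surviving edges.
    return kruskal(num_vertices, survivors)
```

The intuition for dense graphs is that each round-one subgraph contains far fewer edges than the full graph, and each local forest keeps fewer edges than it has vertices, so the edge set handed to the final reducer is much smaller than the input.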

A model of computation for MapReduce
1. Computing methodologies
  1. Symbolic and algebraic manipulation
2. Theory of computation

Recommendations

MapReduce: Review and open challenges

The continuous increase in computational capacity over the past years has produced an overwhelming flow of data or big data, which exceeds the capabilities of conventional processing tools. Big data signify a new era in data exploration and utilization. ...

Challenges for MapReduce in Big Data
SERVICES '14: Proceedings of the 2014 IEEE World Congress on Services

In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce ...

Efficient distributed parallel top-down computation of ROLAP data cube using mapreduce
DaWaK'12: Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery

The computation of multidimensional OLAP (On-Line Analytical Processing) data cube takes much time, because a data cube with D dimensions consists of 2^D cuboids. To build ROLAP (Relational OLAP) data cubes efficiently, existing algorithms (e.g., GBLP, ...

Published in
SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms
January 2010, 1690 pages
ISBN: 9780898716986
Program Chair: Moses Charikar, Princeton University
Publisher: Society for Industrial and Applied Mathematics, United States
Qualifiers: research-article

Acceptance Rates
SODA '10 Paper Acceptance Rate: 135 of 445 submissions, 30%
Overall Acceptance Rate: 411 of 1,322 submissions, 31%
Article Metrics
- Total Citations: 158
- Total Downloads: 1,644
- Downloads (last 12 months): 24
- Downloads (last 6 weeks): 3