research-article

PrIter: a distributed framework for prioritized iterative computations

Authors:
Yanfeng Zhang

Northeastern University, China and University of Massachusetts Amherst

Northeastern University, China and University of Massachusetts Amherst
View Profile

,
Qixin Gao

Northeastern University, China

Northeastern University, China
View Profile

,
Lixin Gao

University of Massachusetts Amherst

University of Massachusetts Amherst
View Profile

,
Cuirong Wang

Northeastern University, China

Northeastern University, China
View Profile

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud ComputingOctober 2011Article No.: 13Pages 1–14https://doi.org/10.1145/2038916.2038929

Published:26 October 2011Publication History

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

Pages 1–14

ABSTRACT

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, recommendation systems, and so on. These cloud applications typically involve data sets of massive scale. Fast convergence of the iterative computation on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by providing the support of prioritized iteration. Instead of performing computations on all data records without discrimination, PrIter prioritizes the computations that help convergence the most, so that the convergence speed of iterative process is significantly improved. We evaluate PrIter on a local cluster of machines as well as on Amazon EC2 Cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.

References

Amazon ec2. http://aws.amazon.com/ec2/.Google Scholar
Hadoop. http://hadoop.apache.org/.Google Scholar
Priter project. http://code.google.com/p/priter/.Google Scholar
Stanford dataset. http://snap.stanford.edu/data/.Google Scholar
S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for youtube: taking random walks through the view graph. In WWW '08, pages 895--904, 2008. Google ScholarDigital Library
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW '98, pages 107--117, 1998. Google ScholarDigital Library
Y. Bu, B. Howe, M. Balazinska, and D. M. Ernst. Haloop: Efficient iterative data processing on large clusters. In VLDB '10, 2010. Google ScholarDigital Library
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation -- Volume 7, OSDI '06, pages 15--15, Berkeley, CA, USA, 2006. USENIX Association. Google ScholarDigital Library
Chu, Cheng T., Kim, Sang K., Lin, Yi A., Yu, Yuanyuan, Bradski, Gary R., Ng, Andrew Y., and Olukotun, Kunle. Map-Reduce for Machine Learning on Multicore. In NIPS, pages 281--288, 2006.Google Scholar
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI'04, pages 10--10, 2004. Google ScholarDigital Library
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In MapReduce '10, pages 810--818, 2010. Google ScholarDigital Library
B. He, M. Yang, Z, Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: batched stream processing for data intensive distributed computing. In SoCC '10, pages 63--74, 2010. Google ScholarDigital Library
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07, pages 59--72. Google ScholarDigital Library
U. Kang, C. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In ICDM '09, pages 229--238, 2009. Google ScholarDigital Library
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 1953.Google ScholarCross Ref
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol., 58(7):1019--1031, 2007. Google ScholarDigital Library
D. Logothetis, C. Olston, B. Reed, K. C. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. In SoCC '10, pages 51--62, 2010. Google ScholarDigital Library
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. CoRR, abs/1006.4990, 2010.Google Scholar
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD '10, pages 135--146, 2010. Google ScholarDigital Library
D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. Ciel: A universal execution engine for distributed data-flow computing. In NSDI'11, 2011. Google ScholarDigital Library
C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08, pages 1099--1110, 2008. Google ScholarDigital Library
A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD '09, pages 165--178, 2009. Google ScholarDigital Library
D. Peng and F. Dabe. Large-scale incremental processing using distributed transactions and notifications. In OSDI '10: Proceedings of the 9th conference on Symposium on Opearting Systems Design and Implementation, pages 1--15, 2010. Google ScholarDigital Library
R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI'10, 2010. Google ScholarDigital Library
N. Slonim, N. Friedman, and N. Tishby. Unsupervised document classification using sequential information maximization. In SIGIR '02, pages 129--136, 2002. Google ScholarDigital Library
H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In IMC '09, pages 322--335, 2009. Google ScholarDigital Library
A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a map-reduce framework. In VLDB '09, pages 1626--1629, 2009. Google ScholarDigital Library
C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In EuroSys '09, pages 205--218, 2009. Google ScholarDigital Library
Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. Dryadlinq: a system for general-purpose distributed data-parallel computing using a high-level language. In OSDI '08, pages 1--14, 2008. Google ScholarDigital Library
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In HotCloud'10, pages 10--10, 2010. Google ScholarDigital Library
M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica. Improving mapreduce performance in heterogeneous environments. In OSDI '08, pages 29--42, 2008. Google ScholarDigital Library
Y. Zhang, Q. Gao, L. Gao, and C. Wang. imapreduce: A distributed computing framework for iterative computation. In DataCloud '11, 2011. Google ScholarDigital Library
T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences, 107(10):4511--4515, March 2010.Google ScholarCross Ref

Index Terms

PrIter: a distributed framework for prioritized iterative computations
1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability
      1. Distributed retrieval
      2. Peer-to-peer retrieval
  2. Information storage systems
    1. Storage architectures
      1. Distributed storage

Recommendations

PrIter: A Distributed Framework for Prioritizing Iterative Computations

Iterative computations are pervasive among data analysis applications, including web search, online social network analysis, recommendation systems, and so on. These applications typically involve data sets of massive scale. Fast convergence of the ...
Read More
Set-valued mixed quasi-variational inequalities and implicit resolvent equations

In this paper, we introduce and study a new class of variational inequalities, which is called the set-valued mixed quasi-variational inequality. The resolvent operator technique is used to establish the equivalence among generalized set-valued mixed ...
Read More
Primal-dual row-action method for convex programming

We present a primal-dual row-action method for the minimization of a convex function subject to general convex constraints. Constraints are used one at a time, no changes are made in the constraint functions and their Jacobian matrix (thus, the row-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN:9781450309769
DOI:10.1145/2038916
Program Chairs:
Jeffrey S. Chase
Duke University
,
Amr El Abbadi
Univ of California, Santa Barbara
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 October 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
MapReduce
PrIter
distributed framework
iterative algorithms
prioritized iteration
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate169of722submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 81
  Total Citations
  View Citations
- 878
  Total Downloads
- Downloads (Last 12 months)21
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

PrIter: a distributed framework for prioritized iterative computations

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

PrIter: A Distributed Framework for Prioritizing Iterative Computations

Set-valued mixed quasi-variational inequalities and implicit resolvent equations

Primal-dual row-action method for convex programming

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

PrIter: a distributed framework for prioritized iterative computations

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing

ABSTRACT

References

Cited By

Index Terms

Recommendations

PrIter: A Distributed Framework for Prioritizing Iterative Computations

Set-valued mixed quasi-variational inequalities and implicit resolvent equations

Primal-dual row-action method for convex programming

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media