skip to main content
10.1145/2038916.2038929acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

PrIter: a distributed framework for prioritized iterative computations

Published:26 October 2011Publication History

ABSTRACT

Iterative computations are pervasive among data analysis applications in the cloud, including Web search, online social network analysis, recommendation systems, and so on. These cloud applications typically involve data sets of massive scale. Fast convergence of the iterative computation on the massive data set is essential for these applications. In this paper, we explore the opportunity for accelerating iterative computations and propose a distributed computing framework, PrIter, which enables fast iterative computation by providing the support of prioritized iteration. Instead of performing computations on all data records without discrimination, PrIter prioritizes the computations that help convergence the most, so that the convergence speed of iterative process is significantly improved. We evaluate PrIter on a local cluster of machines as well as on Amazon EC2 Cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms.

References

  1. Amazon ec2. http://aws.amazon.com/ec2/.Google ScholarGoogle Scholar
  2. Hadoop. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  3. Priter project. http://code.google.com/p/priter/.Google ScholarGoogle Scholar
  4. Stanford dataset. http://snap.stanford.edu/data/.Google ScholarGoogle Scholar
  5. S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for youtube: taking random walks through the view graph. In WWW '08, pages 895--904, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW '98, pages 107--117, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Y. Bu, B. Howe, M. Balazinska, and D. M. Ernst. Haloop: Efficient iterative data processing on large clusters. In VLDB '10, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation -- Volume 7, OSDI '06, pages 15--15, Berkeley, CA, USA, 2006. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chu, Cheng T., Kim, Sang K., Lin, Yi A., Yu, Yuanyuan, Bradski, Gary R., Ng, Andrew Y., and Olukotun, Kunle. Map-Reduce for Machine Learning on Multicore. In NIPS, pages 281--288, 2006.Google ScholarGoogle Scholar
  10. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI'04, pages 10--10, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In MapReduce '10, pages 810--818, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. He, M. Yang, Z, Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: batched stream processing for data intensive distributed computing. In SoCC '10, pages 63--74, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07, pages 59--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. U. Kang, C. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In ICDM '09, pages 229--238, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Katz. A new status index derived from sociometric analysis. Psychometrika, 1953.Google ScholarGoogle ScholarCross RefCross Ref
  16. D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol., 58(7):1019--1031, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Logothetis, C. Olston, B. Reed, K. C. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. In SoCC '10, pages 51--62, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. CoRR, abs/1006.4990, 2010.Google ScholarGoogle Scholar
  19. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD '10, pages 135--146, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. Ciel: A universal execution engine for distributed data-flow computing. In NSDI'11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD '08, pages 1099--1110, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In SIGMOD '09, pages 165--178, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Peng and F. Dabe. Large-scale incremental processing using distributed transactions and notifications. In OSDI '10: Proceedings of the 9th conference on Symposium on Opearting Systems Design and Implementation, pages 1--15, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI'10, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. N. Slonim, N. Friedman, and N. Tishby. Unsupervised document classification using sequential information maximization. In SIGIR '02, pages 129--136, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In IMC '09, pages 322--335, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a map-reduce framework. In VLDB '09, pages 1626--1629, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In EuroSys '09, pages 205--218, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. Dryadlinq: a system for general-purpose distributed data-parallel computing using a high-level language. In OSDI '08, pages 1--14, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In HotCloud'10, pages 10--10, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica. Improving mapreduce performance in heterogeneous environments. In OSDI '08, pages 29--42, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Y. Zhang, Q. Gao, L. Gao, and C. Wang. imapreduce: A distributed computing framework for iterative computation. In DataCloud '11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J. R. Wakeling, and Y.-C. Zhang. Solving the apparent diversity-accuracy dilemma of recommender systems. Proceedings of the National Academy of Sciences, 107(10):4511--4515, March 2010.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. PrIter: a distributed framework for prioritized iterative computations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
          October 2011
          377 pages
          ISBN:9781450309769
          DOI:10.1145/2038916

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 October 2011

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate169of722submissions,23%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader