skip to main content
10.1145/2484838.2484843acmotherconferencesArticle/Chapter ViewAbstractPublication PagesssdbmConference Proceedingsconference-collections
research-article

GPS: a graph processing system

Published:29 July 2013Publication History

ABSTRACT

GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, and presenting techniques and experimental results for graph partitioning in distributed graph-processing systems like GPS. GPS is similar to Google's proprietary Pregel system, with three new features: (1) an extended API to make global computations more easily expressed and more efficient; (2) a dynamic repartitioning scheme that reassigns vertices to different workers during the computation, based on messaging patterns; and (3) an optimization that distributes adjacency lists of high-degree vertices across all compute nodes to improve performance. In addition to presenting the implementation of GPS and its novel features, we also present experimental results on the performance effects of both static and dynamic graph partitioning schemes, and we describe the compilation of a high-level domain-specific programming language to GPS, enabling easy expression of complex algorithms.

References

  1. Apache Incubator Giraph. http://incubator.apache.org/giraph/.Google ScholarGoogle Scholar
  2. Apache Hadoop. http://hadoop.apache.org/.Google ScholarGoogle Scholar
  3. Apache Mahout. http://mahout.apache.org/.Google ScholarGoogle Scholar
  4. Apache MINA. http://mina.apache.org/.Google ScholarGoogle Scholar
  5. P. Boldi, B. Codenotti, M. Santini, and S. Vigna. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software: Practice And Experience, 34(8):711--726, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In WWW, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. U. Brandes. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25(2):163--177, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  9. S. Brin and L. Page. The Anatomy of Large-Scale Hypertextual Web Search Engine. In WWW, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient Iterative Data Processing on Large Clusters. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Buluç and J. R. Gilbert. The Combinatorial BLAS: Design, Implementation, and Applications. International Journal of High Performance Computing Applications, 25(4):496--509, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Cevahir, C. Aykanat, A. Turk, and B. B. Cambazoglu. Site-based Partitioning and Repartitioning Techniques for Parallel PageRank Computation. IEEE Transactions on Parallel Distributed Systems, 22(5):786--802, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Chen, X. Weng, B. He, and M. Yang. Large Graph Processing in the Cloud. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: A Runtime for Iterative MapReduce. In HPDC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. GoldenOrb. http://www.raveldata.com/goldenorb/.Google ScholarGoogle Scholar
  17. D. Gregor and A. Lumsdaine. The Parallel BGL: A Generic Library for Distributed Graph Computations. In POOSC, 2005.Google ScholarGoogle Scholar
  18. Hadoop Distributed File System. http://hadoop.apache.org/hdfs/.Google ScholarGoogle Scholar
  19. S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for Easy and Efficient Graph Analysis. In ASPLOS, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Hong, S. Salihoglu, J. Widom, and K. Olukotun. Compiling Green-Marl into GPS, Technical Report, Stanford University, October, 2012. http://ppl.stanford.edu/papers/tr_gm_gps.pdf.Google ScholarGoogle Scholar
  21. J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. U. Kang, C. E. Tsourakakis, and C. Faloutsos. PEGASUS: A peta-scale graph mining system -- Implementation and Observations. In ICDM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. The Laboratory for Web Algorithmics. http://law.dsi.unimi.it/datasets.php.Google ScholarGoogle Scholar
  24. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Framework for Parallel Machine Learning. In UAI, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Lugowski, D. Alber, A. Buluç, J. R. Gilbert, S. Reinhardt, Y. Teng, and A. Waranis. A Flexible Open-source Toolbox for Scalable Complex Graph Analysis. In SIAM Conference on Data Mining, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  26. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In SIGMOD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. METIS Graph Partition Library. http://exoplanet.eu/catalog.php.Google ScholarGoogle Scholar
  28. MPICH2. http://www.mcs.anl.gov/research/projects/mpich2/.Google ScholarGoogle Scholar
  29. Open MPI. http://www.open-mpi.org/.Google ScholarGoogle Scholar
  30. Phoebus. https://github.com/xslogic/phoebus.Google ScholarGoogle Scholar
  31. RDF Primer. W3C Recommendation. http://www.w3.org/TR/rdf-primer.Google ScholarGoogle Scholar
  32. SPARQL Query Language for RDF. W3C Working Draft 4. http://www.w3.org/TR/rdf-sparql-query/.Google ScholarGoogle Scholar
  33. I. Stanton and G. Kliot. Streaming Graph Partitioning for Large Distributed Graphs. In KDD, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. P. Stutz, A. Bernstein, and W. W. Cohen. Signal/Collect: Graph Algorithms for the (Semantic) Web. In ISWC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Trinity. http://http://research.microsoft.com/en-us/projects/trinity/default.aspx.Google ScholarGoogle Scholar
  36. L. G. Valiant. "A Bridging Model for Parallel Computation". Communications of the ACM, 33(8):103--111, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. Yang, X. Yan, B. Zong, and A. Khan. Towards Effective Partition Management for Large Graphs. In SIGMOD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing With Working Sets. In HotCloud, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Y. Zhang, Q. Gao, L. Gao, and C. Wang. "iMapreduce: A Distributed Computing Framework for Iterative Computation". DataCloud, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    SSDBM '13: Proceedings of the 25th International Conference on Scientific and Statistical Database Management
    July 2013
    401 pages
    ISBN:9781450319218
    DOI:10.1145/2484838

    Copyright © 2013 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 29 July 2013

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate56of146submissions,38%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader