ABSTRACT
GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, and presenting techniques and experimental results for graph partitioning in distributed graph-processing systems like GPS. GPS is similar to Google's proprietary Pregel system, with three new features: (1) an extended API to make global computations more easily expressed and more efficient; (2) a dynamic repartitioning scheme that reassigns vertices to different workers during the computation, based on messaging patterns; and (3) an optimization that distributes adjacency lists of high-degree vertices across all compute nodes to improve performance. In addition to presenting the implementation of GPS and its novel features, we also present experimental results on the performance effects of both static and dynamic graph partitioning schemes, and we describe the compilation of a high-level domain-specific programming language to GPS, enabling easy expression of complex algorithms.
- Apache Incubator Giraph. http://incubator.apache.org/giraph/.Google Scholar
- Apache Hadoop. http://hadoop.apache.org/.Google Scholar
- Apache Mahout. http://mahout.apache.org/.Google Scholar
- Apache MINA. http://mina.apache.org/.Google Scholar
- P. Boldi, B. Codenotti, M. Santini, and S. Vigna. UbiCrawler: A Scalable Fully Distributed Web Crawler. Software: Practice And Experience, 34(8):711--726, 2004. Google ScholarDigital Library
- P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered Label Propagation: A MultiResolution Coordinate-Free Ordering for Compressing Social Networks. In WWW, 2011. Google ScholarDigital Library
- P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In WWW, 2004. Google ScholarDigital Library
- U. Brandes. A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25(2):163--177, 2001.Google ScholarCross Ref
- S. Brin and L. Page. The Anatomy of Large-Scale Hypertextual Web Search Engine. In WWW, 1998. Google ScholarDigital Library
- Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient Iterative Data Processing on Large Clusters. In VLDB, 2010. Google ScholarDigital Library
- A. Buluç and J. R. Gilbert. The Combinatorial BLAS: Design, Implementation, and Applications. International Journal of High Performance Computing Applications, 25(4):496--509, 2011. Google ScholarDigital Library
- A. Cevahir, C. Aykanat, A. Turk, and B. B. Cambazoglu. Site-based Partitioning and Repartitioning Techniques for Parallel PageRank Computation. IEEE Transactions on Parallel Distributed Systems, 22(5):786--802, 2011. Google ScholarDigital Library
- R. Chen, X. Weng, B. He, and M. Yang. Large Graph Processing in the Cloud. In SIGMOD, 2010. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004. Google ScholarDigital Library
- J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: A Runtime for Iterative MapReduce. In HPDC, 2010. Google ScholarDigital Library
- GoldenOrb. http://www.raveldata.com/goldenorb/.Google Scholar
- D. Gregor and A. Lumsdaine. The Parallel BGL: A Generic Library for Distributed Graph Computations. In POOSC, 2005.Google Scholar
- Hadoop Distributed File System. http://hadoop.apache.org/hdfs/.Google Scholar
- S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for Easy and Efficient Graph Analysis. In ASPLOS, 2012. Google ScholarDigital Library
- S. Hong, S. Salihoglu, J. Widom, and K. Olukotun. Compiling Green-Marl into GPS, Technical Report, Stanford University, October, 2012. http://ppl.stanford.edu/papers/tr_gm_gps.pdf.Google Scholar
- J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.Google ScholarDigital Library
- U. Kang, C. E. Tsourakakis, and C. Faloutsos. PEGASUS: A peta-scale graph mining system -- Implementation and Observations. In ICDM, 2009. Google ScholarDigital Library
- The Laboratory for Web Algorithmics. http://law.dsi.unimi.it/datasets.php.Google Scholar
- Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Framework for Parallel Machine Learning. In UAI, 2010.Google ScholarDigital Library
- A. Lugowski, D. Alber, A. Buluç, J. R. Gilbert, S. Reinhardt, Y. Teng, and A. Waranis. A Flexible Open-source Toolbox for Scalable Complex Graph Analysis. In SIAM Conference on Data Mining, 2012.Google ScholarCross Ref
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-Scale Graph Processing. In SIGMOD, 2011. Google ScholarDigital Library
- METIS Graph Partition Library. http://exoplanet.eu/catalog.php.Google Scholar
- MPICH2. http://www.mcs.anl.gov/research/projects/mpich2/.Google Scholar
- Open MPI. http://www.open-mpi.org/.Google Scholar
- Phoebus. https://github.com/xslogic/phoebus.Google Scholar
- RDF Primer. W3C Recommendation. http://www.w3.org/TR/rdf-primer.Google Scholar
- SPARQL Query Language for RDF. W3C Working Draft 4. http://www.w3.org/TR/rdf-sparql-query/.Google Scholar
- I. Stanton and G. Kliot. Streaming Graph Partitioning for Large Distributed Graphs. In KDD, 2011. Google ScholarDigital Library
- P. Stutz, A. Bernstein, and W. W. Cohen. Signal/Collect: Graph Algorithms for the (Semantic) Web. In ISWC, 2010. Google ScholarDigital Library
- Trinity. http://http://research.microsoft.com/en-us/projects/trinity/default.aspx.Google Scholar
- L. G. Valiant. "A Bridging Model for Parallel Computation". Communications of the ACM, 33(8):103--111, 1990. Google ScholarDigital Library
- S. Yang, X. Yan, B. Zong, and A. Khan. Towards Effective Partition Management for Large Graphs. In SIGMOD, 2012. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing With Working Sets. In HotCloud, 2010. Google ScholarDigital Library
- Y. Zhang, Q. Gao, L. Gao, and C. Wang. "iMapreduce: A Distributed Computing Framework for Iterative Computation". DataCloud, 2011. Google ScholarDigital Library
Recommendations
On the Multichromatic Number of s-Stable Kneser Graphs
For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes
An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Forbidden Subgraphs and Weak Locally Connected Graphs
A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Comments