Abstract
Graph clustering is a fundamental computational problem with applications in algorithm design, machine learning, data mining, and the analysis of social networks. Over the past decades, researchers have proposed many algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation, and cannot be applied directly to many networks that occur in practice, whose information is often collected at different sites. Designing a simple and distributed clustering algorithm is therefore of great interest and has broad applications in processing big datasets.
In this article, we present a simple and distributed algorithm for graph clustering: For a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds and recovers a partition of the graph close to optimal. One of the main procedures behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster-structure of the input. Compared with previous sparsification algorithms that require Laplacian solvers or involve combinatorial constructions, this procedure is easy to implement in a distributed setting and runs fast in practice.
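To make the idea of a local sampling scheme concrete, here is a minimal Python sketch of degree-based edge sampling. It is an illustration under stated assumptions and not the article's procedure: the abstract does not give the sampling probabilities, so the rule below (each endpoint keeps an edge with probability about `oversampling * log(n) / degree`), the function name `sparsify_by_degree_sampling`, and the `oversampling` parameter are all hypothetical. The sketch only shows why such a scheme suits a distributed setting: each vertex decides using nothing but its own degree.

```python
import math
import random
from collections import defaultdict


def sparsify_by_degree_sampling(edges, oversampling=2.0, seed=0):
    """Sample a reweighted subgraph of an undirected, unweighted graph.

    edges: list of distinct (u, v) pairs with u != v.
    Returns a dict mapping each kept edge to its new weight.
    The sampling rule is an assumption made for illustration only.
    """
    rng = random.Random(seed)

    # Each vertex only needs to know its own degree.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    n = len(degree)
    # Hypothetical target: roughly this many sampled edges per vertex.
    budget = oversampling * math.log(max(n, 2))

    sparse = {}
    for u, v in edges:
        # Each endpoint samples the edge independently.
        p_u = min(1.0, budget / degree[u])
        p_v = min(1.0, budget / degree[v])
        # Probability that at least one endpoint samples the edge.
        p_edge = p_u + p_v - p_u * p_v
        if rng.random() < p_u or rng.random() < p_v:
            # Reweight by 1 / p_edge so each edge keeps its weight in expectation.
            sparse[(u, v)] = 1.0 / p_edge
    return sparse


if __name__ == "__main__":
    # Dense example: the complete graph on 200 vertices.
    dense = [(i, j) for i in range(200) for j in range(i + 1, 200)]
    kept = sparsify_by_degree_sampling(dense)
    print(len(dense), "edges before,", len(kept), "edges after")
```

On a dense graph the kept subgraph has far fewer edges than the input, while the reweighting keeps the expected weight of every edge, and hence of every cut, unchanged. Whether the cluster structure is preserved with high probability depends on the choice of sampling probabilities, which this sketch does not attempt to justify.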
Index Terms
- Distributed Graph Clustering and Sparsification