
Distributed Graph Clustering and Sparsification

Authors:
He Sun, University of Edinburgh, Edinburgh, United Kingdom
Luca Zanetti, University of Cambridge, Cambridge, United Kingdom

ACM Transactions on Parallel Computing, Volume 6, Issue 3, Article No. 17, pp. 1–23. https://doi.org/10.1145/3364208

Published: 02 November 2019


Abstract

Graph clustering is a fundamental computational problem with applications in algorithm design, machine learning, data mining, and the analysis of social networks. Over the past decades, researchers have proposed many algorithmic design methods for graph clustering. Most of these methods, however, rely on complicated spectral techniques or convex optimisation and cannot be applied directly to many networks that occur in practice, whose information is often collected at different sites. A simple, distributed clustering algorithm is therefore of great interest and has broad applications in processing big datasets.

In this article, we present a simple and distributed algorithm for graph clustering: For a wide class of graphs that are characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds and recovers a partition of the graph close to optimal. One of the main procedures behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster-structure of the input. Compared with previous sparsification algorithms that require Laplacian solvers or involve combinatorial constructions, this procedure is easy to implement in a distributed setting and runs fast in practice.
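To make the idea of sampling-based sparsification concrete, the following is a minimal sketch and not the procedure from the article, whose details the abstract does not give. It assumes a generic rule in which each endpoint of an edge independently keeps that edge with probability min(1, tau / degree), and the edge survives if either endpoint keeps it. The names `sparsify`, `two_cliques`, and the parameter `tau` are hypothetical and chosen only for illustration. Each vertex needs only its own degree to make its decisions, which is what makes a rule of this shape easy to run locally.

```python
import random
from collections import defaultdict


def sparsify(edges, tau, seed=0):
    """Keep each edge if at least one endpoint samples it.

    Assumed rule (illustrative only): endpoint u keeps an incident edge
    with probability min(1, tau / deg(u)), independently of the other
    endpoint. Returns the set of retained edges.
    """
    rng = random.Random(seed)

    # Degrees are the only information a vertex needs about the graph.
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    kept = set()
    for u, v in edges:
        keep_u = rng.random() < min(1.0, tau / degree[u])
        keep_v = rng.random() < min(1.0, tau / degree[v])
        if keep_u or keep_v:
            kept.add((u, v))
    return kept


def two_cliques(size):
    """Two cliques on `size` vertices each, joined by a single bridge edge."""
    edges = []
    for offset in (0, size):
        for i in range(size):
            for j in range(i + 1, size):
                edges.append((offset + i, offset + j))
    edges.append((size - 1, size))  # bridge between the two clusters
    return edges


if __name__ == "__main__":
    graph = two_cliques(20)
    sparse = sparsify(graph, tau=5, seed=1)
    print(len(graph), "edges before,", len(sparse), "edges after")
```

In this toy input the two cliques play the role of well-separated clusters. A small `tau` thins the dense interior of each clique, and setting `tau` at or above the maximum degree retains every edge. Whether a particular rule provably preserves the cluster-structure is exactly what the article's analysis addresses; the sketch makes no such claim.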

Index Terms

Distributed Graph Clustering and Sparsification
1. Theory of computation
  1. Design and analysis of algorithms
    1. Distributed algorithms
    2. Graph algorithms analysis
  2. Randomness, geometry and discrete structures
    1. Random walks and Markov chains

Published in
ACM Transactions on Parallel Computing Volume 6, Issue 3
Special Issue on SPAA 2017
September 2019
185 pages
ISSN:2329-4949
EISSN:2329-4957
DOI:10.1145/3366783
Editor:
David A. Bader
Georgia Institute of Technology, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 2 November 2019
- Accepted: 1 August 2019
- Revised: 1 May 2019
- Received: 1 October 2017
Author Tags
Graph clustering
distributed computing
graph sparsification
Qualifiers
- research-article
- Research
- Refereed