research-article

Distributed Graph Clustering and Sparsification

Published: 02 November 2019

Abstract

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and the analysis of social networks. Over the past decades, researchers have proposed a number of algorithm design methods for graph clustering. Most of these methods, however, are based on complicated spectral techniques or convex optimisation and cannot be directly applied to cluster many networks that occur in practice, whose information is often collected at different sites. Designing a simple and distributed clustering algorithm is therefore of great interest and has wide applications in processing big datasets.

In this article, we present a simple and distributed algorithm for graph clustering: for a wide class of graphs characterised by a strong cluster-structure, our algorithm finishes in a poly-logarithmic number of rounds and recovers a partition of the graph that is close to optimal. One of the main procedures behind our algorithm is a sampling scheme that, given a dense graph as input, produces a sparse subgraph that provably preserves the cluster-structure of the input. Compared with previous sparsification algorithms, which require Laplacian solvers or involve combinatorial constructions, this procedure is easy to implement in a distributed setting and runs fast in practice.
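The abstract does not state the sampling probabilities, so the following is only a minimal sketch of a generic degree-based edge-sampling sparsifier, written under explicit assumptions and not as the article's exact procedure. We assume each vertex u keeps each incident edge with probability min(1, c·ln(n)/deg(u)), where the oversampling constant `c` is hypothetical (the article's rule may also depend on spectral properties of the graph, which are omitted here). An edge survives if either endpoint picked it, and it is reweighted by the inverse of that probability so that the expected weight of every edge, and hence of every cut, equals the original. The point illustrated is locality: a vertex needs only its own degree to sample, plus one exchange with each neighbour to compute the weights. The names `sampling_rate`, `sparsify`, and `clique_pair` are illustrative.

```python
import math
import random
from typing import Dict, Optional, Set, Tuple

# Undirected, unweighted graph as a symmetric adjacency map: vertex -> neighbours.
Graph = Dict[int, Set[int]]
Edge = Tuple[int, int]


def sampling_rate(degree: int, n: int, c: float) -> float:
    """Hypothetical per-vertex rate: keep about c * ln(n) incident edges."""
    if degree == 0 or n < 2:
        return 0.0
    return min(1.0, c * math.log(n) / degree)


def sparsify(graph: Graph, c: float = 2.0,
             seed: Optional[int] = None) -> Dict[Edge, float]:
    """Return a reweighted sparse subgraph as {(u, v): weight} with u < v."""
    rng = random.Random(seed)
    n = len(graph)
    # Each vertex computes its own rate from local information (its degree).
    rate = {u: sampling_rate(len(nbrs), n, c) for u, nbrs in graph.items()}

    # Local sampling: every vertex flips independent coins for its incident edges.
    picked: Set[Edge] = set()
    for u, nbrs in graph.items():
        for v in nbrs:
            if rng.random() < rate[u]:
                picked.add((min(u, v), max(u, v)))

    # An edge survives if either endpoint picked it, which happens with
    # probability q = p_u + p_v - p_u * p_v; weighting by 1/q keeps the
    # expected weight of every edge (and hence every cut) unchanged.
    return {
        (u, v): 1.0 / (rate[u] + rate[v] - rate[u] * rate[v])
        for (u, v) in picked
    }


def clique_pair(size: int) -> Graph:
    """Two disjoint cliques of `size` vertices joined by a single bridge edge."""
    g: Graph = {u: set() for u in range(2 * size)}
    for offset in (0, size):
        for u in range(offset, offset + size):
            for v in range(u + 1, offset + size):
                g[u].add(v)
                g[v].add(u)
    g[size - 1].add(size)
    g[size].add(size - 1)
    return g


if __name__ == "__main__":
    g = clique_pair(60)
    original = sum(len(nbrs) for nbrs in g.values()) // 2
    sparse = sparsify(g, c=2.0, seed=1)
    print(f"original edges: {original}, kept edges: {len(sparse)}")
```

Running the script prints the original edge count (3541 for two 60-vertex cliques plus a bridge) and the number of sampled edges, which varies with the seed. Larger `c` keeps more edges and concentrates cut weights more tightly around their expectations; unbiasedness alone does not guarantee that the cluster-structure is preserved with high probability, which is what an analysis such as the article's would have to establish.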


        • Published in

          ACM Transactions on Parallel Computing  Volume 6, Issue 3
          Special Issue on SPAA 2017
          September 2019
          185 pages
          ISSN: 2329-4949
          EISSN: 2329-4957
          DOI: 10.1145/3366783

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 November 2019
          • Accepted: 1 August 2019
          • Revised: 1 May 2019
          • Received: 1 October 2017


          Qualifiers

          • research-article
          • Research
          • Refereed
