Almost Optimal Local Graph Clustering Using Evolving Sets

Published: 04 May 2016

Abstract

Spectral partitioning is a simple, nearly linear time algorithm to find sparse cuts, and the Cheeger inequalities provide a worst-case guarantee for the quality of the approximation found by the algorithm. A local graph partitioning algorithm finds a set of vertices with small conductance (i.e., a sparse cut) by adaptively exploring part of a large graph G, starting from a specified vertex. For the algorithm to be local, its complexity must be bounded in terms of the size of the set that it outputs, with at most a weak dependence on the number n of vertices in G. Previous local partitioning algorithms find sparse cuts using random walks and personalized PageRank [Spielman and Teng 2013; Andersen et al. 2006].
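To make the quantities above concrete, the following minimal Python sketch computes the volume and the conductance of a vertex set on a small example. It is an illustration only and is not taken from the article: the adjacency-list representation, the names `volume`, `conductance`, and `EXAMPLE`, and the six-vertex graph are assumptions. Conductance is taken here as the number of cut edges divided by the smaller of vol(S) and vol(V \ S), a common convention that coincides with cut/vol(S) whenever vol(S) ≤ vol(V)/2.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EXAMPLE: Graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}


def volume(graph: Graph, s: Set[int]) -> int:
    """Sum of the degrees of the vertices in s."""
    return sum(len(graph[v]) for v in s)


def conductance(graph: Graph, s: Set[int]) -> float:
    """Cut edges leaving s, divided by min(vol(s), vol(complement of s))."""
    cut = sum(1 for u in s for v in graph[u] if v not in s)
    vol_s = volume(graph, s)
    vol_rest = volume(graph, set(graph)) - vol_s
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has volume 0")
    return cut / denom


if __name__ == "__main__":
    side = {0, 1, 2}
    print(volume(EXAMPLE, side), conductance(EXAMPLE, side))
```

On this example one triangle has volume 7 and a single cut edge, so it is a sparse cut in the sense used above.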

In this article, we introduce a simple randomized local partitioning algorithm that finds a sparse cut by simulating the volume-biased evolving set process, which is a Markov chain on sets of vertices. We prove that for any $\epsilon > 0$ and any set of vertices $A$ that has conductance at most $\phi$, for at least half of the starting vertices in $A$ our algorithm will output (with constant probability) a set of conductance $O(\sqrt{\phi/\epsilon})$. We prove that for a given run of the algorithm, the expected ratio between its computational complexity and the volume of the set that it outputs is $\mathrm{vol}(A)^{\epsilon} \phi^{-1/2} \mathrm{polylog}(n)$, where $\mathrm{vol}(A) = \sum_{v \in A} d(v)$ is the volume of the set $A$. This gives an algorithm with the same guarantee (up to a constant factor) as Cheeger's inequality that runs in time slightly superlinear in the size of the output. This is the first sublinear (in the size of the input) time algorithm with almost the same guarantee as Cheeger's inequality. In comparison, the best previous local partitioning algorithm, by Andersen et al. [2006], has a worse approximation guarantee of $O(\sqrt{\phi \log n})$ and a larger ratio of $\phi^{-1} \mathrm{polylog}(n)$ between the complexity and output volume.
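As a rough illustration of what a Markov chain on sets of vertices looks like, the sketch below simulates the plain (unbiased) evolving set process for the lazy random walk: draw a threshold U uniformly from (0, 1] and replace the current set S by the set of vertices y whose one-step probability of moving into S is at least U. This is a simplification and an assumption on our part. It is not the volume-biased variant analyzed in the article, it is not the article's algorithm, and the names `prob_into_set`, `evolving_set_step`, `simulate`, and `EXAMPLE` are hypothetical.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, List[int]]

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EXAMPLE: Graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}


def prob_into_set(graph: Graph, y: int, s: Set[int]) -> float:
    """Probability that one lazy random walk step from y lands in s."""
    stay = 0.5 if y in s else 0.0
    inside = sum(1 for x in graph[y] if x in s)
    return stay + 0.5 * inside / len(graph[y])


def evolving_set_step(graph: Graph, s: Set[int], threshold: float) -> Set[int]:
    """One step of the plain evolving set process for a threshold in (0, 1]."""
    # Only vertices in s or adjacent to s can reach s in one step,
    # so the work is bounded by the volume of s rather than by the whole graph.
    candidates = set(s)
    for v in s:
        candidates.update(graph[v])
    return {y for y in candidates if prob_into_set(graph, y, s) >= threshold}


def simulate(graph: Graph, start: int, steps: int,
             rng: random.Random) -> List[Set[int]]:
    """Run the process from {start}; stop early if the set becomes empty."""
    history = [{start}]
    for _ in range(steps):
        current = history[-1]
        if not current:
            break
        threshold = 1.0 - rng.random()  # uniform on (0, 1]
        history.append(evolving_set_step(graph, current, threshold))
    return history


if __name__ == "__main__":
    for step, s in enumerate(simulate(EXAMPLE, 0, 10, random.Random(0))):
        print(step, sorted(s))
```

Because the walk is lazy, a threshold of at most 1/2 can only grow the set and a threshold above 1/2 can only shrink it, so consecutive sets in a run are always nested.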

As a by-product of our results, we obtain a bicriteria approximation algorithm for the expansion profile of any graph. For $0 < k \le \mathrm{vol}(V)/2$, let $\phi(k) := \min_{S : \mathrm{vol}(S) \le k} \phi(S)$. There is a polynomial-time algorithm that, for any $k, \epsilon > 0$, finds a set $S$ of volume $\mathrm{vol}(S) \le O(k^{1+\epsilon})$ and expansion $\phi(S) \le O(\sqrt{\phi(k)/\epsilon})$. As a new technical tool, we show that for any set $S$ of vertices of a graph, a lazy $t$-step random walk started from a randomly chosen vertex of $S$ will remain entirely inside $S$ with probability at least $(1 - \phi(S)/2)^t$. This itself provides a new lower bound on the uniform mixing time of any finite-state reversible Markov chain.
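The random walk bound can be checked numerically on a small graph. The sketch below computes the exact probability that a lazy t-step walk never leaves S and compares it with (1 - φ(S)/2)^t. Two assumptions are made that the abstract does not spell out: the start vertex is drawn from S with probability proportional to its degree, and φ(S) is taken as the number of cut edges divided by vol(S). The names `boundary_ratio`, `stay_probability`, and `EXAMPLE` are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, List[int]]

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
EXAMPLE: Graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}


def boundary_ratio(graph: Graph, s: Set[int]) -> float:
    """Edges leaving s divided by vol(s) (assumed definition of phi(S))."""
    cut = sum(1 for u in s for v in graph[u] if v not in s)
    vol_s = sum(len(graph[v]) for v in s)
    return cut / vol_s


def stay_probability(graph: Graph, s: Set[int], t: int) -> float:
    """Exact probability that a lazy t-step walk stays inside s throughout.

    The start vertex is drawn from s proportionally to degree (an assumption).
    """
    vol_s = sum(len(graph[v]) for v in s)
    mass = {v: len(graph[v]) / vol_s for v in s}
    for _ in range(t):
        nxt = {v: 0.0 for v in s}
        for v, p in mass.items():
            nxt[v] += 0.5 * p                  # lazy step: stay put
            share = 0.5 * p / len(graph[v])    # mass sent along each edge
            for u in graph[v]:
                if u in s:                     # mass leaving s is discarded
                    nxt[u] += share
        mass = nxt
    return sum(mass.values())


if __name__ == "__main__":
    side = {0, 1, 2}
    phi = boundary_ratio(EXAMPLE, side)
    for t in range(6):
        print(t, stay_probability(EXAMPLE, side, t), (1 - phi / 2) ** t)
```

For t = 1 the two quantities coincide under these assumptions, since the only way to leave S in one step is to cross a cut edge, which happens with probability φ(S)/2.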

References

  1. N. Alon. 1986. Eigenvalues and expanders. Combinatorica 6, 2 (January 1986), 83--96. http://portal.acm.org/citation.cfm?id=18497.18498.
  2. N. Alon and V. Milman. 1985. Isoperimetric inequalities for graphs, and superconcentrators. J. Combin. Theor. Ser. B 38, 1 (Feb 1985), 73--88.
  3. Reid Andersen, Fan R. K. Chung, and Kevin J. Lang. 2006. Local graph partitioning using PageRank vectors. In FOCS. 475--486.
  4. Reid Andersen and Kevin J. Lang. 2006. Communities from seed sets. In WWW. ACM, New York, NY, 223--232.
  5. Reid Andersen and Yuval Peres. 2009. Finding sparse cuts locally using evolving sets. In STOC. 235--244.
  6. Sanjeev Arora, Boaz Barak, and David Steurer. 2010a. Subexponential algorithms for unique games and related problems. In FOCS. IEEE Computer Society, Washington, DC, 563--572.
  7. Sanjeev Arora, Elad Hazan, and Satyen Kale. 2010b. O(√log n) approximation to sparsest cut in Õ(n²) time. SIAM J. Comput. 39, 5 (2010), 1748--1771.
  8. Nikhil Bansal, Uriel Feige, Robert Krauthgamer, Konstantin Makarychev, Viswanath Nagarajan, Joseph Naor, and Roy Schwartz. 2011. Min-max graph partitioning and small set expansion. In FOCS. IEEE, Washington, DC, 17--26.
  9. G. R. Blakley and Prabir Roy. 1965. A Hölder type inequality for symmetric matrices with nonnegative entries. In Proc. Am. Math. Soc. Vol. 16. 1244--1245.
  10. Siu On Chan, Tsz Chiu Kwok, and Lap Chi Lau. 2015. Random Walks and Evolving Sets: Faster Convergences and Limitations. (2015). http://arxiv.org/abs/1507.02069.
  11. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. 2009. Introduction to Algorithms (3rd ed.). MIT Press, Cambridge, MA.
  12. Persi Diaconis and James A. Fill. 1990. Strong stationary times via a new form of duality. Ann. Probab. 18, 4 (1990), 1483--1522.
  13. Gary William Flake, Steve Lawrence, and C. Lee Giles. 2000. Efficient identification of Web communities. In SIGKDD. ACM, New York, NY, 150--160.
  14. M. Jerrum and Alistair Sinclair. 1989. Approximating the permanent. SIAM J. Comput. 18, 6 (1989), 1149--1178.
  15. Ravi Kannan, Santosh Vempala, and Adrian Vetta. 2004. On clusterings: Good, bad and spectral. J. ACM 51 (2004), 497--515.
  16. Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46 (1999), 668--677.
  17. Tsz Chiu Kwok and Lap Chi Lau. 2012. Finding small sparse cuts by random walk. In APPROX-RANDOM. 615--626.
  18. Tsz Chiu Kwok, Lap Chi Lau, Yin Tat Lee, Shayan Oveis Gharan, and Luca Trevisan. 2013. Improved Cheeger’s inequality: Analysis of spectral partitioning algorithms through higher order spectral gap. In STOC. 11--20.
  19. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In WWW. ACM, New York, NY, 695--704.
  20. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6, 1 (2009), 29--123.
  21. Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In WWW. New York, NY, 631--640.
  22. David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. 2006. Markov Chains and Mixing Times. American Mathematical Society.
  23. Angsheng Li, Yicheng Pan, and Pan Peng. 2011. Testing conductance in general graphs. Electron. Colloq. Comput. Complex. (ECCC) 18 (2011), 101.
  24. Angsheng Li and Pan Peng. 2011. Community structures in classical network models. Internet Math. 7, 2 (2011), 81--106.
  25. David London. 1966. Inequalities in quadratic forms. Duke Math. J. 33, 3 (1966), 511--522.
  26. László Lovász and Ravi Kannan. 1999. Faster mixing via average conductance. In STOC. ACM, 282--287.
  27. Ravi Montenegro. 2007. Sharp edge, vertex, and mixed Cheeger inequalities for finite Markov kernels. Electronic Communications in Probability [electronic only] 12 (2007), 377--389.
  28. Ravi Montenegro. 2009. The simple random walk and max-degree walk on a directed graph. Rand. Struct. Algorithm. 34, 3 (May 2009), 395--407.
  29. Ben Morris and Yuval Peres. 2003. Evolving sets and mixing. In STOC. 279--286.
  30. H. P. Mulholland and C. A. B. Smith. 1959. An inequality arising in genetical theory. Am. Math. Month. 66 (1959), 673--683.
  31. Ryan O’Donnell and David Witmer. 2012. Improved small-set expansion from higher eigenvalues. CoRR abs/1204.4688 (2012).
  32. Lorenzo Orecchia, Sushant Sachdeva, and Nisheeth K. Vishnoi. 2012. Approximating the exponential, the Lanczos method and an Õ(m)-time spectral algorithm for balanced separator. In STOC.
  33. Lorenzo Orecchia and Nisheeth K. Vishnoi. 2011. Towards an SDP-based approach to spectral methods: A nearly-linear-time algorithm for graph partitioning and decomposition. In SODA. 532--545.
  34. Shayan Oveis Gharan and Luca Trevisan. 2012. Approximating the expansion profile and almost optimal local graph clustering. In FOCS. 187--196.
  35. Prasad Raghavendra and David Steurer. 2010. Graph expansion and the unique games conjecture. In STOC. ACM, New York, NY, 755--764.
  36. Prasad Raghavendra, David Steurer, and Prasad Tetali. 2010. Approximations for the isoperimetric and spectral profile of graphs and related parameters. In STOC. ACM, New York, NY, 631--640.
  37. Jonah Sherman. 2009. Breaking the multicommodity flow barrier for O(√log n)-approximations to sparsest cut. In FOCS. 363--372.
  38. Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 8 (2000), 888--905.
  39. Daniel A. Spielman and Shang-Hua Teng. 2004. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC. 81--90.
  40. Daniel A. Spielman and Shang-Hua Teng. 2013. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput. 42, 1 (2013), 1--26.
  41. David A. Tolliver and Gary L. Miller. 2006. Graph partitioning by spectral rounding: Applications in image segmentation and clustering. In CVPR. IEEE Computer Society, Washington, DC, 1053--1060.
  42. David Williams. 1991. Probability with Martingales (Cambridge Mathematical Textbooks). Cambridge University Press, Cambridge.

        • Published in

          Journal of the ACM  Volume 63, Issue 2
          May 2016
          249 pages
          ISSN:0004-5411
          EISSN:1557-735X
          DOI:10.1145/2906142

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 4 May 2016
          • Accepted: 1 December 2015
          • Revised: 1 September 2015
          • Received: 1 February 2014

          Qualifiers

          • research-article
          • Research
          • Refereed
