research-article

Local graph sparsification for scalable clustering

Authors:
Venu Satuluri

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

,
Srinivasan Parthasarathy

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

,
Yiye Ruan

The Ohio State University, Columbus, OH, USA

The Ohio State University, Columbus, OH, USA
View Profile

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataJune 2011Pages 721–732https://doi.org/10.1145/1989323.1989399

Published:12 June 2011Publication History

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Pages 721–732

ABSTRACT

In this paper we look at how to sparsify a graph i.e. how to reduce the edgeset while keeping the nodes intact, so as to enable faster graph clustering without sacrificing quality. The main idea behind our approach is to preferentially retain the edges that are likely to be part of the same cluster. We propose to rank edges using a simple similarity-based heuristic that we efficiently compute by comparing the minhash signatures of the nodes incident to the edge. For each node, we select the top few edges to be retained in the sparsified graph. Extensive empirical results on several real networks and using four state-of-the-art graph clustering and community discovery algorithms reveal that our proposed approach realizes excellent speedups (often in the range 10-50), with little or no deterioration in the quality of the resulting clusters. In fact, for at least two of the four clustering algorithms, our sparsification consistently enables higher clustering accuracies.

References

D. Achlioptas and F. McSherry. Fast computation of low rank matrix approximations. In STOC '01, pages 611--618, 2001. Google ScholarDigital Library
S. Arora, E. Hazan, and S. Kale. A fast random sampling algorithm for sparsifying matrices. APPROX-RANDOM '06, pages 272--279, 2006. Google ScholarDigital Library
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD '08, pages 16--24, 2008. Google ScholarDigital Library
A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in O(n2) time. In STOC '96, pages 47--55, 1996. Google ScholarDigital Library
T. Bohman, C. Coopery, and A. Friezez. Min-Wise independent linear permutations. the electronic journal of combinatorics, 7(R26):2, 2000.Google Scholar
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations (extended abstract). In STOC '98, pages 327--336, 1998. Google ScholarDigital Library
G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM '08, pages 95--106, 2008. Google ScholarDigital Library
A. Clauset, C. Shalizi, and M. Newman. Power-law distributions in empirical data. SIAM review, 51(4):661--703, 2009. Google ScholarDigital Library
M. Costanzo, A. Baryshnikova, J. Bellay, Y. Kim, E. Spear, C. Sevier, H. Ding, J. Koh, K. Toufighi, S. Mostafavi, et al. The genetic landscape of a cell. Science's STKE, 327(5964):425, 2010.Google Scholar
B. B. Dalvi, M. Kshirsagar, and S. Sudarshan. Keyword search on external memory data graphs. PVLDB, 1(1):1189--1204, 2008. Google ScholarDigital Library
I. S. Dhillon, Y. Guan, and B. Kulis. Weighted Graph Cuts without Eigenvectors: A Multilevel Approach. IEEE Trans. PAMI, 29(11):1944--1957, 2007. Google ScholarDigital Library
S. Fortunato. Community detection in graphs. Phys. Reports, 486:75--174, 2010.Google ScholarCross Ref
D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB '05, pages 721--732, 2005. Google ScholarDigital Library
J. Glattfelder and S. Battiston. Backbone of complex networks of corporations: The flow of control. Phys. Rev. E, 80(3):36104, 2009.Google ScholarCross Ref
R. C. K. and S. Sudarshan. Graph clustering for keyword search. In COMAD '09, 2009.Google Scholar
D. R. Karger. Random sampling in cut, flow, and network design problems. In STOC '94, pages 648--657, 1994. Google ScholarDigital Library
G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359, 1999. Google ScholarDigital Library
V. Krishnamurthy, M. Faloutsos, M. Chrobak, L. Lao, J. Cui, and A. Percus. Reducing large internet topologies for faster simulations. NETWORKING '05, pages 328--341, 2005. Google ScholarDigital Library
H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW '10, pages 591--600, 2010. Google ScholarDigital Library
A. Lancichinetti, S. Fortunato, and F. Radicchi. Benchmark graphs for testing community detection algorithms. Phys. Rev. E, 78(4):46110, 2008.Google ScholarCross Ref
K. Lang and S. Rao. A flow-based method for improving the expansion or conductance of graph cuts. LNCS '04, pages 325--337, 2004.Google ScholarCross Ref
J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD '06, pages 631--636, 2006. Google ScholarDigital Library
J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Statistical properties of community structure in large social and information networks. In WWW '08, pages 695--704, 2008. Google ScholarDigital Library
S. Macskassy and F. Provost. Classification in networked data: A toolkit and a univariate case study. The Journal of Machine Learning Research, 8:935--983, 2007. Google ScholarDigital Library
A. S. Maiya and T. Y. Berger-Wolf. Sampling community structure. In WWW '10, pages 701--710, 2010. Google ScholarDigital Library
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and Analysis of Online Social Networks. In IMC '07, pages 29--42, October 2007. Google ScholarDigital Library
M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69(2):026113, Feb 2004.Google ScholarCross Ref
F. Provost and V. Kolluri. A survey of methods for scaling up inductive algorithms. DMKD, 3(2):131--169, 1999. Google ScholarDigital Library
S. Pu, J. Wong, B. Turner, E. Cho, and S. Wodak. Up-to-date catalogues of yeast protein complexes. Nucleic acids research, 37(3):825--831, 2009.Google Scholar
A. Ruepp et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Research, 36(suppl 1):D646, 2008.Google Scholar
V. Satuluri and S. Parthasarathy. Scalable graph clustering using stochastic flows: applications to community discovery. In KDD '09, pages 737--746, 2009. Google ScholarDigital Library
V. Satuluri and S. Parthasarathy. Symmetrizations for clustering directed graphs. In EDBT '11, 2011. Google ScholarDigital Library
V. Satuluri, S. Parthasarathy, and Y. Ruan. Local graph sparsification for scalable clustering. Technical Report OSU-CISRC-11/10-TR25, The Ohio State University.Google Scholar
J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Trans. PAMI, 22(8), 2000. Google ScholarDigital Library
D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. In STOC '08, pages 563--568, 2008. Google ScholarDigital Library
D. A. Spielman and S. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC '04, pages 81--90, 2004. Google ScholarDigital Library
M. Tumminello, T. Aste, T. Di Matteo, and R. Mantegna. A tool for filtering information in complex systems. PNAS, 102(30):10421, 2005.Google ScholarCross Ref
S. Van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, 2000.Google Scholar

Index Terms

Local graph sparsification for scalable clustering
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

gSparsify: Graph Motif Based Sparsification for Graph Clustering
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Graph clustering is a fundamental problem that partitions vertices of a graph into clusters with an objective to optimize the intuitive notions of intra-cluster density and intercluster sparsity. In many real-world applications, however, the sheer sizes ...
Read More
Distributed Graph Clustering and Sparsification
Special Issue on SPAA 2017

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design ...
Read More
Parallel Edge Contraction for Large Nonplanar Graph Clustering
BDIOT '18: Proceedings of the 2018 2nd International Conference on Big Data and Internet of Things

With the flowering of graph mining and computation technology, the field of graph clustering has become popular. Particularly, to cluster increasingly massive data represented as graph become common today. There have been many research efforts on graph ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
General Chair:
Timos Sellis
IMIS/RC Athena
,
Program Chair:
Renée J. Miller
University of Toronto
,
Publications Chairs:
Anastasios Kementsietsidis
IBM T.J. Watson Research Center
,
Yannis Velegrakis
University of Trento
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 June 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graph clustering
graph sparsification
minwise hashing
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 120
  Total Citations
  View Citations
- 2,037
  Total Downloads
- Downloads (Last 12 months)97
- Downloads (Last 6 weeks)20
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Local graph sparsification for scalable clustering

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

gSparsify: Graph Motif Based Sparsification for Graph Clustering

Distributed Graph Clustering and Sparsification

Parallel Edge Contraction for Large Nonplanar Graph Clustering

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Local graph sparsification for scalable clustering

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

gSparsify: Graph Motif Based Sparsification for Graph Clustering

Distributed Graph Clustering and Sparsification

Parallel Edge Contraction for Large Nonplanar Graph Clustering

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media