research-article

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching

Authors:
Dingqi Yang

University of Fribourg, Fribourg, Switzerland

University of Fribourg, Fribourg, Switzerland
View Profile

,
Paolo Rosso

University of Fribourg, Fribourg, Switzerland

University of Fribourg, Fribourg, Switzerland
View Profile

,
Bin Li

Fudan University, Shanghai, China

Fudan University, Shanghai, China
View Profile

,
Philippe Cudre-Mauroux

University of Fribourg, Fribourg, Switzerland

University of Fribourg, Fribourg, Switzerland
View Profile

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningJuly 2019Pages 1162–1172https://doi.org/10.1145/3292500.3330951

Published:25 July 2019Publication History

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1162–1172

ABSTRACT

Embeddings have become a key paradigm to learn graph representations and facilitate downstream graph analysis tasks. Existing graph embedding techniques either sample a large number of node pairs from a graph to learn node embeddings via stochastic optimization, or factorize a high-order proximity/adjacency matrix of the graph via expensive matrix factorization. However, these techniques usually require significant computational resources for the learning process, which hinders their applications on large-scale graphs. Moreover, the cosine similarity preserved by these techniques shows suboptimal efficiency in downstream graph analysis tasks, compared to Hamming similarity, for example. To address these issues, we propose NodeSketch, a highly-efficient graph embedding technique preserving high-order node proximity via recursive sketching. Specifically, built on top of an efficient data-independent hashing/sketching technique, NodeSketch generates node embeddings in Hamming space. For an input graph, it starts by sketching the self-loop-augmented adjacency matrix of the graph to output low-order node embeddings, and then recursively generates k-order node embeddings based on the self-loop-augmented adjacency matrix and (k-1)-order node embeddings. Our extensive evaluation compares NodeSketch against a sizable collection of state-of-the-art techniques using five real-world graphs on two graph analysis tasks. The results show that NodeSketch achieves state-of-the-art performance compared to these techniques, while showing significant speedup of 9x-372x in the embedding learning process and 1.19x-1.68x speedup when performing downstream graph analysis tasks.

References

Alex Arenas, Alberto Fernandez, and Sergio Gomez. 2008. Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, Vol. 10, 5 (2008), 053039.Google ScholarCross Ref
Andrei Z Broder. 1997. On the resemblance and containment of documents. In Proceedings of Compression and complexity of sequences 1997. IEEE, 21--29. Google ScholarDigital Library
Hongyun Cai, Vincent W Zheng, and Kevin Chang. 2018. A comprehensive survey of graph embedding: problems, techniques and applications. TKDE (2018).Google Scholar
Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. Grarep: Learning graph representations with global structural information. In CIKM'15. ACM, 891--900. Google ScholarDigital Library
Lianhua Chi, Bin Li, and Xingquan Zhu. 2014. Context-preserving hashing for fast text classification. In SDM'14. SIAM, 100--108.Google Scholar
Lianhua Chi and Xingquan Zhu. 2017. Hashing techniques: A survey and taxonomy. ACM Computing Surveys (CSUR) , Vol. 50, 1 (2017), 11. Google ScholarDigital Library
Aristides Gionis, Piotr Indyk, Rajeev Motwani, et almbox. 1999. Similarity search in high dimensions via hashing. In VLDB'99, Vol. 99. 518--529. Google ScholarDigital Library
Yunchao Gong and Svetlana Lazebnik. 2011. Iterative quantization: A procrustean approach to learning binary codes. In CVPR'11. IEEE, 817--824. Google ScholarDigital Library
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD'16. ACM, 855--864. Google ScholarDigital Library
Bernhard Haeupler, Mark Manasse, and Kunal Talwar. 2014. Consistent weighted sampling made fast, small, and easy. arXiv preprint arXiv:1410.4266 (2014).Google Scholar
Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American statistical association , Vol. 58, 301 (1963), 13--30.Google ScholarCross Ref
Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. 2018. Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings. In CIKM'18. ACM, 437--446. Google ScholarDigital Library
Sergey Ioffe. 2010. Improved consistent sampling, weighted minhash and l1 sketching. In ICDM'10. IEEE, 246--255. Google ScholarDigital Library
Qing-Yuan Jiang and Wu-Jun Li. 2015. Scalable Graph Hashing with Feature Transformation.. In IJCAI'15 . 2248--2254. Google ScholarDigital Library
Bin Li, Xingquan Zhu, Lianhua Chi, and Chengqi Zhang. 2012. Nested subtree hash kernels for large-scale graph classification over streams. In ICDM'12. IEEE, 399--408. Google ScholarDigital Library
Ping Li. 2015a. 0-bit consistent weighted sampling. In KDD'15. ACM, 665--674. Google ScholarDigital Library
Ping Li. 2015b. Min-max kernels. arXiv preprint arXiv:1503.01737 (2015).Google Scholar
Defu Lian, Kai Zheng, Vincent W Zheng, Yong Ge, Longbing Cao, Ivor W Tsang, and Xing Xie. 2018. High-order Proximity Preserving Information Network Hashing. In KDD'18. ACM, 1744--1753. Google ScholarDigital Library
Wei Liu, Cun Mu, Sanjiv Kumar, and Shih-Fu Chang. 2014. Discrete graph hashing. In NIPS'14. 3419--3427. Google ScholarDigital Library
Mark Manasse, Frank McSherry, and Kunal Talwar. 2010. Consistent weighted sampling. Technical Report MSR-TR-2010--73 (2010).Google Scholar
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS . 3111--3119. Google ScholarDigital Library
Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In KDD'16. ACM, 1105--1114. Google ScholarDigital Library
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD'14. ACM, 701--710. Google ScholarDigital Library
Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM'18 . ACM, 459--467. Google ScholarDigital Library
Fumin Shen, Chunhua Shen, Wei Liu, and Heng Tao Shen. 2015. Supervised discrete hashing. In CVPR'15. 37--45.Google ScholarCross Ref
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW'15. 1067--1077. Google ScholarDigital Library
Lei Tang and Huan Liu. 2009a. Relational learning via latent social dimensions. In KDD'09. ACM, 817--826. Google ScholarDigital Library
Lei Tang and Huan Liu. 2009b. Scalable learning of collective behavior based on sparse social dimensions. In CIKM'09. ACM, 1107--1116. Google ScholarDigital Library
Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW'18. 539--548. Google ScholarDigital Library
Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In KDD?16. ACM, 1225--1234. Google ScholarDigital Library
Jingdong Wang, Ting Zhang, Song Jingkuan, Nicu Sebe, and Heng Tao Shen. 2018. A survey on learning to hash. TPAMI , Vol. 40, 4 (2018), 769--790.Google ScholarCross Ref
Yair Weiss, Antonio Torralba, and Rob Fergus. 2009. Spectral hashing. In NIPS'09. 1753--1760. Google ScholarDigital Library
Wei Wu, Bin Li, Ling Chen, and Chengqi Zhang. 2018. Efficient Attributed Network Embedding via Recursive Randomized Hashing.. In IJCAI'18 . 2861--2867. Google ScholarDigital Library
Dingqi Yang, Bin Li, and Philippe Cudré-Mauroux. 2016. POIsketch: Semantic Place Labeling over User Activity Streams. In IJCAI'16 . 2697--2703. Google ScholarDigital Library
Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. 2017. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift. In ICDM'17. IEEE, 545--554.Google Scholar
Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. 2018. D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms. TKDE , Vol. 1 (2018), 1--14.Google Scholar
Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudre-Mauroux. 2019. Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach. In WWW'19. ACM, 2147--2157. Google ScholarDigital Library
Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. KIS , Vol. 42, 1 (2015), 181--213. Google ScholarDigital Library
Dingyuan Zhu, Peng Cui, Daixin Wang, and Wenwu Zhu. 2018. Deep variational network embedding in wasserstein space. In KDD'18. ACM, 2827--2836. Google ScholarDigital Library

Index Terms

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching
1. Information systems
  1. Information systems applications
    1. Data mining
  2. World Wide Web
    1. Web applications

Recommendations

The nonorientable genus of joins of complete graphs with large edgeless graphs

We show that for n=4 and n>=6, K"n has a nonorientable embedding in which all the facial walks are hamilton cycles. Moreover, when n is odd there is such an embedding that is 2-face-colorable. Using these results we consider the join of an edgeless ...
Read More
Families of Dot-Product Snarks on Orientable Surfaces of Low Genus

We introduce a generalized dot product and provide some embedding conditions under which the genus of a graph does not rise under a dot product with the Petersen graph. Using these conditions, we disprove a conjecture of Tinsley and Watkins on the genus ...
Read More
Embedding into Bipartite Graphs

The conjecture of Bollobás and Komlós, recently proved by Böttcher, Schacht, and Taraz [Math. Ann., 343 (2009), pp. 175-205], implies that for any $\gamma>0$, every balanced bipartite graph on $2n$ vertices with bounded degree and sublinear bandwidth ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500
General Chairs:
Ankur Teredesai
KenSci
,
Vipin Kumar
University of Minnesota
,
Program Chairs:
Ying Li
EV Analysis Corporation
,
Rómer Rosales
LinkedIn
,
Evimaria Terzi
Boston University
,
George Karypis
University of Minnesota
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 July 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
data independent hashing
graph embedding
recursive sketching
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '19 Paper Acceptance Rate110of1,200submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 37
  Total Citations
  View Citations
- 936
  Total Downloads
- Downloads (Last 12 months)67
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

The nonorientable genus of joins of complete graphs with large edgeless graphs

Families of Dot-Product Snarks on Orientable Surfaces of Low Genus

Embedding into Bipartite Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

The nonorientable genus of joins of complete graphs with large edgeless graphs

Families of Dot-Product Snarks on Orientable Surfaces of Low Genus

Embedding into Bipartite Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media