skip to main content
10.1145/3292500.3330951acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching

Published:25 July 2019Publication History

ABSTRACT

Embeddings have become a key paradigm to learn graph representations and facilitate downstream graph analysis tasks. Existing graph embedding techniques either sample a large number of node pairs from a graph to learn node embeddings via stochastic optimization, or factorize a high-order proximity/adjacency matrix of the graph via expensive matrix factorization. However, these techniques usually require significant computational resources for the learning process, which hinders their applications on large-scale graphs. Moreover, the cosine similarity preserved by these techniques shows suboptimal efficiency in downstream graph analysis tasks, compared to Hamming similarity, for example. To address these issues, we propose NodeSketch, a highly-efficient graph embedding technique preserving high-order node proximity via recursive sketching. Specifically, built on top of an efficient data-independent hashing/sketching technique, NodeSketch generates node embeddings in Hamming space. For an input graph, it starts by sketching the self-loop-augmented adjacency matrix of the graph to output low-order node embeddings, and then recursively generates k-order node embeddings based on the self-loop-augmented adjacency matrix and (k-1)-order node embeddings. Our extensive evaluation compares NodeSketch against a sizable collection of state-of-the-art techniques using five real-world graphs on two graph analysis tasks. The results show that NodeSketch achieves state-of-the-art performance compared to these techniques, while showing significant speedup of 9x-372x in the embedding learning process and 1.19x-1.68x speedup when performing downstream graph analysis tasks.

References

  1. Alex Arenas, Alberto Fernandez, and Sergio Gomez. 2008. Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, Vol. 10, 5 (2008), 053039.Google ScholarGoogle ScholarCross RefCross Ref
  2. Andrei Z Broder. 1997. On the resemblance and containment of documents. In Proceedings of Compression and complexity of sequences 1997. IEEE, 21--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Hongyun Cai, Vincent W Zheng, and Kevin Chang. 2018. A comprehensive survey of graph embedding: problems, techniques and applications. TKDE (2018).Google ScholarGoogle Scholar
  4. Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. Grarep: Learning graph representations with global structural information. In CIKM'15. ACM, 891--900. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Lianhua Chi, Bin Li, and Xingquan Zhu. 2014. Context-preserving hashing for fast text classification. In SDM'14. SIAM, 100--108.Google ScholarGoogle Scholar
  6. Lianhua Chi and Xingquan Zhu. 2017. Hashing techniques: A survey and taxonomy. ACM Computing Surveys (CSUR) , Vol. 50, 1 (2017), 11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Aristides Gionis, Piotr Indyk, Rajeev Motwani, et almbox. 1999. Similarity search in high dimensions via hashing. In VLDB'99, Vol. 99. 518--529. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Yunchao Gong and Svetlana Lazebnik. 2011. Iterative quantization: A procrustean approach to learning binary codes. In CVPR'11. IEEE, 817--824. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD'16. ACM, 855--864. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bernhard Haeupler, Mark Manasse, and Kunal Talwar. 2014. Consistent weighted sampling made fast, small, and easy. arXiv preprint arXiv:1410.4266 (2014).Google ScholarGoogle Scholar
  11. Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American statistical association , Vol. 58, 301 (1963), 13--30.Google ScholarGoogle ScholarCross RefCross Ref
  12. Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. 2018. Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings. In CIKM'18. ACM, 437--446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Sergey Ioffe. 2010. Improved consistent sampling, weighted minhash and l1 sketching. In ICDM'10. IEEE, 246--255. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Qing-Yuan Jiang and Wu-Jun Li. 2015. Scalable Graph Hashing with Feature Transformation.. In IJCAI'15 . 2248--2254. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Bin Li, Xingquan Zhu, Lianhua Chi, and Chengqi Zhang. 2012. Nested subtree hash kernels for large-scale graph classification over streams. In ICDM'12. IEEE, 399--408. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Ping Li. 2015a. 0-bit consistent weighted sampling. In KDD'15. ACM, 665--674. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Ping Li. 2015b. Min-max kernels. arXiv preprint arXiv:1503.01737 (2015).Google ScholarGoogle Scholar
  18. Defu Lian, Kai Zheng, Vincent W Zheng, Yong Ge, Longbing Cao, Ivor W Tsang, and Xing Xie. 2018. High-order Proximity Preserving Information Network Hashing. In KDD'18. ACM, 1744--1753. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Wei Liu, Cun Mu, Sanjiv Kumar, and Shih-Fu Chang. 2014. Discrete graph hashing. In NIPS'14. 3419--3427. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Mark Manasse, Frank McSherry, and Kunal Talwar. 2010. Consistent weighted sampling. Technical Report MSR-TR-2010--73 (2010).Google ScholarGoogle Scholar
  21. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS . 3111--3119. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In KDD'16. ACM, 1105--1114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD'14. ACM, 701--710. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM'18 . ACM, 459--467. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Fumin Shen, Chunhua Shen, Wei Liu, and Heng Tao Shen. 2015. Supervised discrete hashing. In CVPR'15. 37--45.Google ScholarGoogle ScholarCross RefCross Ref
  26. Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW'15. 1067--1077. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Lei Tang and Huan Liu. 2009a. Relational learning via latent social dimensions. In KDD'09. ACM, 817--826. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Lei Tang and Huan Liu. 2009b. Scalable learning of collective behavior based on sparse social dimensions. In CIKM'09. ACM, 1107--1116. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW'18. 539--548. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In KDD?16. ACM, 1225--1234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Jingdong Wang, Ting Zhang, Song Jingkuan, Nicu Sebe, and Heng Tao Shen. 2018. A survey on learning to hash. TPAMI , Vol. 40, 4 (2018), 769--790.Google ScholarGoogle ScholarCross RefCross Ref
  32. Yair Weiss, Antonio Torralba, and Rob Fergus. 2009. Spectral hashing. In NIPS'09. 1753--1760. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Wei Wu, Bin Li, Ling Chen, and Chengqi Zhang. 2018. Efficient Attributed Network Embedding via Recursive Randomized Hashing.. In IJCAI'18 . 2861--2867. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Dingqi Yang, Bin Li, and Philippe Cudré-Mauroux. 2016. POIsketch: Semantic Place Labeling over User Activity Streams. In IJCAI'16 . 2697--2703. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. 2017. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift. In ICDM'17. IEEE, 545--554.Google ScholarGoogle Scholar
  36. Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. 2018. D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms. TKDE , Vol. 1 (2018), 1--14.Google ScholarGoogle Scholar
  37. Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudre-Mauroux. 2019. Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach. In WWW'19. ACM, 2147--2157. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. KIS , Vol. 42, 1 (2015), 181--213. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Dingyuan Zhu, Peng Cui, Daixin Wang, and Wenwu Zhu. 2018. Deep variational network embedding in wasserstein space. In KDD'18. ACM, 2827--2836. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
        July 2019
        3305 pages
        ISBN:9781450362016
        DOI:10.1145/3292500

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 July 2019

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '19 Paper Acceptance Rate110of1,200submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader