ABSTRACT
Embeddings have become a key paradigm to learn graph representations and facilitate downstream graph analysis tasks. Existing graph embedding techniques either sample a large number of node pairs from a graph to learn node embeddings via stochastic optimization, or factorize a high-order proximity/adjacency matrix of the graph via expensive matrix factorization. However, these techniques usually require significant computational resources for the learning process, which hinders their applications on large-scale graphs. Moreover, the cosine similarity preserved by these techniques shows suboptimal efficiency in downstream graph analysis tasks, compared to Hamming similarity, for example. To address these issues, we propose NodeSketch, a highly-efficient graph embedding technique preserving high-order node proximity via recursive sketching. Specifically, built on top of an efficient data-independent hashing/sketching technique, NodeSketch generates node embeddings in Hamming space. For an input graph, it starts by sketching the self-loop-augmented adjacency matrix of the graph to output low-order node embeddings, and then recursively generates k-order node embeddings based on the self-loop-augmented adjacency matrix and (k-1)-order node embeddings. Our extensive evaluation compares NodeSketch against a sizable collection of state-of-the-art techniques using five real-world graphs on two graph analysis tasks. The results show that NodeSketch achieves state-of-the-art performance compared to these techniques, while showing significant speedup of 9x-372x in the embedding learning process and 1.19x-1.68x speedup when performing downstream graph analysis tasks.
- Alex Arenas, Alberto Fernandez, and Sergio Gomez. 2008. Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, Vol. 10, 5 (2008), 053039.Google ScholarCross Ref
- Andrei Z Broder. 1997. On the resemblance and containment of documents. In Proceedings of Compression and complexity of sequences 1997. IEEE, 21--29. Google ScholarDigital Library
- Hongyun Cai, Vincent W Zheng, and Kevin Chang. 2018. A comprehensive survey of graph embedding: problems, techniques and applications. TKDE (2018).Google Scholar
- Shaosheng Cao, Wei Lu, and Qiongkai Xu. 2015. Grarep: Learning graph representations with global structural information. In CIKM'15. ACM, 891--900. Google ScholarDigital Library
- Lianhua Chi, Bin Li, and Xingquan Zhu. 2014. Context-preserving hashing for fast text classification. In SDM'14. SIAM, 100--108.Google Scholar
- Lianhua Chi and Xingquan Zhu. 2017. Hashing techniques: A survey and taxonomy. ACM Computing Surveys (CSUR) , Vol. 50, 1 (2017), 11. Google ScholarDigital Library
- Aristides Gionis, Piotr Indyk, Rajeev Motwani, et almbox. 1999. Similarity search in high dimensions via hashing. In VLDB'99, Vol. 99. 518--529. Google ScholarDigital Library
- Yunchao Gong and Svetlana Lazebnik. 2011. Iterative quantization: A procrustean approach to learning binary codes. In CVPR'11. IEEE, 817--824. Google ScholarDigital Library
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD'16. ACM, 855--864. Google ScholarDigital Library
- Bernhard Haeupler, Mark Manasse, and Kunal Talwar. 2014. Consistent weighted sampling made fast, small, and easy. arXiv preprint arXiv:1410.4266 (2014).Google Scholar
- Wassily Hoeffding. 1963. Probability inequalities for sums of bounded random variables. Journal of the American statistical association , Vol. 58, 301 (1963), 13--30.Google ScholarCross Ref
- Rana Hussein, Dingqi Yang, and Philippe Cudré-Mauroux. 2018. Are Meta-Paths Necessary?: Revisiting Heterogeneous Graph Embeddings. In CIKM'18. ACM, 437--446. Google ScholarDigital Library
- Sergey Ioffe. 2010. Improved consistent sampling, weighted minhash and l1 sketching. In ICDM'10. IEEE, 246--255. Google ScholarDigital Library
- Qing-Yuan Jiang and Wu-Jun Li. 2015. Scalable Graph Hashing with Feature Transformation.. In IJCAI'15 . 2248--2254. Google ScholarDigital Library
- Bin Li, Xingquan Zhu, Lianhua Chi, and Chengqi Zhang. 2012. Nested subtree hash kernels for large-scale graph classification over streams. In ICDM'12. IEEE, 399--408. Google ScholarDigital Library
- Ping Li. 2015a. 0-bit consistent weighted sampling. In KDD'15. ACM, 665--674. Google ScholarDigital Library
- Ping Li. 2015b. Min-max kernels. arXiv preprint arXiv:1503.01737 (2015).Google Scholar
- Defu Lian, Kai Zheng, Vincent W Zheng, Yong Ge, Longbing Cao, Ivor W Tsang, and Xing Xie. 2018. High-order Proximity Preserving Information Network Hashing. In KDD'18. ACM, 1744--1753. Google ScholarDigital Library
- Wei Liu, Cun Mu, Sanjiv Kumar, and Shih-Fu Chang. 2014. Discrete graph hashing. In NIPS'14. 3419--3427. Google ScholarDigital Library
- Mark Manasse, Frank McSherry, and Kunal Talwar. 2010. Consistent weighted sampling. Technical Report MSR-TR-2010--73 (2010).Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS . 3111--3119. Google ScholarDigital Library
- Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. 2016. Asymmetric transitivity preserving graph embedding. In KDD'16. ACM, 1105--1114. Google ScholarDigital Library
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD'14. ACM, 701--710. Google ScholarDigital Library
- Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM'18 . ACM, 459--467. Google ScholarDigital Library
- Fumin Shen, Chunhua Shen, Wei Liu, and Heng Tao Shen. 2015. Supervised discrete hashing. In CVPR'15. 37--45.Google ScholarCross Ref
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW'15. 1067--1077. Google ScholarDigital Library
- Lei Tang and Huan Liu. 2009a. Relational learning via latent social dimensions. In KDD'09. ACM, 817--826. Google ScholarDigital Library
- Lei Tang and Huan Liu. 2009b. Scalable learning of collective behavior based on sparse social dimensions. In CIKM'09. ACM, 1107--1116. Google ScholarDigital Library
- Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. 2018. VERSE: Versatile Graph Embeddings from Similarity Measures. In WWW'18. 539--548. Google ScholarDigital Library
- Daixin Wang, Peng Cui, and Wenwu Zhu. 2016. Structural deep network embedding. In KDD?16. ACM, 1225--1234. Google ScholarDigital Library
- Jingdong Wang, Ting Zhang, Song Jingkuan, Nicu Sebe, and Heng Tao Shen. 2018. A survey on learning to hash. TPAMI , Vol. 40, 4 (2018), 769--790.Google ScholarCross Ref
- Yair Weiss, Antonio Torralba, and Rob Fergus. 2009. Spectral hashing. In NIPS'09. 1753--1760. Google ScholarDigital Library
- Wei Wu, Bin Li, Ling Chen, and Chengqi Zhang. 2018. Efficient Attributed Network Embedding via Recursive Randomized Hashing.. In IJCAI'18 . 2861--2867. Google ScholarDigital Library
- Dingqi Yang, Bin Li, and Philippe Cudré-Mauroux. 2016. POIsketch: Semantic Place Labeling over User Activity Streams. In IJCAI'16 . 2697--2703. Google ScholarDigital Library
- Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. 2017. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift. In ICDM'17. IEEE, 545--554.Google Scholar
- Dingqi Yang, Bin Li, Laura Rettig, and Philippe Cudré-Mauroux. 2018. D2HistoSketch: Discriminative and Dynamic Similarity-Preserving Sketching of Streaming Histograms. TKDE , Vol. 1 (2018), 1--14.Google Scholar
- Dingqi Yang, Bingqing Qu, Jie Yang, and Philippe Cudre-Mauroux. 2019. Revisiting User Mobility and Social Relationships in LBSNs: A Hypergraph Embedding Approach. In WWW'19. ACM, 2147--2157. Google ScholarDigital Library
- Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. KIS , Vol. 42, 1 (2015), 181--213. Google ScholarDigital Library
- Dingyuan Zhu, Peng Cui, Daixin Wang, and Wenwu Zhu. 2018. Deep variational network embedding in wasserstein space. In KDD'18. ACM, 2827--2836. Google ScholarDigital Library
Index Terms
- NodeSketch: Highly-Efficient Graph Embeddings via Recursive Sketching
Recommendations
The nonorientable genus of joins of complete graphs with large edgeless graphs
We show that for n=4 and n>=6, K"n has a nonorientable embedding in which all the facial walks are hamilton cycles. Moreover, when n is odd there is such an embedding that is 2-face-colorable. Using these results we consider the join of an edgeless ...
Families of Dot-Product Snarks on Orientable Surfaces of Low Genus
We introduce a generalized dot product and provide some embedding conditions under which the genus of a graph does not rise under a dot product with the Petersen graph. Using these conditions, we disprove a conjecture of Tinsley and Watkins on the genus ...
Embedding into Bipartite Graphs
The conjecture of Bollobás and Komlós, recently proved by Böttcher, Schacht, and Taraz [Math. Ann., 343 (2009), pp. 175-205], implies that for any $\gamma>0$, every balanced bipartite graph on $2n$ vertices with bounded degree and sublinear bandwidth ...
Comments