ABSTRACT
Graph embedding learns low-dimensional representations for nodes in a graph and effectively preserves the graph structure. Recently, a significant amount of progress has been made toward this emerging research area. However, there are several fundamental problems that remain open. First, existing methods fail to preserve the out-degree distributions on directed graphs. Second, many existing methods employ random walk based proximities and thus suffer from conflicting optimization goals on undirected graphs. Finally, existing factorization methods are unable to achieve scalability and non-linearity simultaneously. This paper presents an in-depth study on graph embedding techniques on both directed and undirected graphs. We analyze the fundamental reasons that lead to the distortion of out-degree distributions and to the conflicting optimization goals. We propose transpose proximity, a unified approach that solves both problems. Based on the concept of transpose proximity, we design STRAP, a factorization based graph embedding algorithm that achieves scalability and non-linearity simultaneously. STRAP makes use of the backward push algorithm to efficiently compute the sparse Personalized PageRank (PPR) as its transpose proximities. By imposing the sparsity constraint, we are able to apply non-linear operations to the proximity matrix and perform efficient matrix factorization to derive the embedding vectors. Finally, we present an extensive experimental study that evaluates the effectiveness of various graph embedding algorithms, and we show that \strap outperforms the state-of-the-art methods in terms of effectiveness and scalability.
- http://snap.stanford.edu/data/index.html.Google Scholar
- http://law.di.unimi.it/datasets.php.Google Scholar
- https://github.com/leoribeiro/struc2vec/.Google Scholar
- Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, and Alexander J Smola. Distributed large-scale natural graph factorization. In WWW, pages 37--48. ACM, 2013. Google ScholarDigital Library
- Rodrigo Aldecoa, Chiara Orsini, and Dmitri Krioukov. Hyperbolic graph generator. Computer Physics Communications, 196:492--496, 2015.Google ScholarCross Ref
- Bahman Bahmani, Abdur Chowdhury, and Ashish Goel. Fast incremental and personalized pagerank. VLDB, 4(3):173--184, 2010. Google ScholarDigital Library
- Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, pages 585--591, 2002. Google ScholarDigital Library
- Mansurul Bhuiyan and Mohammad Al Hasan. Representing graphs as bag of vertices and partitions for graph classification. Data Science and Engineering, 3(2):150--165, 2018.Google ScholarCross Ref
- Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891--900. ACM, 2015. Google ScholarDigital Library
- Shaosheng Cao, Wei Lu, and Qiongkai Xu. Deep neural networks for learning graph representations. In AAAI, pages 1145--1152, 2016. Google ScholarDigital Library
- Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang. Heterogeneous network embedding via deep architectures. In SIGKDD, pages 119--128. ACM, 2015. Google ScholarDigital Library
- Siheng Chen, Sufeng Niu, Leman Akoglu, Jelena Kovavc ević, and Christos Faloutsos. Fast, warped graph embedding: Unifying framework and one-click algorithm. arXiv preprint arXiv:1702.05764, 2017.Google Scholar
- Kenneth L Clarkson and David P Woodruff. Low-rank approximation and regression in input sparsity time. JACM, 63(6):54, 2017. Google ScholarDigital Library
- Peng Cui, Xiao Wang, Jian Pei, and Wenwu Zhu. A survey on network embedding. TKDE, 2018.Google Scholar
- Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In SIGKDD, pages 135--144. ACM, 2017. Google ScholarDigital Library
- Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. JMLR, 9(Aug):1871--1874, 2008. Google ScholarDigital Library
- Xu Feng, Yuyang Xie, Mingye Song, Wenjian Yu, and Jie Tang. Fast randomized pca for sparse data. In Asian Conference on Machine Learning, pages 710--725, 2018.Google Scholar
- Palash Goyal and Emilio Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151:78--94, 2018.Google ScholarCross Ref
- Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD, pages 855--864. ACM, 2016. Google ScholarDigital Library
- Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In SIGKDD, pages 538--543, 2002. Google ScholarDigital Library
- Junghwan Kim, Haekyu Park, Ji-Eun Lee, and U Kang. Side: representation learning in signed directed networks. In WWW, pages 509--518. International World Wide Web Conferences Steering Committee, 2018. Google ScholarDigital Library
- Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017.Google Scholar
- Kyle Kloster and David F Gleich. Heat kernel based community detection. In SIGKDD, pages 1386--1395. ACM, 2014. Google ScholarDigital Library
- Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguná. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.Google ScholarCross Ref
- Peter Lofgren, Siddhartha Banerjee, and Ashish Goel. Personalized pagerank estimation and search: A bidirectional approach. In WSDM, pages 163--172, 2016. Google ScholarDigital Library
- Jianxin Ma, Peng Cui, and Wenwu Zhu. Depthlgp: Learning embeddings of out-of-sample nodes in dynamic networks. AAAI, 2018.Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111--3119, 2013. Google ScholarDigital Library
- Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In ICML, pages 2014--2023, 2016. Google ScholarDigital Library
- Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In SIGKDD, pages 1105--1114. ACM, 2016. Google ScholarDigital Library
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: bringing order to the web. 1999.Google Scholar
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages 701--710. ACM, 2014. Google ScholarDigital Library
- Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459--467. ACM, 2018. Google ScholarDigital Library
- Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. struc2vec: Learning node representations from structural identity. In SIGKDD, pages 385--394. ACM, 2017. Google ScholarDigital Library
- Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. science, 290(5500):2323--2326, 2000.Google Scholar
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. Line: Large-scale information network embedding. In WWW, pages 1067--1077. International World Wide Web Conferences Steering Committee, 2015. Google ScholarDigital Library
- Lei Tang, Suju Rajan, and Vijay K Narayanan. Large scale multi-label classification via metalabeler. In WWW, pages 211--220. ACM, 2009. Google ScholarDigital Library
- Anton Tsitsulin, Davide Mottin, Panagiotis Karras, and Emmanuel Müller. Verse: Versatile graph embeddings from similarity measures. In WWW, pages 539--548. International World Wide Web Conferences Steering Committee, 2018. Google ScholarDigital Library
- Grigorios Tsoumakas, Ioannis Katakis, and Ioannis Vlahavas. Mining multi-label data. In Data mining and knowledge discovery handbook, pages 667--685. Springer, 2009.Google ScholarCross Ref
- Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In KDD, pages 1225--1234. ACM, 2016. Google ScholarDigital Library
- Zhewei Wei, Xiaodong He, Xiaokui Xiao, Sibo Wang, Shuo Shang, and Ji-Rong Wen. Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456. ACM, 2018. Google ScholarDigital Library
- Shuhan Yuan, Xintao Wu, and Yang Xiang. Sne: signed network embedding. In PAKDD, pages 183--195. Springer, 2017.Google ScholarCross Ref
- Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. Network representation learning: A survey. IEEE transactions on Big Data, 2018.Google Scholar
- Ziwei Zhang, Peng Cui, Xiao Wang, Jian Pei, Xuanrong Yao, and Wenwu Zhu. Arbitrary-order proximity preserved network embedding. In SIGKDD, pages 2778--2786. ACM, 2018. Google ScholarDigital Library
- Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, and Jun Gao. Scalable graph embedding for asymmetric proximity. In AAAI, pages 2942--2948, 2017. Google ScholarDigital Library
- Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang. Dynamic network embedding by modeling triadic closure process. 2018.Google Scholar
Index Terms
- Scalable Graph Embeddings via Sparse Transpose Proximities
Recommendations
Odd complete minors in even embeddings on surfaces
In this paper, we study the odd K m -minor problem in even embeddings on surfaces. We first establish a general theory for even embeddings with odd K m -minors. Given an integer m we show that for every surface F 2 of sufficiently high genus there ...
Graph Ear Decompositions and Graph Embeddings
Ear decomposition of a graph has been extensively studied in relation to graph connectivity. In this paper, a connection of ear decomposition to graph embeddings is exhibited. It is shown that constructing a maximum-paired ear decomposition of a graph ...
SubRank: Subgraph Embeddings via a Subgraph Proximity Measure
Advances in Knowledge Discovery and Data MiningAbstractRepresentation learning for graph data has gained a lot of attention in recent years. However, state-of-the-art research is focused mostly on node embeddings, with little effort dedicated to the closely related task of computing subgraph ...
Comments