ABSTRACT
Massive and dynamic networks arise in many practical applications such as social media, security and public health. Given an evolutionary network, it is crucial to detect structural anomalies, such as vertices and edges whose "behaviors'' deviate from underlying majority of the network, in a real-time fashion. Recently, network embedding has proven a powerful tool in learning the low-dimensional representations of vertices in networks that can capture and preserve the network structure. However, most existing network embedding approaches are designed for static networks, and thus may not be perfectly suited for a dynamic environment in which the network representation has to be constantly updated. In this paper, we propose a novel approach, NetWalk, for anomaly detection in dynamic networks by learning network representations which can be updated dynamically as the network evolves. We first encode the vertices of the dynamic network to vector representations by clique embedding, which jointly minimizes the pairwise distance of vertex representations of each walk derived from the dynamic networks, and the deep autoencoder reconstruction error serving as a global regularization. The vector representations can be computed with constant space requirements using reservoir sampling. On the basis of the learned low-dimensional vertex representations, a clustering-based technique is employed to incrementally and dynamically detect network anomalies. Compared with existing approaches, NetWalk has several advantages: 1) the network embedding can be updated dynamically, 2) streaming network nodes and edges can be encoded efficiently with constant memory space usage, 3). flexible to be applied on different types of networks, and 4) network anomalies can be detected in real-time. Extensive experiments on four real datasets demonstrate the effectiveness of NetWalk.
Supplemental Material
- Charu C Aggarwal . 2013. Outlier Analysis. Springer. Google Scholar
- Charu C Aggarwal, Yuchen Zhao, and S Yu Philip . 2010. On Clustering Graph Streams.. In SDM. SIAM, 478--489.Google Scholar
- Charu C Aggarwal, Yuchen Zhao, and S Yu Philip . 2011. Outlier detection in graph streams. In ICDE. IEEE, 399--409. Google ScholarDigital Library
- Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni . 2009. Streaming k-means approximation. In NIPS. 10--18. Google ScholarDigital Library
- Leman Akoglu and Christos Faloutsos . 2013. Anomaly, event, and fraud detection in large network datasets WSDM. ACM, 773--774. Google ScholarDigital Library
- Leman Akoglu, Mary McGlohon, and Christos Faloutsos . 2010. Oddball: Spotting anomalies in weighted graphs. In PAKDD. Springer, 410--421. Google ScholarDigital Library
- Leman Akoglu, Hanghang Tong, and Danai Koutra . 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery Vol. 29, 3 (2015), 626--688. Google ScholarDigital Library
- Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin . 2003. A neural probabilistic language model. JMLR Vol. 3, Feb (2003), 1137--1155. Google ScholarDigital Library
- Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang . 2015. Heterogeneous network embedding via deep architectures SIGKDD. ACM, 119--128. Google ScholarDigital Library
- Wei Cheng, Kai Zhang, Haifeng Chen, Guofei Jiang, and Wei Wang . 2016. Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations SIGKDD. Google ScholarDigital Library
- Graham Cormode and S Muthukrishnan . 2005. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms Vol. 55, 1 (2005), 58--75. Google ScholarDigital Library
- Aaron Defazio, Francis R. Bach, and Simon Lacoste-Julien . 2014. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. In NIPS. 1646--1654. Google ScholarDigital Library
- William Eberle and Lawrence Holder . 2007. Anomaly detection in data represented as graphs. Intelligent Data Analysis Vol. 11, 6 (2007), 663--689. Google ScholarDigital Library
- Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han . 2010. On community outliers and their efficient detection in information networks SIGKDD. ACM, 813--822. Google ScholarDigital Library
- Aditya Grover and Jure Leskovec . 2016. node2vec: Scalable Feature Learning for Networks. (2016).Google Scholar
- Manish Gupta, Jing Gao, Charu C Aggarwal, and Jiawei Han . 2014 a. Outlier Detection for Temporal Data: A Survey. TKDE Vol. 9, 26 (2014), 2250--2267.Google Scholar
- Manish Gupta, Jing Gao, Yizhou Sun, and Jiawei Han . 2012 a. Community trend outlier detection using soft temporal pattern mining ECML/PKDD. Springer, 692--708.Google Scholar
- Manish Gupta, Jing Gao, Yizhou Sun, and Jiawei Han . 2012 b. Integrating community matching and outlier detection for mining evolutionary community outliers. In SIGKDD. ACM, 859--867. Google ScholarDigital Library
- Manish Gupta, Arun Mallya, Subhro Roy, Jason HD Cho, and Jiawei Han . 2014 b. Local Learning for Mining Outlier Subgraphs from Network Datasets SDM. 73--81.Google Scholar
- Piotr Indyk and Rajeev Motwani . 1998. Approximate nearest neighbors: towards removing the curse of dimensionality ACM Symposium on Theory of Computing. ACM, 604--613. Google ScholarDigital Library
- Jure Leskovec, Jon Kleinberg, and Christos Faloutsos . 2007. Graph evolution: Densification and shrinking diameters. TKDD Vol. 1, 1 (2007), 2. Google ScholarDigital Library
- Omer Levy and Yoav Goldberg . 2014. Neural word embedding as implicit matrix factorization NIPS. 2177--2185. Google ScholarDigital Library
- Emaad A Manzoor, Sadegh Momeni, Venkat N Venkatakrishnan, and Leman Akoglu . 2016. Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs KDD. Google ScholarDigital Library
- Ryan McConville, Weiru Liu, and Paul Miller . 2015. Vertex clustering of augmented graph streams. SDM (2015).Google Scholar
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean . 2013 a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean . 2013 b. Distributed representations of words and phrases and their compositionality NIPS. 3111--3119. Google ScholarDigital Library
- Andrew Ng . 2011. Sparse autoencoder. CS294A Lecture notes Vol. 72, 2011 (2011), 1--19.Google Scholar
- Tore Opsahl and Pietro Panzarasa . 2009. Clustering in weighted networks. Social networks (2009), 155--163.Google Scholar
- Bryan Perozzi and Leman Akoglu . 2016. Scalable anomaly ranking of attributed neighborhoods SDM. SIAM, 207--215.Google Scholar
- Bryan Perozzi, Leman Akoglu, Patricia Iglesias Sánchez, and Emmanuel Müller . 2014 a. Focused Clustering and Outlier Detection in Large Attributed Graphs SIGKDD. 1346--1355. Google ScholarDigital Library
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena . 2014 b. Deepwalk: Online learning of social representations SIGKDD. ACM, 701--710. Google ScholarDigital Library
- Stephen Ranshous, Steve Harenberg, Kshitij Sharma, Nagiza F Samatova, et almbox. . 2016. A Scalable Approach for Outlier Detection in Edge Streams Using Sketch-based Approximations. In SDM.Google Scholar
- Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, and Nagiza F Samatova . 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics Vol. 7, 3 (2015), 223--247. Google ScholarDigital Library
- David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, et almbox. . 1988. Learning representations by back-propagating errors. Cognitive modeling Vol. 5, 3 (1988), 1.Google Scholar
- Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos . 2005. Neighborhood formation and anomaly detection in bipartite graphs ICDM. IEEE, 8--17.Google Scholar
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei . 2015. Line: Large-scale information network embedding. In WWW. ACM, 1067--1077. Google ScholarDigital Library
- Fei Tian, Bin Gao, Qing Cui, Enhong Chen, and Tie-Yan Liu . 2014. Learning Deep Representations for Graph Clustering. AAAI. 1293--1299. Google ScholarDigital Library
- Ulrike Von Luxburg . 2007. A tutorial on spectral clustering. Statistics and computing Vol. 17, 4 (2007), 395--416. Google ScholarDigital Library
- Daixin Wang, Peng Cui, and Wenwu Zhu . 2016. Structural deep network embedding. In SIGKDD. ACM, 1225--1234. Google ScholarDigital Library
- Wenchao Yu, Charu C Aggarwal, and Wei Wang . 2017. Temporally factorized network modeling for evolutionary network analysis WSDM. ACM, 455--464. Google ScholarDigital Library
- Wenchao Yu, Guangxiang Zeng, Ping Luo, Fuzhen Zhuang, Qing He, and Zhongzhi Shi . 2013. Embedding with autoencoder regularization. In ECML/PKDD. Springer, 208--223. Google ScholarDigital Library
- Wayne W Zachary . 1977. An information flow model for conflict and fission in small groups. Journal of Anthropological Research (1977), 452--473.Google Scholar
- Yuchen Zhao and Philip Yu . 2013. On graph stream clustering with side information. In SDM. SIAM, 139--150.Google Scholar
- Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang . 2018. Dynamic Network Embedding by Modeling Triadic Closure Process. (2018).Google Scholar
Index Terms
- NetWalk: A Flexible Deep Embedding Approach for Anomaly Detection in Dynamic Networks
Recommendations
Dynamic network embedding survey
AbstractSince many real world networks are evolving over time, such as social networks and user-item networks, there are increasing research efforts on dynamic network embedding in recent years. They learn node representations from a sequence ...
D2NE: Deep Dynamic Network Embedding
Advanced Data Mining and ApplicationsAbstractIn this paper, we propose a new approach named D2NE, short for Deep Dynamic Network Embedding, to learn the vertex representations for dynamic networks. The algorithm utilizes the graph attention mechanism to refresh embeddings efficiently, in ...
Learning and Updating Node Embedding on Dynamic Heterogeneous Information Network
WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data MiningHeterogeneous information networks consist of multiple types of edges and nodes, which have a strong ability to represent the rich semantics underpinning network structures. Recently, the dynamics of networks has been studied in many tasks such as ...
Comments