skip to main content
10.1145/3219819.3220024acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article
Public Access

NetWalk: A Flexible Deep Embedding Approach for Anomaly Detection in Dynamic Networks

Authors Info & Claims
Published:19 July 2018Publication History

ABSTRACT

Massive and dynamic networks arise in many practical applications such as social media, security and public health. Given an evolutionary network, it is crucial to detect structural anomalies, such as vertices and edges whose "behaviors'' deviate from underlying majority of the network, in a real-time fashion. Recently, network embedding has proven a powerful tool in learning the low-dimensional representations of vertices in networks that can capture and preserve the network structure. However, most existing network embedding approaches are designed for static networks, and thus may not be perfectly suited for a dynamic environment in which the network representation has to be constantly updated. In this paper, we propose a novel approach, NetWalk, for anomaly detection in dynamic networks by learning network representations which can be updated dynamically as the network evolves. We first encode the vertices of the dynamic network to vector representations by clique embedding, which jointly minimizes the pairwise distance of vertex representations of each walk derived from the dynamic networks, and the deep autoencoder reconstruction error serving as a global regularization. The vector representations can be computed with constant space requirements using reservoir sampling. On the basis of the learned low-dimensional vertex representations, a clustering-based technique is employed to incrementally and dynamically detect network anomalies. Compared with existing approaches, NetWalk has several advantages: 1) the network embedding can be updated dynamically, 2) streaming network nodes and edges can be encoded efficiently with constant memory space usage, 3). flexible to be applied on different types of networks, and 4) network anomalies can be detected in real-time. Extensive experiments on four real datasets demonstrate the effectiveness of NetWalk.

Skip Supplemental Material Section

Supplemental Material

yu_netwalk_approach.mp4

mp4

306.8 MB

References

  1. Charu C Aggarwal . 2013. Outlier Analysis. Springer. Google ScholarGoogle Scholar
  2. Charu C Aggarwal, Yuchen Zhao, and S Yu Philip . 2010. On Clustering Graph Streams.. In SDM. SIAM, 478--489.Google ScholarGoogle Scholar
  3. Charu C Aggarwal, Yuchen Zhao, and S Yu Philip . 2011. Outlier detection in graph streams. In ICDE. IEEE, 399--409. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni . 2009. Streaming k-means approximation. In NIPS. 10--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Leman Akoglu and Christos Faloutsos . 2013. Anomaly, event, and fraud detection in large network datasets WSDM. ACM, 773--774. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Leman Akoglu, Mary McGlohon, and Christos Faloutsos . 2010. Oddball: Spotting anomalies in weighted graphs. In PAKDD. Springer, 410--421. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Leman Akoglu, Hanghang Tong, and Danai Koutra . 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery Vol. 29, 3 (2015), 626--688. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin . 2003. A neural probabilistic language model. JMLR Vol. 3, Feb (2003), 1137--1155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Shiyu Chang, Wei Han, Jiliang Tang, Guo-Jun Qi, Charu C Aggarwal, and Thomas S Huang . 2015. Heterogeneous network embedding via deep architectures SIGKDD. ACM, 119--128. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Wei Cheng, Kai Zhang, Haifeng Chen, Guofei Jiang, and Wei Wang . 2016. Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations SIGKDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Graham Cormode and S Muthukrishnan . 2005. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms Vol. 55, 1 (2005), 58--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Aaron Defazio, Francis R. Bach, and Simon Lacoste-Julien . 2014. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives. In NIPS. 1646--1654. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. William Eberle and Lawrence Holder . 2007. Anomaly detection in data represented as graphs. Intelligent Data Analysis Vol. 11, 6 (2007), 663--689. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han . 2010. On community outliers and their efficient detection in information networks SIGKDD. ACM, 813--822. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Aditya Grover and Jure Leskovec . 2016. node2vec: Scalable Feature Learning for Networks. (2016).Google ScholarGoogle Scholar
  16. Manish Gupta, Jing Gao, Charu C Aggarwal, and Jiawei Han . 2014 a. Outlier Detection for Temporal Data: A Survey. TKDE Vol. 9, 26 (2014), 2250--2267.Google ScholarGoogle Scholar
  17. Manish Gupta, Jing Gao, Yizhou Sun, and Jiawei Han . 2012 a. Community trend outlier detection using soft temporal pattern mining ECML/PKDD. Springer, 692--708.Google ScholarGoogle Scholar
  18. Manish Gupta, Jing Gao, Yizhou Sun, and Jiawei Han . 2012 b. Integrating community matching and outlier detection for mining evolutionary community outliers. In SIGKDD. ACM, 859--867. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Manish Gupta, Arun Mallya, Subhro Roy, Jason HD Cho, and Jiawei Han . 2014 b. Local Learning for Mining Outlier Subgraphs from Network Datasets SDM. 73--81.Google ScholarGoogle Scholar
  20. Piotr Indyk and Rajeev Motwani . 1998. Approximate nearest neighbors: towards removing the curse of dimensionality ACM Symposium on Theory of Computing. ACM, 604--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos . 2007. Graph evolution: Densification and shrinking diameters. TKDD Vol. 1, 1 (2007), 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Omer Levy and Yoav Goldberg . 2014. Neural word embedding as implicit matrix factorization NIPS. 2177--2185. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Emaad A Manzoor, Sadegh Momeni, Venkat N Venkatakrishnan, and Leman Akoglu . 2016. Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ryan McConville, Weiru Liu, and Paul Miller . 2015. Vertex clustering of augmented graph streams. SDM (2015).Google ScholarGoogle Scholar
  25. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean . 2013 a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).Google ScholarGoogle Scholar
  26. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean . 2013 b. Distributed representations of words and phrases and their compositionality NIPS. 3111--3119. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Andrew Ng . 2011. Sparse autoencoder. CS294A Lecture notes Vol. 72, 2011 (2011), 1--19.Google ScholarGoogle Scholar
  28. Tore Opsahl and Pietro Panzarasa . 2009. Clustering in weighted networks. Social networks (2009), 155--163.Google ScholarGoogle Scholar
  29. Bryan Perozzi and Leman Akoglu . 2016. Scalable anomaly ranking of attributed neighborhoods SDM. SIAM, 207--215.Google ScholarGoogle Scholar
  30. Bryan Perozzi, Leman Akoglu, Patricia Iglesias Sánchez, and Emmanuel Müller . 2014 a. Focused Clustering and Outlier Detection in Large Attributed Graphs SIGKDD. 1346--1355. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena . 2014 b. Deepwalk: Online learning of social representations SIGKDD. ACM, 701--710. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Stephen Ranshous, Steve Harenberg, Kshitij Sharma, Nagiza F Samatova, et almbox. . 2016. A Scalable Approach for Outlier Detection in Edge Streams Using Sketch-based Approximations. In SDM.Google ScholarGoogle Scholar
  33. Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, and Nagiza F Samatova . 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics Vol. 7, 3 (2015), 223--247. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, et almbox. . 1988. Learning representations by back-propagating errors. Cognitive modeling Vol. 5, 3 (1988), 1.Google ScholarGoogle Scholar
  35. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos Faloutsos . 2005. Neighborhood formation and anomaly detection in bipartite graphs ICDM. IEEE, 8--17.Google ScholarGoogle Scholar
  36. Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei . 2015. Line: Large-scale information network embedding. In WWW. ACM, 1067--1077. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Fei Tian, Bin Gao, Qing Cui, Enhong Chen, and Tie-Yan Liu . 2014. Learning Deep Representations for Graph Clustering. AAAI. 1293--1299. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Ulrike Von Luxburg . 2007. A tutorial on spectral clustering. Statistics and computing Vol. 17, 4 (2007), 395--416. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Daixin Wang, Peng Cui, and Wenwu Zhu . 2016. Structural deep network embedding. In SIGKDD. ACM, 1225--1234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Wenchao Yu, Charu C Aggarwal, and Wei Wang . 2017. Temporally factorized network modeling for evolutionary network analysis WSDM. ACM, 455--464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Wenchao Yu, Guangxiang Zeng, Ping Luo, Fuzhen Zhuang, Qing He, and Zhongzhi Shi . 2013. Embedding with autoencoder regularization. In ECML/PKDD. Springer, 208--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Wayne W Zachary . 1977. An information flow model for conflict and fission in small groups. Journal of Anthropological Research (1977), 452--473.Google ScholarGoogle Scholar
  43. Yuchen Zhao and Philip Yu . 2013. On graph stream clustering with side information. In SDM. SIAM, 139--150.Google ScholarGoogle Scholar
  44. Lekui Zhou, Yang Yang, Xiang Ren, Fei Wu, and Yueting Zhuang . 2018. Dynamic Network Embedding by Modeling Triadic Closure Process. (2018).Google ScholarGoogle Scholar

Index Terms

  1. NetWalk: A Flexible Deep Embedding Approach for Anomaly Detection in Dynamic Networks

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
          July 2018
          2925 pages
          ISBN:9781450355520
          DOI:10.1145/3219819

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 July 2018

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader