ABSTRACT
How do we spot interesting events from e-mail or transportation logs? How can we detect port scan or denial of service attacks from IP-IP communication data? In general, given a sequence of weighted, directed or bipartite graphs, each summarizing a snapshot of activity in a time window, how can we spot anomalous graphs containing the sudden appearance or disappearance of large dense subgraphs (e.g., near bicliques) in near real-time using sublinear memory? To this end, we propose a randomized sketching-based approach called SpotLight, which guarantees that an anomalous graph is mapped 'far' away from 'normal' instances in the sketch space with high probability for appropriate choice of parameters. Extensive experiments on real-world datasets show that SpotLight (a) improves accuracy by at least 8.4% compared to prior approaches, (b) is fast and can process millions of edges within a few minutes, (c) scales linearly with the number of edges and sketching dimensions and (d) leads to interesting discoveries in practice.
Supplemental Material
- 2018. NYC Taxi &Limousine Corporation - Trip Record Data. http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml.Google Scholar
- Charu C. Aggarwal, Yuchen Zhao, and Philip S. Yu . 2011. Outlier detection in graph streams. In ICDE. IEEE, 399--409. Google ScholarDigital Library
- Leman Akoglu, Mary McGlohon, and Christos Faloutsos . 2010. oddball: Spotting Anomalies in Weighted Graphs. In PAKDD, Vol. Vol. 6119. Springer, 410--421. Google ScholarDigital Library
- Leman Akoglu, Hanghang Tong, and Danai Koutra . 2015. Graph based anomaly detection and description: a survey. Data Min. Knowl. Discov. Vol. 29, 3 (2015), 626--688. Google ScholarDigital Library
- Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos . 2013. CopyCatch: stopping group attacks by spotting lockstep behavior in social networks WWW. ACM, 119--130. Google ScholarDigital Library
- Reinhard Diestel . 2012. Graph Theory, 4th Edition. Graduate texts in mathematics, Vol. Vol. 173. Springer.Google Scholar
- Dhivya Eswaran, Stephan Günnemann, Christos Faloutsos, Disha Makhija, and Mohit Kumar . 2017. ZooBP: Belief Propagation for Heterogeneous Networks. PVLDB Vol. 10, 5 (2017), 625--636. Google ScholarDigital Library
- Timothy La Fond, Jennifer Neville, and Brian Gallagher . 2014. Anomaly Detection in Dynamic Networks of Varying Size. CoRR Vol. abs/1411.3749 (2014).Google Scholar
- Sudipto Guha, Nina Mishra, Gourav Roy, and Okke Schrijvers . 2016. Robust Random Cut Forest Based Anomaly Detection on Streams ICML, Vol. Vol. 48. JMLR.org, 2712--2721. Google ScholarDigital Library
- Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash, and Hanghang Tong . 2010. Metric forensics: a multi-level approach for mining volatile graphs KDD. ACM, 163--172. Google ScholarDigital Library
- Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos . 2016. FRAUDAR: Bounding Graph Fraud in the Face of Camouflage KDD. ACM, 895--904. Google ScholarDigital Library
- Tsuyoshi Idé and Hisashi Kashima . 2004. Eigenspace-based anomaly detection in computer systems KDD. ACM, 440--449. Google ScholarDigital Library
- Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos . 2015. A General Suspiciousness Metric for Dense Blocks in Multimodal Data ICDM. IEEE, 781--786. Google ScholarDigital Library
- Danai Koutra, Neil Shah, Joshua T. Vogelstein, Brian Gallagher, and Christos Faloutsos . 2016. DeltaCon: Principled Massive-Graph Similarity Function with Attribution. TKDD Vol. 10, 3, 28:1--28:43. Google ScholarDigital Library
- Richard Lippmann, Robert K. Cunningham, David J. Fried, Isaac Graf, Kris R. Kendall, Seth E. Webster, and Marc A. Zissman . 1999. Results of the DARPA 1998 Offline Intrusion Detection Evaluation Recent Advances in Intrusion Detection.Google Scholar
- Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou . 2008. Isolation Forest. In ICDM. IEEE, 413--422.Google ScholarDigital Library
- Emaad A. Manzoor, Sadegh M. Milajerdi, and Leman Akoglu . 2016. Fast Memory-efficient Anomaly Detection in Streaming Heterogeneous Graphs KDD. ACM, 1035--1044. Google ScholarDigital Library
- Andrew McGregor . 2014. Graph stream algorithms: a survey. SIGMOD Record Vol. 43, 1 (2014), 9--20. Google ScholarDigital Library
- Andrew McGregor, David Tench, Sofya Vorotnikova, and Hoa T. Vu . 2015. Densest Subgraph in Dynamic Graph Streams. In MFCS, Vol. Vol. 9235. Springer, 472--482.Google Scholar
- Tomás Pevný . 2016. Loda: Lightweight on-line detector of anomalies. Machine Learning Vol. 102, 2 (2016), 275--304. Google ScholarDigital Library
- B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos . 2010. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs PAKDD, Vol. Vol. 6119. Springer, 435--448. Google ScholarDigital Library
- Stephen Ranshous, Steve Harenberg, Kshitij Sharma, and Nagiza F Samatova . 2016. A Scalable Approach for Outlier Detection in Edge Streams Using Sketch-based Approximations. In SDM. SIAM, 189--197.Google Scholar
- Stephen Ranshous, Shitian Shen, Danai Koutra, Steve Harenberg, Christos Faloutsos, and Nagiza F Samatova . 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics Vol. 7, 3 (2015), 223--247. Google ScholarDigital Library
- Jitesh Shetty and Jafar Adibi . 2004. The Enron email dataset database schema and brief statistical report. Information sciences institute technical report, University of Southern California Vol. 4, 1 (2004), 120--128.Google Scholar
- Kijung Shin, Bryan Hooi, and Christos Faloutsos . 2016. M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees ECML/PKDD, Vol. Vol. 9851. Springer, 264--280. Google ScholarDigital Library
- Kumar Sricharan and Kamalika Das . 2014. Localizing anomalous changes in time-evolving graphs SIGMOD. ACM, 1347--1358. Google ScholarDigital Library
- Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and Philip S. Yu . 2007. GraphScope: parameter-free mining of large time-evolving graphs KDD. ACM, 687--696. Google ScholarDigital Library
- Jimeng Sun, Dacheng Tao, and Christos Faloutsos . 2006. Beyond streams and graphs: dynamic tensor analysis KDD. ACM, 374--383. Google ScholarDigital Library
- Ke Wu, Kun Zhang, Wei Fan, Andrea Edwards, and Philip S. Yu . 2014. RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection ICDM. IEEE, 600--609. Google ScholarDigital Library
- Yiming Yang and Xin Liu . 1999. A Re-Examination of Text Categorization Methods. In SIGIR. ACM, 42--49. Google ScholarDigital Library
- Weiren Yu, Charu C Aggarwal, Shuai Ma, and Haixun Wang . 2013. On anomalous hotspot discovery in graph streams. In ICDM. IEEE, 1271--1276.Google Scholar
Index Terms
- SpotLight: Detecting Anomalies in Streaming Graphs
Recommendations
Vertex and Hyperedge Connectivity in Dynamic Graph Streams
PODS '15: Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsA growing body of work addresses the challenge of processing dynamic graph streams: a graph is defined by a sequence of edge insertions and deletions and the goal is to construct synopses and compute properties of the graph while using only limited ...
Dynamic Graph Stream Algorithms in o(n) Space
In this paper we study graph problems in the dynamic streaming model, where the input is defined by a sequence of edge insertions and deletions. As many natural problems require $$\varOmega (n)$$Ω(n) space, where n is the number of vertices, existing ...
Fast Distributed Algorithms for Connectivity and MST in Large Graphs
SPAA '16: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and ArchitecturesMotivated by the increasing need to understand the algorithmic foundations of distributed large-scale graph computations, we study a number of fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly ...
Comments