ABSTRACT
With precipitously growing demand to detect outliers in data streams, many studies have been conducted aiming to develop extensions of well-known outlier detection algorithm called Local Outlier Factor (LOF), for data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) Large amount of memory space is required. 2) A long sequence of outliers is not detected. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF that effectively overcomes the limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. It is worth noting that our proposing algorithms do not require any prior knowledge or assumptions on data distribution. Moreover, we accelerate the execution time of DILOF about 15 times by developing a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.
- C. C. Aggarwal and S. Sathe. 2015. Theoretical Foundations and Algorithms for Outlier Ensembles. ACM SIGKDD Exploarations Newsletter 17, 1 (2015). Google ScholarDigital Library
- M. Arjovsky, S. Chintala, and L. Bottou. 2017. Wasserstein GAN. arXiv:1701.07875 (2017).Google Scholar
- M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander. 2000. LOF: Identifying Density-based Local Outliers. In SIGMOD. ACM, 93--104. Google ScholarDigital Library
- K. Grauman and T. Darrell. 2004. Fast Counter Matching Using Approximate Earth Mover's Distance. In CVPR.Google Scholar
- A. K. Jain. 2010. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31, 8 (2010). Google ScholarDigital Library
- G. Kollios, D. Gunopulos, N. Koudas, and S. Berchtold. 2003. Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Transactions on Knowledge and Data Engineering 15, 5 (2003). Google ScholarDigital Library
- X. Mu, F. Zhu, J. Du, E. Lim, and Z. Zhou. 2017. Streaming Classification with Emerging New Class by Class Matrix Sketching. In Proceedings of the Thiry-First AAAI Conference on Artificial Intelligence. 2373--2379.Google Scholar
- B. Poczos, L. Xiong, and J. Schneider. 2011. Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. 599--608. Google ScholarDigital Library
- D. Pokrajac, A. Lazarevic, and L. J. Latecki. 2007. Incremental Local Outlier Detection for Data Streams. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining. 504--515.Google Scholar
- F. Ros and S. Guillaume. 2016. DENDIS: A new density-based sampling for clustering algorithm. Expert Systems With Applications 56, 1 (2016). Google ScholarDigital Library
- M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang. 2016. Fast Memory Efficient Local Outlier Detection in Data Streams. IEEE Transactions on Knowledge and Data Engineering 28, 12 (2016). Google ScholarDigital Library
- Y. Tang, L. H. U, Y. Cai, N. Mamoulis, and R. Cheng. 2013. Earth Mover's Distance based Similarity Search at Scale. Proceedings of the VLDB Endowment 7, 4 (2013). Google ScholarDigital Library
- Y. Xiao, B. Liu, Z. Hao, and L. Cao. 2014. A K-Farthest-Neighbor-based approach for support vector data description. Applied Intelligence 41, 1 (2014). Google ScholarDigital Library
- K. Yamanishi, J. Takeuchi, and G. Williams. 2000. On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. In KDD. 320--324. Google ScholarDigital Library
- Y. Yan, L. Cao, C. Kuhlman, and E. A. Rundensteiner. 2017. Distributed Local Outlier Detection in Big Data. In KDD. 1225--1234. Google ScholarDigital Library
- Y. Yan, L. Cao, and E. A. Rundensteiner. 2017. Scalable Top-n Local Outlier Detection. In KDD. 1235--1244. Google ScholarDigital Library
- D. Yang, E. A. Rundensteiner, and M. O. Ward. 2011. Summarization and matching of density-based clusters in streaming environments. Proceedings of the VLDB Endowment 5, 2 (2011). Google ScholarDigital Library
Index Terms
- DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams
Recommendations
xStream: Outlier Detection in Feature-Evolving Data Streams
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningThis work addresses the outlier detection problem for feature-evolving streams, which has not been studied before. In this setting both (1) data points may evolve, with feature values changing, as well as (2) feature space may evolve, with newly-...
Detecting anomalies from high-dimensional wireless network data streams: a case study
Special issue on Recent advances on machine learning and CyberneticsIn this paper, we study the problem of anomaly detection in wireless network streams. We have developed a new technique, called Stream Projected Outlier deTector (SPOT), to deal with the problem of anomaly detection from multi-dimensional or high-...
Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of DataAnomaly detection is an important data mining task, aiming at the discovery of elements that show significant diversion from the expected behavior; such elements are termed as outliers. One of the most widely employed criteria for determining whether an ...
Comments