skip to main content
10.1145/3219819.3220022acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

Authors Info & Claims
Published:19 July 2018Publication History

ABSTRACT

With precipitously growing demand to detect outliers in data streams, many studies have been conducted aiming to develop extensions of well-known outlier detection algorithm called Local Outlier Factor (LOF), for data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) Large amount of memory space is required. 2) A long sequence of outliers is not detected. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF that effectively overcomes the limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. It is worth noting that our proposing algorithms do not require any prior knowledge or assumptions on data distribution. Moreover, we accelerate the execution time of DILOF about 15 times by developing a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.

References

  1. C. C. Aggarwal and S. Sathe. 2015. Theoretical Foundations and Algorithms for Outlier Ensembles. ACM SIGKDD Exploarations Newsletter 17, 1 (2015). Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Arjovsky, S. Chintala, and L. Bottou. 2017. Wasserstein GAN. arXiv:1701.07875 (2017).Google ScholarGoogle Scholar
  3. M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander. 2000. LOF: Identifying Density-based Local Outliers. In SIGMOD. ACM, 93--104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. K. Grauman and T. Darrell. 2004. Fast Counter Matching Using Approximate Earth Mover's Distance. In CVPR.Google ScholarGoogle Scholar
  5. A. K. Jain. 2010. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31, 8 (2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. G. Kollios, D. Gunopulos, N. Koudas, and S. Berchtold. 2003. Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Transactions on Knowledge and Data Engineering 15, 5 (2003). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. X. Mu, F. Zhu, J. Du, E. Lim, and Z. Zhou. 2017. Streaming Classification with Emerging New Class by Class Matrix Sketching. In Proceedings of the Thiry-First AAAI Conference on Artificial Intelligence. 2373--2379.Google ScholarGoogle Scholar
  8. B. Poczos, L. Xiong, and J. Schneider. 2011. Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. 599--608. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Pokrajac, A. Lazarevic, and L. J. Latecki. 2007. Incremental Local Outlier Detection for Data Streams. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining. 504--515.Google ScholarGoogle Scholar
  10. F. Ros and S. Guillaume. 2016. DENDIS: A new density-based sampling for clustering algorithm. Expert Systems With Applications 56, 1 (2016). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang. 2016. Fast Memory Efficient Local Outlier Detection in Data Streams. IEEE Transactions on Knowledge and Data Engineering 28, 12 (2016). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Y. Tang, L. H. U, Y. Cai, N. Mamoulis, and R. Cheng. 2013. Earth Mover's Distance based Similarity Search at Scale. Proceedings of the VLDB Endowment 7, 4 (2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Y. Xiao, B. Liu, Z. Hao, and L. Cao. 2014. A K-Farthest-Neighbor-based approach for support vector data description. Applied Intelligence 41, 1 (2014). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. K. Yamanishi, J. Takeuchi, and G. Williams. 2000. On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. In KDD. 320--324. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Yan, L. Cao, C. Kuhlman, and E. A. Rundensteiner. 2017. Distributed Local Outlier Detection in Big Data. In KDD. 1225--1234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Yan, L. Cao, and E. A. Rundensteiner. 2017. Scalable Top-n Local Outlier Detection. In KDD. 1235--1244. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Yang, E. A. Rundensteiner, and M. O. Ward. 2011. Summarization and matching of density-based clusters in streaming environments. Proceedings of the VLDB Endowment 5, 2 (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
          July 2018
          2925 pages
          ISBN:9781450355520
          DOI:10.1145/3219819

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 July 2018

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader