research-article

DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

Authors:
Gyoung S. Na

POSTECH (Pohang University of Science and Technology), Pohang, South Korea

POSTECH (Pohang University of Science and Technology), Pohang, South Korea
View Profile

,
Donghyun Kim

POSTECH (Pohang University of Science and Technology), Pohang, South Korea

POSTECH (Pohang University of Science and Technology), Pohang, South Korea
View Profile

,
Hwanjo Yu

POSTECH (Pohang University of Science and Technology), Pohang, South Korea

POSTECH (Pohang University of Science and Technology), Pohang, South Korea
View Profile

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningJuly 2018Pages 1993–2002https://doi.org/10.1145/3219819.3220022

Published:19 July 2018Publication History

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1993–2002

ABSTRACT

With precipitously growing demand to detect outliers in data streams, many studies have been conducted aiming to develop extensions of well-known outlier detection algorithm called Local Outlier Factor (LOF), for data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) Large amount of memory space is required. 2) A long sequence of outliers is not detected. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF that effectively overcomes the limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. It is worth noting that our proposing algorithms do not require any prior knowledge or assumptions on data distribution. Moreover, we accelerate the execution time of DILOF about 15 times by developing a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.

References

C. C. Aggarwal and S. Sathe. 2015. Theoretical Foundations and Algorithms for Outlier Ensembles. ACM SIGKDD Exploarations Newsletter 17, 1 (2015). Google ScholarDigital Library
M. Arjovsky, S. Chintala, and L. Bottou. 2017. Wasserstein GAN. arXiv:1701.07875 (2017).Google Scholar
M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander. 2000. LOF: Identifying Density-based Local Outliers. In SIGMOD. ACM, 93--104. Google ScholarDigital Library
K. Grauman and T. Darrell. 2004. Fast Counter Matching Using Approximate Earth Mover's Distance. In CVPR.Google Scholar
A. K. Jain. 2010. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31, 8 (2010). Google ScholarDigital Library
G. Kollios, D. Gunopulos, N. Koudas, and S. Berchtold. 2003. Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Transactions on Knowledge and Data Engineering 15, 5 (2003). Google ScholarDigital Library
X. Mu, F. Zhu, J. Du, E. Lim, and Z. Zhou. 2017. Streaming Classification with Emerging New Class by Class Matrix Sketching. In Proceedings of the Thiry-First AAAI Conference on Artificial Intelligence. 2373--2379.Google Scholar
B. Poczos, L. Xiong, and J. Schneider. 2011. Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence. 599--608. Google ScholarDigital Library
D. Pokrajac, A. Lazarevic, and L. J. Latecki. 2007. Incremental Local Outlier Detection for Data Streams. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining. 504--515.Google Scholar
F. Ros and S. Guillaume. 2016. DENDIS: A new density-based sampling for clustering algorithm. Expert Systems With Applications 56, 1 (2016). Google ScholarDigital Library
M. Salehi, C. Leckie, J. C. Bezdek, T. Vaithianathan, and X. Zhang. 2016. Fast Memory Efficient Local Outlier Detection in Data Streams. IEEE Transactions on Knowledge and Data Engineering 28, 12 (2016). Google ScholarDigital Library
Y. Tang, L. H. U, Y. Cai, N. Mamoulis, and R. Cheng. 2013. Earth Mover's Distance based Similarity Search at Scale. Proceedings of the VLDB Endowment 7, 4 (2013). Google ScholarDigital Library
Y. Xiao, B. Liu, Z. Hao, and L. Cao. 2014. A K-Farthest-Neighbor-based approach for support vector data description. Applied Intelligence 41, 1 (2014). Google ScholarDigital Library
K. Yamanishi, J. Takeuchi, and G. Williams. 2000. On-line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. In KDD. 320--324. Google ScholarDigital Library
Y. Yan, L. Cao, C. Kuhlman, and E. A. Rundensteiner. 2017. Distributed Local Outlier Detection in Big Data. In KDD. 1225--1234. Google ScholarDigital Library
Y. Yan, L. Cao, and E. A. Rundensteiner. 2017. Scalable Top-n Local Outlier Detection. In KDD. 1235--1244. Google ScholarDigital Library
D. Yang, E. A. Rundensteiner, and M. O. Ward. 2011. Summarization and matching of density-based clusters in streaming environments. Proceedings of the VLDB Endowment 5, 2 (2011). Google ScholarDigital Library

Index Terms

DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Anomaly detection
  2. Symbolic and algebraic manipulation
    1. Symbolic and algebraic algorithms
      1. Optimization algorithms
2. Mathematics of computing
  1. Discrete mathematics
    1. Combinatorics
      1. Combinatorial optimization

Recommendations

xStream: Outlier Detection in Feature-Evolving Data Streams
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

This work addresses the outlier detection problem for feature-evolving streams, which has not been studied before. In this setting both (1) data points may evolve, with feature values changing, as well as (2) feature space may evolve, with newly-...
Read More
Detecting anomalies from high-dimensional wireless network data streams: a case study
Special issue on Recent advances on machine learning and Cybernetics

In this paper, we study the problem of anomaly detection in wireless network streams. We have developed a new technique, called Stream Projected Outlier deTector (SPOT), to deal with the problem of anomaly detection from multi-dimensional or high-...
Read More
Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms
SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Anomaly detection is an important data mining task, aiming at the discovery of elements that show significant diversion from the expected behavior; such elements are termed as outliers. One of the most widely employed criteria for determining whether an ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs:
Yike Guo
Imperial College London
,
Faisal Farooq
IBM
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 July 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
data streams
density-based sampling
outlier detection
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 49
  Total Citations
  View Citations
- 1,814
  Total Downloads
- Downloads (Last 12 months)116
- Downloads (Last 6 weeks)16
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

xStream: Outlier Detection in Feature-Evolving Data Streams

Detecting anomalies from high-dimensional wireless network data streams: a case study

Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

DILOF: Effective and Memory Efficient Local Outlier Detection in Data Streams

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

xStream: Outlier Detection in Feature-Evolving Data Streams

Detecting anomalies from high-dimensional wireless network data streams: a case study

Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media