DOI: 10.1145/3292500.3330748
research-article

Anomaly Detection for an E-commerce Pricing System

Published: 25 July 2019

ABSTRACT

Online retailers execute a very large number of price updates when compared to brick-and-mortar stores. Even a few mis-priced items can have a significant business impact and result in a loss of customer trust. Early detection of anomalies in an automated real-time fashion is an important part of such a pricing system. In this paper, we describe unsupervised and supervised anomaly detection approaches we developed and deployed for a large-scale online pricing system at Walmart. Our system detects anomalies both in batch and real-time streaming settings, and the items flagged are reviewed and actioned based on priority and business impact. We found that having the right architecture design was critical to facilitate model performance at scale, and business impact and speed were important factors influencing model selection, parameter choice, and prioritization in a production environment for a large-scale system. We conducted analyses on the performance of various approaches on a test set using real-world retail data and fully deployed our approach into production. We found that our approach was able to detect the most important anomalies with high precision.

Supplemental Material

p1917-ramakrishnan.mp4 (mp4, 949.9 MB)


Published in

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
