ABSTRACT
Online retailers execute far more price updates than brick-and-mortar stores. Even a few mispriced items can have a significant business impact and erode customer trust, so early, automated, real-time detection of anomalies is an important part of such a pricing system. In this paper, we describe the unsupervised and supervised anomaly detection approaches we developed and deployed for a large-scale online pricing system at Walmart. Our system detects anomalies in both batch and real-time streaming settings, and flagged items are reviewed and acted on according to priority and business impact. We found that the right architecture design was critical to model performance at scale, and that business impact and speed were important factors in model selection, parameter choice, and prioritization in a large-scale production environment. We evaluated the performance of various approaches on a test set of real-world retail data and fully deployed our approach into production. We found that our approach was able to detect the most important anomalies with high precision.
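To make the idea of unsupervised price anomaly detection concrete, the following is a minimal sketch and not the method described in this paper. It assumes each price update is summarized by a single feature, the log ratio of new to old price, and flags updates whose modified z-score (based on the median and the median absolute deviation) exceeds a threshold. The function names, the single feature, and the 3.5 threshold are illustrative assumptions.

```python
import math
from statistics import median


def robust_z_scores(values):
    """Modified z-scores computed from the median and the MAD.

    The 0.6745 constant rescales the MAD so that scores are comparable
    to standard z-scores for normally distributed data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the data: nothing can be ranked as an outlier.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_price_anomalies(old_prices, new_prices, threshold=3.5):
    """Return indices of price updates whose log change is an outlier.

    Hypothetical helper: the feature and the threshold are assumptions
    made for illustration, not values taken from the paper.
    """
    changes = [math.log(new / old) for old, new in zip(old_prices, new_prices)]
    scores = robust_z_scores(changes)
    return [i for i, score in enumerate(scores) if abs(score) > threshold]


if __name__ == "__main__":
    old = [10, 20, 30, 40, 50, 60]
    new = [10.5, 19.8, 30.3, 0.40, 51, 59.4]  # the fourth price has a misplaced decimal
    print(flag_price_anomalies(old, new))
```

On this toy batch only the fourth update (index 3), where 40 becomes 0.40, is flagged. A median-based score is used here because a single extreme update would otherwise inflate a mean and standard deviation and mask itself.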