Abstract
Given a labeled dataset that contains a rare (or minority) class of of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? The setting differs from traditional classification in that instances from novel minority subclasses may continually emerge over time, and hence it is often referred to as continual, life-long, or open-world classification. We introduce RaRecognize, which (i) estimates a general decision boundary between the rare class and the majority class, (ii) learns to recognize the individual rare subclasses that exist within the training data, and (iii) flags instances from previously unseen rare subclasses as newly emerging (i.e., novel). The learner in (i) is general in the sense that, by construction, it is dissimilar to the specialized learners in (ii), and thus distinguishes the minority from the majority without overly tuning to what is seen only in the training data. Thanks to this generality, RaRecognize ignores all future instances that it labels as majority and recognizes only the recurring and emerging rare subclasses. This saves effort at test time and ensures that the model size grows moderately over time, as it maintains only specialized minority learners. Overall, we build an end-to-end system that consists of (1) a representation learning component that transforms data instances into suitable vector inputs; (2) a continual classifier that labels incoming instances as majority (not of interest), rare recurrent, or rare emerging; and (3) a clustering component that groups the rare emerging instances into novel subclasses for expert vetting and model re-training. Through extensive experiments, we show that RaRecognize outperforms state-of-the-art baselines on three real-world datasets that contain documents related to corporate risk and (natural and man-made) disasters as rare classes.
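The three-way labeling performed by the continual classifier can be illustrated with a minimal sketch. It is not the RaRecognize implementation: the linear scorers, the zero decision boundary, the `novelty_threshold` parameter, and all names below are assumptions made for illustration, and the abstract does not specify how emerging instances are actually detected. The sketch only shows the control flow: a general learner filters out majority instances first, and specialized learners are consulted only for instances deemed rare.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

MAJORITY = "majority"
EMERGING = "rare-emerging"


@dataclass
class LinearScorer:
    """Hypothetical stand-in for a trained learner: score(x) = w . x + b."""
    weights: Sequence[float]
    bias: float = 0.0

    def score(self, x: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(self.weights, x)) + self.bias


def label_instance(
    x: Sequence[float],
    general: LinearScorer,
    specialists: Dict[str, LinearScorer],
    novelty_threshold: float = 0.0,
) -> str:
    """Label x as majority, rare recurrent (with subclass), or rare emerging."""
    # Step (i): the general learner separates rare from majority.
    # Majority instances are ignored, so no specialist is evaluated for them.
    if general.score(x) <= 0.0:
        return MAJORITY

    # Step (ii): ask each specialized learner how well x fits its subclass.
    if not specialists:
        return EMERGING
    best_name, best_score = max(
        ((name, s.score(x)) for name, s in specialists.items()),
        key=lambda pair: pair[1],
    )
    if best_score > novelty_threshold:
        return f"rare-recurrent:{best_name}"

    # Step (iii): rare, but no known subclass claims it (assumed rule).
    return EMERGING


if __name__ == "__main__":
    # Toy 2-D vectors standing in for learned document representations.
    general = LinearScorer(weights=[1.0, 1.0], bias=-1.0)
    specialists = {
        "fire": LinearScorer(weights=[1.0, -1.0], bias=-0.5),
        "flood": LinearScorer(weights=[-1.0, 1.0], bias=-0.5),
    }
    for x in ([0.1, 0.2], [2.0, 0.0], [0.0, 2.0], [1.0, 1.0]):
        print(x, "->", label_instance(x, general, specialists))
```

In this sketch, instances labeled `rare-emerging` would be buffered and handed to the clustering component for expert vetting; once a new subclass is confirmed, adding one more entry to `specialists` is the only growth in model size, which mirrors the moderate-growth argument in the abstract.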
Index Terms
- End-to-End Continual Rare-Class Recognition with Emerging Novel Subclasses