End-to-End Continual Rare-Class Recognition with Emerging Novel Subclasses

Authors:
Hung Nguyen, Carnegie Mellon University, Pittsburgh, PA
Xuejian Wang, Carnegie Mellon University, Pittsburgh, PA
Leman Akoglu, Carnegie Mellon University, Pittsburgh, PA

ACM Transactions on Knowledge Discovery from Data, Volume 14, Issue 5, Article No. 61, pp. 1–28. https://doi.org/10.1145/3399660

Published: 05 August 2020

Abstract

Given a labeled dataset that contains a rare (or minority) class of of-interest instances, as well as a large class of instances that are not of interest, how can we learn to recognize future of-interest instances over a continuous stream? The setting differs from traditional classification in that instances from novel minority subclasses might continually emerge over time, and hence it is often referred to as continual, life-long, or open-world classification. We introduce RaRecognize, which (i) estimates a general decision boundary between the rare class and the majority class, (ii) learns to recognize the individual rare subclasses that exist within the training data, and (iii) flags instances from previously unseen rare subclasses as newly emerging (i.e., novel). The learner in (i) is general in the sense that, by construction, it is dissimilar to the specialized learners in (ii), and thus distinguishes the minority from the majority without overly tuning to what is seen only in the training data. Thanks to this generality, RaRecognize ignores all future instances that it labels as majority and recognizes only the recurring and emerging rare subclasses. This saves effort at test time and ensures that the model size grows moderately over time, as it maintains only specialized minority learners. Overall, we build an end-to-end system that consists of (1) a representation learning component that transforms data instances into suitable vector inputs; (2) a continual classifier that labels incoming instances as majority (not of interest), rare recurrent, or rare emerging; and (3) a clustering component that groups the rare emerging instances into novel subclasses for expert vetting and model re-training. Through extensive experiments, we show that RaRecognize outperforms state-of-the-art baselines on three real-world datasets that contain documents related to corporate risk and (natural and man-made) disasters as rare classes.
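
The three-way labeling in steps (i) to (iii) can be illustrated with a minimal sketch. It is not the RaRecognize implementation: the abstract does not say how the learners score an instance, so the linear scorers, the shared zero threshold, the function names, and the subclass names "fire" and "flood" are all assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def linear_scorer(weights: Vector, bias: float) -> Scorer:
    """Build a toy linear scorer (an assumption, not the paper's learner)."""
    def score(x: Vector) -> float:
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


def label_instance(
    x: Vector,
    general: Scorer,
    specialists: Dict[str, Scorer],
    threshold: float = 0.0,  # hypothetical shared decision threshold
) -> str:
    # Step (i): the general learner separates rare from majority.
    if general(x) <= threshold:
        return "majority"
    # Step (ii): check whether a known rare subclass claims the instance.
    scores = {name: scorer(x) for name, scorer in specialists.items()}
    if scores:
        best = max(scores, key=scores.get)
        if scores[best] > threshold:
            return f"rare-recurrent:{best}"
    # Step (iii): rare, but no known subclass accepts it.
    return "rare-emerging"


# Toy 2-D setup; the subclass names are hypothetical.
general = linear_scorer([1.0, 1.0], -1.0)
specialists = {
    "fire": linear_scorer([1.0, -1.0], -0.5),
    "flood": linear_scorer([-1.0, 1.0], -0.5),
}

for point in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 1.0)]:
    print(point, label_instance(point, general, specialists))
```

In this toy setup, instances labeled "rare-emerging" would be the ones handed to the clustering component for expert vetting, and only the dictionary of specialized learners would grow as new subclasses are confirmed.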

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
ACM Transactions on Knowledge Discovery from Data Volume 14, Issue 5
Special Issue on KDD 2018, Regular Papers and Survey Paper
October 2020
376 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3407672
Editors:
Charu Aggarwal, IBM T. J. Watson Research, USA
Xindong Wu, Mininglamp Academy of Sciences, China
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 5 August 2020
- Accepted: 1 May 2020
- Revised: 1 January 2020
- Received: 1 July 2019
Author Tags
Continual learning
classification with emerging classes
generalization to novel classes
open-world classification
rare class recognition
Qualifiers
- research-article
- Research
- Refereed