Abstract
Empirical risk minimization (ERM) provides a principled guideline for many machine learning and data mining algorithms. Under the ERM principle, one minimizes an upper bound on the true risk, approximated by the sum of the empirical risk and a complexity term for the candidate classifier class. To guarantee satisfactory learning performance, ERM requires that the training data be sampled i.i.d. from the unknown source distribution. However, this may not hold in active learning, where one selects the most informative samples to label, and these samples need not follow the source distribution. In this article, we generalize the ERM principle to the active learning setting. We derive a novel upper bound on the true risk under active learning; by minimizing this bound, we develop a practical batch mode active learning method. The resulting formulation is a nonconvex integer programming problem, which we solve efficiently by alternating optimization. Our method is shown to query the most informative samples while preserving the source distribution as much as possible, thus identifying queries that are both uncertain and representative. We further extend the method to multiclass active learning by introducing novel pseudolabels for the multiclass case and developing an efficient algorithm. Experiments on benchmark datasets and real-world applications demonstrate the superior performance of our method compared with state-of-the-art approaches.
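To make the "uncertain yet representative" trade-off concrete, here is a toy greedy sketch of the general idea: score a candidate batch by classifier uncertainty (small absolute margin) plus a distribution-matching penalty (maximum mean discrepancy between the batch and the unlabeled pool). This is an illustrative simplification under assumed names (`select_batch`, `mmd2`, synthetic data), not the paper's actual integer program or its alternating-optimization solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=0.5):
    # Squared maximum mean discrepancy between samples X and Y:
    # small when the batch X "looks like" the pool Y.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

def select_batch(pool_X, margins, batch_size, lam=1.0):
    # Greedily grow a batch that balances uncertainty (mean |margin|,
    # lower is more informative) against distribution mismatch (MMD
    # between the batch and the whole pool, weighted by lam).
    selected, remaining = [], list(range(len(pool_X)))
    for _ in range(batch_size):
        best, best_score = None, np.inf
        for i in remaining:
            cand = selected + [i]
            score = (np.abs(margins[cand]).mean()
                     + lam * mmd2(pool_X[cand], pool_X))
            if score < best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic unlabeled pool with fake classifier margins (toy linear score).
pool = rng.normal(size=(60, 2))
margins = pool @ np.array([1.0, -0.5])

batch = select_batch(pool, margins, batch_size=5)
print(batch)
```

With `lam = 0` this degenerates to pure uncertainty sampling, which tends to pick redundant near-boundary points; the MMD term pushes the batch back toward the pool distribution, which is the intuition behind querying samples that are simultaneously discriminative and representative.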
Querying Discriminative and Representative Samples for Batch Mode Active Learning