research-article

Querying Discriminative and Representative Samples for Batch Mode Active Learning

Published: 17 February 2015
Abstract

Empirical risk minimization (ERM) provides a principled guideline for many machine learning and data mining algorithms. Under the ERM principle, one minimizes an upper bound of the true risk, which is approximated by the summation of empirical risk and the complexity of the candidate classifier class. To guarantee a satisfactory learning performance, ERM requires that the training data are i.i.d. sampled from the unknown source distribution. However, this may not be the case in active learning, where one selects the most informative samples to label, and these data may not follow the source distribution. In this article, we generalize the ERM principle to the active learning setting. We derive a novel form of upper bound for the true risk in the active learning setting; by minimizing this upper bound, we develop a practical batch mode active learning method. The proposed formulation involves a nonconvex integer programming optimization problem. We solve it efficiently by an alternating optimization method. Our method is shown to query the most informative samples while preserving the source distribution as much as possible, thus identifying the most uncertain and representative queries. We further extend our method to multiclass active learning by introducing novel pseudolabels in the multiclass case and developing an efficient algorithm. Experiments on benchmark datasets and real-world applications demonstrate the superior performance of our proposed method compared to state-of-the-art methods.
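To make the idea of querying samples that are both uncertain and representative more concrete, the following is a minimal sketch and not the method of the article. It assumes a binary classifier whose decision values are already available, uses the squared maximum mean discrepancy (MMD) with an RBF kernel as one possible measure of how well a batch preserves the pool distribution, and replaces the article's alternating optimization with a simple greedy search. The names `rbf`, `mmd_sq`, `select_batch`, `trade_off`, and `gamma` are hypothetical and chosen only for illustration.

```python
import math


def rbf(x, y, gamma=1.0):
    """RBF kernel between two points given as equal-length tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd_sq(batch, pool, gamma=1.0):
    """Biased estimate of the squared MMD between two point sets."""
    def mean_k(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(batch, batch) - 2 * mean_k(batch, pool) + mean_k(pool, pool)


def select_batch(pool, scores, batch_size, trade_off=1.0, gamma=1.0):
    """Greedily pick indices into `pool` that keep the cost low.

    Cost = mean |decision value| of the batch (small means uncertain)
           + trade_off * squared MMD between the batch and the pool
             (small means the batch resembles the pool distribution).
    This greedy loop is an illustrative stand-in, not the article's solver.
    """
    chosen = []
    for _ in range(batch_size):
        best, best_cost = None, float("inf")
        for i in range(len(pool)):
            if i in chosen:
                continue
            cand = chosen + [i]
            uncertainty = sum(abs(scores[j]) for j in cand) / len(cand)
            mismatch = mmd_sq([pool[j] for j in cand], pool, gamma)
            cost = uncertainty + trade_off * mismatch
            if cost < best_cost:
                best, best_cost = i, cost
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    pool = [(0.0,), (1.0,), (2.0,), (3.0,)]
    scores = [2.0, 0.1, -0.2, 1.5]  # assumed classifier decision values
    print(select_batch(pool, scores, batch_size=2, trade_off=0.0))
    print(select_batch(pool, scores, batch_size=2, trade_off=5.0))
```

With `trade_off=0.0` the sketch reduces to pure uncertainty sampling; increasing `trade_off` shifts the selection toward batches whose kernel mean is close to that of the whole pool.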


    • Published in

      ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 3
      TKDD Special Issue (SIGKDD'13)
      April 2015
      313 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2737800

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 February 2015
      • Accepted: 1 September 2014
      • Revised: 1 April 2014
      • Received: 1 October 2013

      Qualifiers

      • research-article
      • Research
      • Refereed
