
Vertical Ensemble Co-Training for Text Classification

Published: 25 October 2017

Abstract

High-quality labeled data is essential for successfully applying machine learning methods to real-world text classification problems. In many cases, however, the amount of labeled data is very small relative to the amount of unlabeled data, and labeling additional samples can be expensive and time-consuming. Co-training algorithms, which make use of unlabeled data to improve classification, have proven very effective in such cases. Generally, co-training algorithms work by using two classifiers, trained on two different views of the data, to label large amounts of unlabeled data. Doing so can help minimize the human effort required to label new data, as well as improve classification performance. In this article, we propose an ensemble-based co-training approach that uses an ensemble of classifiers from different training iterations to improve labeling accuracy. This approach, which we call the vertical ensemble, incurs almost no additional computational cost. Experiments conducted on six textual datasets show a significant improvement of over 45% in AUC compared with the original co-training algorithm.
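To make the two mechanisms in the abstract concrete, here is a minimal Python sketch of a co-training loop with a vertical ensemble: at each iteration a fresh classifier is trained per view, and unlabeled samples are scored by averaging the predictions of all classifiers trained so far rather than only the latest one. This is our own illustrative reconstruction, not the authors' implementation: the function and parameter names (co_train_vertical, n_iter, per_iter), the scikit-learn logistic regression base learner, and the simple probability-averaging combination rule are all assumptions.

```python
# Illustrative sketch only: names, base learner, and the averaging rule are
# assumptions, not taken from the paper. Assumes scikit-learn is available.
import numpy as np
from sklearn.linear_model import LogisticRegression


def co_train_vertical(X1, X2, y, U1, U2, n_iter=20, per_iter=5):
    """Co-train one classifier per view; when labeling unlabeled samples,
    average the predictions of the classifiers from all iterations so far
    (the 'vertical' ensemble) instead of trusting only the latest one."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    U1, U2 = np.asarray(U1, float), np.asarray(U2, float)
    y = np.asarray(y)
    remaining = np.arange(len(U1))   # indices of still-unlabeled samples
    hist1, hist2 = [], []            # one classifier per view, per iteration

    for _ in range(n_iter):
        if remaining.size == 0:
            break
        # Train a fresh classifier per view on the current labeled set.
        c1 = LogisticRegression(max_iter=1000).fit(X1, y)
        c2 = LogisticRegression(max_iter=1000).fit(X2, y)
        hist1.append(c1)
        hist2.append(c2)

        # Vertical ensemble: average class probabilities over every
        # iteration so far, per view, then combine the two views.
        p1 = np.mean([c.predict_proba(U1[remaining]) for c in hist1], axis=0)
        p2 = np.mean([c.predict_proba(U2[remaining]) for c in hist2], axis=0)
        p = (p1 + p2) / 2.0

        # Move the most confidently predicted samples into the labeled set.
        top = np.argsort(p.max(axis=1))[-per_iter:]
        new_y = c1.classes_[p[top].argmax(axis=1)]
        X1 = np.vstack([X1, U1[remaining[top]]])
        X2 = np.vstack([X2, U2[remaining[top]]])
        y = np.concatenate([y, new_y])
        remaining = np.delete(remaining, top)

    return hist1, hist2
```

Because the ensemble reuses classifiers that every co-training iteration trains anyway, the only overhead is storing them and averaging their predictions, which matches the abstract's claim that the approach incurs almost no additional computational cost.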



• Published in

  ACM Transactions on Intelligent Systems and Technology, Volume 9, Issue 2 (Regular Papers)
  March 2018, 191 pages
  ISSN: 2157-6904
  EISSN: 2157-6912
  DOI: 10.1145/3154791
  Editor: Yu Zheng

      Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 October 2017
      • Accepted: 1 August 2017
      • Revised: 1 June 2017
      • Received: 1 February 2017
Published in TIST Volume 9, Issue 2

      Permissions

Request permissions for this article.


      Qualifiers

      • research-article
      • Research
      • Refereed
