Vertical Ensemble Co-Training for Text Classification

Abstract
High-quality labeled data is essential for successfully applying machine learning methods to real-world text classification problems. In many cases, however, the amount of labeled data is very small compared to the amount of unlabeled data, and labeling additional samples can be expensive and time-consuming. Co-training algorithms, which use unlabeled data to improve classification, have proven very effective in such cases. Generally, co-training algorithms work by training two classifiers on two different views of the data and using them to label large amounts of unlabeled data. Doing so can minimize the human effort required to label new data and improve classification performance. In this article, we propose an ensemble-based co-training approach that uses an ensemble of classifiers from different training iterations to improve labeling accuracy. This approach, which we call the vertical ensemble, incurs almost no additional computational cost. Experiments conducted on six textual datasets show a significant improvement of over 45% in AUC compared with the original co-training algorithm.
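The core idea in the abstract can be sketched in a few lines: in standard co-training, only the latest pair of view classifiers assigns pseudo-labels, whereas a vertical ensemble averages the predictions of the classifiers from all training iterations so far. The sketch below is a minimal illustration under assumed details, not the authors' implementation; the nearest-centroid base learner, the per-iteration labeling budget, and the confidence scoring are all placeholder choices.

```python
import numpy as np

class CentroidView:
    """Toy per-view base learner: nearest class centroid, with a
    softmax-style confidence score (a stand-in for real text classifiers)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        w = np.exp(-d)  # closer centroid -> higher score
        return w / w.sum(axis=1, keepdims=True)

def vertical_ensemble_cotrain(Xa, Xb, y, labeled, unlabeled, iters=5, per_iter=2):
    """Co-training over two views (Xa, Xb); labels are assumed to be 0..k-1.
    The 'vertical' part: pseudo-labels come from averaging the probability
    estimates of the classifiers trained in ALL iterations so far,
    not just the latest pair."""
    labeled, unlabeled, y = list(labeled), list(unlabeled), y.copy()
    history = []  # one (view-A model, view-B model) pair per iteration
    for _ in range(iters):
        if not unlabeled:
            break
        pair = (CentroidView().fit(Xa[labeled], y[labeled]),
                CentroidView().fit(Xb[labeled], y[labeled]))
        history.append(pair)
        # Vertical ensemble: average predictions across every iteration's models.
        proba = np.mean([m.predict_proba(X[unlabeled])
                         for a, b in history for m, X in ((a, Xa), (b, Xb))], axis=0)
        # Move the most confidently predicted samples into the labeled set.
        best = np.argsort(-proba.max(axis=1))[:per_iter]
        for i in sorted(best, reverse=True):  # pop high indices first
            idx = unlabeled.pop(i)
            y[idx] = proba[i].argmax()
            labeled.append(idx)
    return labeled, y

# Demo on two synthetic, well-separated views of the same 40 samples.
rng = np.random.default_rng(0)
true_y = np.repeat([0, 1], 20)
shift = np.where(true_y[:, None] == 0, -1.0, 1.0)
Xa = rng.normal(0.0, 0.3, (40, 2)) + shift
Xb = rng.normal(0.0, 0.3, (40, 2)) + shift
y0 = np.where(np.isin(np.arange(40), [0, 1, 20, 21]), true_y, -1)  # -1 = unlabeled
labeled_out, y_out = vertical_ensemble_cotrain(
    Xa, Xb, y0, labeled=[0, 1, 20, 21],
    unlabeled=[i for i in range(40) if y0[i] == -1])
```

In the article's setting the base learners would be text classifiers over two feature views; the averaging over the iteration history is what distinguishes the vertical ensemble from the original co-training loop, and it reuses models that were trained anyway, which is why the extra computational cost is negligible.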