Vertical Ensemble Co-Training for Text Classification

Abstract
High-quality labeled data is essential for successfully applying machine learning methods to real-world text classification problems. In many cases, however, the amount of labeled data is very small compared to the amount of unlabeled data, and labeling additional samples can be expensive and time-consuming. Co-training algorithms, which use unlabeled data to improve classification, have proven very effective in such cases. Generally, co-training algorithms work by training two classifiers on two different views of the data and using them to label large amounts of unlabeled data. Doing so can minimize the human effort required to label new data and improve classification performance. In this article, we propose an ensemble-based co-training approach that uses an ensemble of classifiers from different training iterations to improve labeling accuracy. This approach, which we call the vertical ensemble, incurs almost no additional computational cost. Experiments conducted on six textual datasets show a significant improvement of over 45% in AUC compared with the original co-training algorithm.
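The core idea in the abstract can be sketched in a few lines: in standard co-training, only the latest pair of view classifiers assigns pseudo-labels, whereas a vertical ensemble averages the predictions of the classifiers from all training iterations so far. The sketch below is a minimal illustration under assumed details, not the authors' implementation; the nearest-centroid base learner, the per-iteration labeling budget, and the confidence scoring are all placeholder choices.

```python
import numpy as np

class CentroidView:
    """Toy per-view base learner: nearest class centroid, with a
    softmax-style confidence score (a stand-in for real text classifiers)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        w = np.exp(-d)  # closer centroid -> higher score
        return w / w.sum(axis=1, keepdims=True)

def vertical_ensemble_cotrain(Xa, Xb, y, labeled, unlabeled, iters=5, per_iter=2):
    """Co-training over two views (Xa, Xb); labels are assumed to be 0..k-1.
    The 'vertical' part: pseudo-labels come from averaging the probability
    estimates of the classifiers trained in ALL iterations so far,
    not just the latest pair."""
    labeled, unlabeled, y = list(labeled), list(unlabeled), y.copy()
    history = []  # one (view-A model, view-B model) pair per iteration
    for _ in range(iters):
        if not unlabeled:
            break
        pair = (CentroidView().fit(Xa[labeled], y[labeled]),
                CentroidView().fit(Xb[labeled], y[labeled]))
        history.append(pair)
        # Vertical ensemble: average predictions across every iteration's models.
        proba = np.mean([m.predict_proba(X[unlabeled])
                         for a, b in history for m, X in ((a, Xa), (b, Xb))], axis=0)
        # Move the most confidently predicted samples into the labeled set.
        best = np.argsort(-proba.max(axis=1))[:per_iter]
        for i in sorted(best, reverse=True):  # pop high indices first
            idx = unlabeled.pop(i)
            y[idx] = proba[i].argmax()
            labeled.append(idx)
    return labeled, y

# Demo on two synthetic, well-separated views of the same 40 samples.
rng = np.random.default_rng(0)
true_y = np.repeat([0, 1], 20)
shift = np.where(true_y[:, None] == 0, -1.0, 1.0)
Xa = rng.normal(0.0, 0.3, (40, 2)) + shift
Xb = rng.normal(0.0, 0.3, (40, 2)) + shift
y0 = np.where(np.isin(np.arange(40), [0, 1, 20, 21]), true_y, -1)  # -1 = unlabeled
labeled_out, y_out = vertical_ensemble_cotrain(
    Xa, Xb, y0, labeled=[0, 1, 20, 21],
    unlabeled=[i for i in range(40) if y0[i] == -1])
```

In the article's setting the base learners would be text classifiers over two feature views; the averaging over the iteration history is what distinguishes the vertical ensemble from the original co-training loop, and it reuses models that were trained anyway, which is why the extra computational cost is negligible.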