ABSTRACT
Support Vector Machines (SVMs) are an interesting and effective approach to automated classification tasks. Although SVMs by nature handle only binary, supervised problems, several works have extended them to multiclass and semi-supervised settings. A previous study on supervised and semi-supervised SVM classification over binary taxonomies showed that the latter clearly outperforms the former, proving the suitability of unlabeled data for the learning phase in this kind of task. However, the suitability of unlabeled data for multiclass tasks using SVMs has never been tested before. In this work, we present a study on whether unlabeled data can improve results for multiclass web page classification tasks using Support Vector Machines. As a conclusion, we encourage relying only on labeled data, both to improve (or at least equal) performance and to reduce the computational cost.