Is unlabeled data suitable for multiclass SVM-based web page classification?

Published: 04 June 2009

ABSTRACT

Support Vector Machines (SVMs) present an interesting and effective approach to solving automated classification tasks. Although SVMs natively handle only binary, supervised problems, several works have extended them to multiclass and semi-supervised settings. A previous study of supervised and semi-supervised SVM classification over binary taxonomies showed that the latter clearly outperforms the former, demonstrating that unlabeled data is suitable for the learning phase in this kind of task. However, the suitability of unlabeled data for multiclass tasks using SVMs has never been tested. In this work, we study whether unlabeled data can improve results for multiclass web page classification using SVMs. We conclude by recommending that only labeled data be used, both to improve (or at least equal) performance and to reduce computational cost.

Published in

SemiSupLearn '09: Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
June 2009, 96 pages
ISBN: 9781932432381

Publisher

Association for Computational Linguistics, United States

Qualifiers

research-article

Acceptance Rates

SemiSupLearn '09 paper acceptance rate: 10 of 17 submissions, 59%. Overall acceptance rate: 10 of 17 submissions, 59%.
