
Large Scale Multiple Kernel Learning

Published: 01 December 2006

Abstract

While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that this problem can be rewritten as a semi-infinite linear program that can be solved efficiently by recycling standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm scales to hundreds of thousands of examples or hundreds of kernels to be combined, and helps with automatic model selection, improving the interpretability of the learning result. In a second part we discuss general speed-up mechanisms for SVMs, especially when used with the sparse feature maps that arise for string kernels, allowing us to train a string kernel SVM on a real-world splice data set with 10 million examples from computational biology. We integrated multiple kernel learning into our machine learning toolbox <tt>SHOGUN</tt>, for which the source code is publicly available at <tt>http://www.fml.tuebingen.mpg.de/raetsch/projects/shogun</tt>.
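The starting point of the abstract is that a conic (nonnegatively weighted) combination of kernel matrices is itself a valid kernel matrix, so the combined kernel can be handed to any standard SVM solver as a precomputed kernel. The sketch below illustrates only this premise, not the paper's semi-infinite linear programming algorithm; the choice of base kernels (linear and RBF) and the fixed weights are arbitrary assumptions for the example.

```python
import numpy as np

# Toy data: 20 random 2-D points (arbitrary for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))

# Two base kernel matrices on the same data: linear and Gaussian RBF.
K_lin = X @ X.T
sq = np.sum(X ** 2, axis=1)
D = sq[:, None] + sq[None, :] - 2 * K_lin  # squared Euclidean distances
K_rbf = np.exp(-0.5 * D)

# Conic combination with nonnegative weights beta (here also convex,
# since the weights sum to one). This is the form of combined kernel
# considered by Lanckriet et al. (2004).
beta = np.array([0.3, 0.7])
K = beta[0] * K_lin + beta[1] * K_rbf

# A conic combination of positive semidefinite matrices is again
# positive semidefinite, so K is a valid kernel for a standard SVM.
min_eig = np.linalg.eigvalsh(K).min()
print(K.shape, min_eig >= -1e-8)
```

In multiple kernel learning the weights `beta` are not fixed but optimized jointly with the SVM; the paper's contribution is doing that optimization at scale by alternating between an ordinary SVM solver and an update of the kernel weights.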

References

  1. F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In C. E. Brodley, editor, Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004.
  2. K. P. Bennett, M. Momma, and M. J. Embrechts. MARK: A boosting algorithm for heterogeneous kernel models. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 24-31. ACM, 2002.
  3. J. Bi, T. Zhang, and K. P. Bennett. Column-generation boosting methods for mixture of kernels. In W. Kim, R. Kohavi, J. Gehrke, and W. DuMouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 521-526. ACM, 2004.
  4. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
  5. O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131-159, 2002.
  6. C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
  7. J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. Technical Report #1551, University of Wisconsin-Madison, January 2006.
  8. T. Fawcett. ROC graphs: Notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Laboratories, Palo Alto, CA, USA, January 2003.
  9. E. Fredkin. Trie memory. Communications of the ACM, 3(9):490-499, 1960.
  10. Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in SVMs. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 553-560, Cambridge, MA, 2003. MIT Press.
  11. R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods and applications. SIAM Review, 35(3):380-429, 1993.
  12. T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In C. Nédellec and C. Rouveirol, editors, ECML '98: Proceedings of the 10th European Conference on Machine Learning, Lecture Notes in Computer Science, pages 137-142, Berlin/Heidelberg, 1998. Springer-Verlag.
  13. T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods -- Support Vector Learning, pages 169-184, Cambridge, MA, USA, 1999. MIT Press.
  14. G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20:2626-2635, 2004.
  15. C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In R. B. Altman, A. K. Dunker, L. Hunter, K. Lauderdale, and T. E. Klein, editors, Proceedings of the Pacific Symposium on Biocomputing, pages 564-575, Kaua'i, Hawaii, 2002.
  16. C. Leslie, R. Kuang, and E. Eskin. Inexact matching string kernels for protein classification. In Kernel Methods in Computational Biology, MIT Press series on Computational Molecular Biology, pages 95-112. MIT Press, 2004.
  17. R. Meir and G. Rätsch. An introduction to boosting and leveraging. In S. Mendelson and A. Smola, editors, Proceedings of the First Machine Learning Summer School in Canberra, LNCS, pages 119-184. Springer, 2003.
  18. C. E. Metz. Basic principles of ROC analysis. Seminars in Nuclear Medicine, VIII(4), October 1978.
  19. C. S. Ong, A. J. Smola, and R. C. Williamson. Hyperkernels. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 478-485, Cambridge, MA, 2003. MIT Press.
  20. J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods -- Support Vector Learning, pages 185-208, Cambridge, MA, USA, 1999. MIT Press.
  21. G. Rätsch. Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, Potsdam, Germany, 2001.
  22. G. Rätsch and S. Sonnenburg. Accurate Splice Site Prediction for Caenorhabditis Elegans, pages 277-298. MIT Press series on Computational Molecular Biology. MIT Press, 2004.
  23. G. Rätsch and M. K. Warmuth. Efficient margin maximization with boosting. Journal of Machine Learning Research, 6:2131-2152, 2005.
  24. G. Rätsch, A. Demiriz, and K. Bennett. Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48(1-3):193-221, 2002. Special Issue on New Methods for Model Selection and Model Combination. Also NeuroCOLT2 Technical Report NC-TR-2000-085.
  25. G. Rätsch, S. Sonnenburg, and B. Schölkopf. RASE: Recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21:i369-i377, 2005.
  26. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
  27. S. Sonnenburg, G. Rätsch, and C. Schäfer. Learning interpretable SVMs for biological sequence classification. In S. Miyano, J. P. Mesirov, S. Kasif, S. Istrail, P. A. Pevzner, and M. Waterman, editors, Research in Computational Molecular Biology, 9th Annual International Conference, RECOMB 2005, volume 3500, pages 389-407. Springer, 2005a.
  28. S. Sonnenburg, G. Rätsch, and B. Schölkopf. Large scale genomic sequence SVM classifiers. In L. De Raedt and S. Wrobel, editors, ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 849-856, New York, NY, USA, 2005b. ACM Press.
  29. M. K. Warmuth, J. Liao, and G. Rätsch. Totally corrective boosting algorithms that maximize the margin. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning. ACM Press, 2006.


Published in

The Journal of Machine Learning Research, Volume 7
1 December 2006
2725 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher: JMLR.org
