Abstract
While classical kernel-based learning algorithms use a single kernel, in practice it is often desirable to combine multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that this problem can be rewritten as a semi-infinite linear program that can be solved efficiently by recycling standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm scales to hundreds of thousands of examples or hundreds of kernels to be combined, and that it aids automatic model selection, improving the interpretability of the learning result. In the second part, we discuss general speed-up mechanisms for SVMs, especially for the sparse feature maps that arise with string kernels, allowing us to train a string kernel SVM on a 10-million-example real-world splice data set from computational biology. We integrated multiple kernel learning into our machine learning toolbox SHOGUN, whose source code is publicly available at http://www.fml.tuebingen.mpg.de/raetsch/projects/shogun.
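The wrapper idea from the first part can be made concrete with a short sketch. The following is a minimal, illustrative rendering of the semi-infinite LP (SILP) column-generation loop under stated assumptions: scikit-learn's SVC with a precomputed kernel stands in for the recycled standard SVM solver, scipy's linprog solves the restricted master LP, and the function names and tolerances are ours. It illustrates the technique; it is not the SHOGUN implementation.

```python
# Minimal sketch of the SILP wrapper for multiple kernel learning (MKL).
# Assumes labels y in {-1, +1} and a list of precomputed kernel matrices.
import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def solve_svm(K, y, C):
    """Standard single-kernel SVM on a precomputed kernel; returns alpha."""
    svc = SVC(C=C, kernel="precomputed").fit(K, y)
    alpha = np.zeros(len(y))
    alpha[svc.support_] = np.abs(svc.dual_coef_[0])  # dual_coef_ = y_i * alpha_i
    return alpha

def mkl_silp(kernels, y, C=1.0, max_iter=100, tol=1e-5):
    M = len(kernels)
    beta = np.full(M, 1.0 / M)          # start with uniform kernel weights
    theta, cuts = -np.inf, []           # cutting planes S(alpha_t)
    yy = np.outer(y, y)
    for _ in range(max_iter):
        K = sum(b * Km for b, Km in zip(beta, kernels))
        alpha = solve_svm(K, y, C)      # SVM step: most violated constraint
        # Per-kernel terms S_m(alpha) = 1/2 alpha'(yy * K_m)alpha - 1'alpha
        S = np.array([0.5 * alpha @ (yy * Km) @ alpha - alpha.sum()
                      for Km in kernels])
        if cuts and beta @ S >= theta - tol:   # no violated constraint: done
            break
        cuts.append(S)
        # Restricted master LP over (beta, theta):
        #   max theta  s.t.  sum(beta) = 1, beta >= 0,
        #                    beta @ S_t >= theta  for all cuts t
        c = np.r_[np.zeros(M), -1.0]                       # minimize -theta
        A_ub = np.hstack([-np.array(cuts), np.ones((len(cuts), 1))])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(cuts)),
                      A_eq=np.r_[np.ones(M), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0, None)] * M + [(None, None)])
        beta, theta = res.x[:M], res.x[-1]
    return beta, alpha
```

Any stock SVM trainer that exposes its dual coefficients (e.g., SVMlight or an SMO solver) can be swapped in for solve_svm; that interchangeability is the "recycling" the abstract refers to.

For the second part, the sparse feature maps arising from string kernels can be pictured with the spectrum kernel of Leslie et al. (2002): a sequence maps to the counts of its length-k substrings, so a kernel evaluation reduces to a sparse dot product. The dictionary below is a toy stand-in for the trie-based data structures (Fredkin, 1960) and the linadd-style speed-ups discussed in the paper.

```python
# Toy sketch of a sparse feature map for the spectrum kernel:
# a string becomes a sparse vector of k-mer counts, and the kernel
# is the dot product of two such vectors.
from collections import Counter

def spectrum_features(seq, k=3):
    """Sparse k-mer count vector of a string, stored as a dict."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seq_a, seq_b, k=3):
    """Dot product of the two spectra; only shared k-mers contribute."""
    fa, fb = spectrum_features(seq_a, k), spectrum_features(seq_b, k)
    small, large = (fa, fb) if len(fa) < len(fb) else (fb, fa)
    return sum(count * large.get(kmer, 0) for kmer, count in small.items())

print(spectrum_kernel("ACGTACGT", "ACGTTT"))  # shared 3-mers ACG, CGT -> 4
```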
References
- F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In C. E. Brodley, editor, Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004.
- K. P. Bennett, M. Momma, and M. J. Embrechts. MARK: A boosting algorithm for heterogeneous kernel models. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 24-31. ACM, 2002.
- J. Bi, T. Zhang, and K. P. Bennett. Column-generation boosting methods for mixture of kernels. In W. Kim, R. Kohavi, J. Gehrke, and W. DuMouchel, editors, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 521-526. ACM, 2004.
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
- O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131-159, 2002.
- C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
- J. Davis and M. Goadrich. The relationship between precision-recall and ROC curves. Technical Report #1551, University of Wisconsin, Madison, January 2006.
- T. Fawcett. ROC graphs: Notes and practical considerations for data mining researchers. Technical Report HPL-2003-4, HP Laboratories, Palo Alto, CA, USA, January 2003.
- E. Fredkin. Trie memory. Communications of the ACM, 3(9):490-499, 1960.
- Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in SVMs. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 553-560, Cambridge, MA, 2003. MIT Press.
- R. Hettich and K. O. Kortanek. Semi-infinite programming: Theory, methods and applications. SIAM Review, 35(3):380-429, 1993.
- T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In C. Nédellec and C. Rouveirol, editors, ECML '98: Proceedings of the 10th European Conference on Machine Learning, Lecture Notes in Computer Science, pages 137-142, Berlin/Heidelberg, 1998. Springer-Verlag.
- T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods -- Support Vector Learning, pages 169-184, Cambridge, MA, USA, 1999. MIT Press.
- G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20:2626-2635, 2004.
- C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In R. B. Altman, A. K. Dunker, L. Hunter, K. Lauderdale, and T. E. Klein, editors, Proceedings of the Pacific Symposium on Biocomputing, pages 564-575, Kaua'i, Hawaii, 2002.
- C. Leslie, R. Kuang, and E. Eskin. Inexact matching string kernels for protein classification. In Kernel Methods in Computational Biology, MIT Press Series on Computational Molecular Biology, pages 95-112. MIT Press, 2004.
- R. Meir and G. Rätsch. An introduction to boosting and leveraging. In S. Mendelson and A. Smola, editors, Proceedings of the First Machine Learning Summer School in Canberra, LNCS, pages 119-184. Springer, 2003.
- C. E. Metz. Basic principles of ROC analysis. Seminars in Nuclear Medicine, VIII(4), October 1978.
- C. S. Ong, A. J. Smola, and R. C. Williamson. Hyperkernels. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 478-485, Cambridge, MA, 2003. MIT Press.
- J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods -- Support Vector Learning, pages 185-208, Cambridge, MA, USA, 1999. MIT Press.
- G. Rätsch. Robust Boosting via Convex Optimization. PhD thesis, University of Potsdam, Potsdam, Germany, 2001.
- G. Rätsch and S. Sonnenburg. Accurate splice site prediction for Caenorhabditis elegans, pages 277-298. MIT Press Series on Computational Molecular Biology. MIT Press, 2004.
- G. Rätsch and M. K. Warmuth. Efficient margin maximization with boosting. Journal of Machine Learning Research, 6:2131-2152, 2005.
- G. Rätsch, A. Demiriz, and K. Bennett. Sparse regression ensembles in infinite and finite hypothesis spaces. Machine Learning, 48(1-3):193-221, 2002. Special issue on new methods for model selection and model combination; also NeuroCOLT2 Technical Report NC-TR-2000-085.
- G. Rätsch, S. Sonnenburg, and B. Schölkopf. RASE: Recognition of alternatively spliced exons in C. elegans. Bioinformatics, 21:i369-i377, 2005.
- B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
- S. Sonnenburg, G. Rätsch, and C. Schäfer. Learning interpretable SVMs for biological sequence classification. In S. Miyano, J. P. Mesirov, S. Kasif, S. Istrail, P. A. Pevzner, and M. Waterman, editors, Research in Computational Molecular Biology, 9th Annual International Conference, RECOMB 2005, volume 3500, pages 389-407. Springer-Verlag, Berlin/Heidelberg, 2005a.
- S. Sonnenburg, G. Rätsch, and B. Schölkopf. Large scale genomic sequence SVM classifiers. In L. De Raedt and S. Wrobel, editors, ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 849-856, New York, NY, USA, 2005b. ACM Press.
- M. K. Warmuth, J. Liao, and G. Rätsch. Totally corrective boosting algorithms that maximize the margin. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning. ACM Press, 2006.