Abstract
This paper presents novel algorithms and applications for a particular class of mixed-norm regularized Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped, and employ l1-norm regularization to promote sparsity among the RKHS norms within each group and ls-norm (s ≥ 2) regularization to promote non-sparse combinations across groups. Varying the grouping of the kernels yields different sparsity levels in the kernel combination; hence we call these Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts led to non-convex formulations, here we present a convex formulation that admits efficient Mirror-Descent (MD) based solution techniques. The proposed MD-based algorithm optimizes over a product of simplices and has a computational complexity of O(m² n_tot log(n_max)/ε²), where m is the number of training data points, n_max and n_tot are the maximum number of kernels in any group and the total number of kernels, respectively, and ε is the accuracy to which the objective is approximated. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well suited to multi-modal learning tasks such as object categorization, and that the MD-based algorithm outperforms state-of-the-art MKL solvers in computational efficiency.
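To make the mixed-norm structure concrete, here is one plausible way to write the regularizer the abstract describes; this is a sketch reconstructed from the abstract alone, and the paper's exact formulation may differ in constants or squaring. With kernels indexed by group j and member k, and f_jk the predictor component in RKHS H_jk:

```latex
% Sketch of the mixed-norm regularizer as described in the abstract:
% l1 within each group G_j, ls across the g groups, with s >= 2.
\Omega(f) \;=\; \Bigg( \sum_{j=1}^{g} \bigg( \sum_{k \in G_j} \big\| f_{jk} \big\|_{\mathcal{H}_{jk}} \bigg)^{\!s} \Bigg)^{1/s},
\qquad s \ge 2 .
```

The inner l1 sum drives sparsity among the kernels of a group, while the outer ls norm couples the groups non-sparsely. With a single group containing all kernels this collapses to standard sparse (l1) MKL, and with singleton groups it becomes non-sparse ls-MKL, which is how varying the grouping trades off sparsity levels.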
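The claim that the MD algorithm "optimizes over a product of simplices" can be illustrated with the standard entropic mirror-descent (exponentiated-gradient) update, whose prox step on the simplex is a multiplicative update followed by renormalization. The sketch below runs this update on a toy quadratic objective, not the paper's MKL dual; the names (md_step, Q, gamma) and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def md_step(gamma, grad, step):
    """One entropic mirror-descent (exponentiated-gradient) update on a
    product of simplices: each row of `gamma` lies on its own simplex."""
    g = gamma * np.exp(-step * grad)          # multiplicative update (entropy prox)
    return g / g.sum(axis=1, keepdims=True)   # renormalize each group

# Toy illustration (NOT the paper's objective): minimize 0.5 * x'Qx
# over two groups of three kernel weights each.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Q = A @ A.T                                   # random PSD matrix as a stand-in

gamma = np.full((2, 3), 1.0 / 3.0)            # uniform start on each simplex
for t in range(1, 201):
    grad = (Q @ gamma.ravel()).reshape(2, 3)  # gradient of 0.5 * x'Qx
    gamma = md_step(gamma, grad, step=0.5 / np.sqrt(t))

print(gamma)                                  # weights still sum to 1 per group
```

The entropy mirror map is what produces the log(n_max) factor in the stated complexity: the entropy's diameter on an n-dimensional simplex grows as log n, which enters the standard O(1/ε²) mirror-descent rate.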
Index Terms
- Variable Sparsity Kernel Learning