Abstract
This paper presents novel algorithms and applications for a particular class of mixed-norm regularized Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped, and employ l1-norm regularization to promote sparsity among the RKHS norms within each group and ls-norm (s ≥ 2) regularization to promote non-sparse combinations across groups. Varying the grouping of the kernels yields different sparsity levels in the kernel combination; hence we call these Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts led to non-convex formulations, here we present a convex formulation that admits efficient Mirror-Descent (MD) based solution techniques. The proposed MD-based algorithm optimizes over a product of simplices and has a computational complexity of O(m² n_tot log(n_max)/ε²), where m is the number of training data points, n_max and n_tot are the maximum number of kernels in any group and the total number of kernels, respectively, and ε is the accuracy to which the objective is approximated. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well suited to multi-modal learning tasks such as object categorization, and that the MD-based algorithm outperforms state-of-the-art MKL solvers in computational efficiency.
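To make the mixed-norm structure concrete, here is one plausible way to write the regularizer the abstract describes; this is a sketch reconstructed from the abstract alone, and the paper's exact formulation may differ in constants or squaring. With kernels indexed by group j and member k, and f_jk the predictor component in RKHS H_jk:

```latex
% Sketch of the mixed-norm regularizer as described in the abstract:
% l1 within each group G_j, ls across the g groups, with s >= 2.
\Omega(f) \;=\; \Bigg( \sum_{j=1}^{g} \bigg( \sum_{k \in G_j} \big\| f_{jk} \big\|_{\mathcal{H}_{jk}} \bigg)^{\!s} \Bigg)^{1/s},
\qquad s \ge 2 .
```

The inner l1 sum drives sparsity among the kernels of a group, while the outer ls norm couples the groups non-sparsely. With a single group containing all kernels this collapses to standard sparse (l1) MKL, and with singleton groups it becomes non-sparse ls-MKL, which is how varying the grouping trades off sparsity levels.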
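The claim that the MD algorithm "optimizes over a product of simplices" can be illustrated with the standard entropic mirror-descent (exponentiated-gradient) update, whose prox step on the simplex is a multiplicative update followed by renormalization. The sketch below runs this update on a toy quadratic objective, not the paper's MKL dual; the names (md_step, Q, gamma) and the step-size schedule are illustrative assumptions.

```python
import numpy as np

def md_step(gamma, grad, step):
    """One entropic mirror-descent (exponentiated-gradient) update on a
    product of simplices: each row of `gamma` lies on its own simplex."""
    g = gamma * np.exp(-step * grad)          # multiplicative update (entropy prox)
    return g / g.sum(axis=1, keepdims=True)   # renormalize each group

# Toy illustration (NOT the paper's objective): minimize 0.5 * x'Qx
# over two groups of three kernel weights each.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
Q = A @ A.T                                   # random PSD matrix as a stand-in

gamma = np.full((2, 3), 1.0 / 3.0)            # uniform start on each simplex
for t in range(1, 201):
    grad = (Q @ gamma.ravel()).reshape(2, 3)  # gradient of 0.5 * x'Qx
    gamma = md_step(gamma, grad, step=0.5 / np.sqrt(t))

print(gamma)                                  # weights still sum to 1 per group
```

The entropy mirror map is what produces the log(n_max) factor in the stated complexity: the entropy's diameter on an n-dimensional simplex grows as log n, which enters the standard O(1/ε²) mirror-descent rate.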
Index Terms
- Variable Sparsity Kernel Learning