ABSTRACT
A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project the observed data onto a low-dimensional manifold where geometric relationships such as pairwise distances are preserved. This approach can be extended to the nonlinear case via the kernel trick, which embeds the data into a feature space by specifying a kernel function that computes the dot products between data points in that space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; it then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; finally, it performs clustering in the low-dimensional space. The performance of NAML depends on the choice of the kernel function and the projection. We show that joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrate the efficacy of the proposed algorithm.
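The alternation the abstract describes, between kernel learning, low-dimensional projection, and clustering, can be illustrated with a short sketch. The Python code below is a simplified interpretation only, not the authors' exact formulation: the function name `naml_sketch`, the fixed set of RBF kernel candidates, the eigendecomposition-based projection, and the kernel-alignment reweighting heuristic are all assumptions standing in for the paper's trace-maximization objective and EM-style updates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def naml_sketch(X, n_clusters=3, n_dims=2, gammas=(0.1, 1.0, 10.0), n_iters=10):
    """Schematic alternation: combine kernels, project, cluster, reweight.

    A hypothetical illustration of the joint kernel learning / dimensionality
    reduction / clustering loop; not the paper's exact algorithm.
    """
    n = X.shape[0]
    kernels = [rbf_kernel(X, gamma=g) for g in gammas]  # candidate kernels (assumed RBF family)
    weights = np.ones(len(kernels)) / len(kernels)      # uniform initial kernel weights
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)

    for _ in range(n_iters):
        # Step 1: form the current combined kernel and center it in feature space.
        K = sum(w * Km for w, Km in zip(weights, kernels))
        H = np.eye(n) - np.ones((n, n)) / n
        Kc = H @ K @ H
        # Step 2: linear projection in feature space; the top eigenvectors of the
        # centered kernel serve as a trace-maximization surrogate here.
        vals, vecs = np.linalg.eigh(Kc)
        Z = vecs[:, -n_dims:] * np.sqrt(np.maximum(vals[-n_dims:], 0))
        # Step 3: cluster in the low-dimensional embedding.
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
        # Step 4: reweight kernels by within-cluster alignment (a heuristic
        # stand-in for the paper's kernel-learning update).
        F = np.eye(n_clusters)[labels]  # one-hot cluster indicator matrix
        align = np.array([np.trace(F.T @ Km @ F) / np.trace(Km) for Km in kernels])
        weights = align / align.sum()
    return labels, weights

# Usage: labels, weights = naml_sketch(X, n_clusters=3)
```

Each pass through the loop plays the role of one EM-style iteration: clustering with the metric fixed, then updating the kernel and projection with the clustering fixed.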