DOI: 10.1145/1281192.1281209

Nonlinear adaptive distance metric learning for clustering

Published: 12 August 2007

ABSTRACT

A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a low-dimensional manifold, where geometric relationships such as pairwise distances are preserved. This approach can be extended to the nonlinear case by applying the kernel trick, which embeds the data into a feature space by specifying the kernel function that computes the dot products between data points in the feature space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; and finally performs clustering in the low-dimensional space. The performance of NAML depends on the selection of the kernel function and the projection. We show that joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrate the efficacy of the proposed algorithm.
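
For intuition, the sketch below shows an alternating procedure in the spirit of the abstract: project the kernel-mapped data with a discriminant-style trace maximization, run k-means in the projected space, and repeat. This is not the authors' exact algorithm; in particular, NAML also learns the kernel itself, whereas the sketch fixes a single RBF kernel, and all function names and parameter values (`gamma`, `reg`, `d`) are illustrative choices, not taken from the paper.

```python
# Minimal sketch of an NAML-style alternating procedure (illustrative only).
# Alternates between (1) a trace-maximization step that picks a projection of
# the kernel-mapped data maximizing between-cluster scatter relative to total
# scatter, solved by a generalized eigendecomposition, and (2) k-means in the
# projected low-dimensional space.

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def center_kernel(K):
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return H @ K @ H

def kmeans(Z, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None])**2).sum(-1), axis=1)
        centers = np.array([Z[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def naml_sketch(X, k, d=2, gamma=1.0, reg=1e-3, outer_iters=10):
    K = center_kernel(rbf_kernel(X, gamma))
    n = K.shape[0]
    labels = kmeans(K[:, :d], k)               # crude initialization
    for _ in range(outer_iters):
        # Normalized cluster-indicator matrix L (n x k).
        L = np.zeros((n, k))
        for j in range(k):
            idx = labels == j
            if idx.any():
                L[idx, j] = 1.0 / np.sqrt(idx.sum())
        # Trace maximization: choose A (coefficients in kernel space) to
        # maximize trace((A^T St A)^{-1} A^T Sb A); the top-d generalized
        # eigenvectors of (St, Sb) solve it.
        St = K @ K + reg * np.eye(n)           # regularized total scatter
        Sb = K @ L @ L.T @ K                   # between-cluster scatter
        vals, vecs = np.linalg.eig(np.linalg.solve(St, Sb))
        A = np.real(vecs[:, np.argsort(-np.real(vals))[:d]])
        labels = kmeans(K @ A, k)              # project, then re-cluster
    return labels
```

The eigendecomposition step solves the generic trace-maximization form such objectives take; the paper's exact joint objective, and the EM-style updates that additionally learn the kernel, are in the full text.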


Supplemental Material

• p123-ye-200.mov (mov, 38.6 MB)
• p123-ye-768.mov (mov, 133 MB)


Published in

      KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2007
      1080 pages
ISBN: 9781595936097
DOI: 10.1145/1281192

      Copyright © 2007 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

KDD '07 paper acceptance rate: 111 of 573 submissions (19%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
