ABSTRACT
A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project the observed data onto a low-dimensional manifold where geometric relationships such as pairwise distances are preserved. This approach can be extended to the nonlinear case via the kernel trick, which embeds the data into a feature space by specifying a kernel function that computes the dot products between data points in that space. In this paper, we propose a novel unsupervised Nonlinear Adaptive Metric Learning algorithm, called NAML, which performs clustering and distance metric learning simultaneously. NAML first maps the data to a high-dimensional space through a kernel function; it then applies a linear projection to find a low-dimensional manifold where the separability of the data is maximized; finally, it performs clustering in the low-dimensional space. The performance of NAML depends on the choice of the kernel function and the projection. We show that joint kernel learning, dimensionality reduction, and clustering can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results demonstrate the efficacy of the proposed algorithm.
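The alternation the abstract describes, between kernel learning, low-dimensional projection, and clustering, can be illustrated with a short sketch. The Python code below is a simplified interpretation only, not the authors' exact formulation: the function name `naml_sketch`, the fixed set of RBF kernel candidates, the eigendecomposition-based projection, and the kernel-alignment reweighting heuristic are all assumptions standing in for the paper's trace-maximization objective and EM-style updates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def naml_sketch(X, n_clusters=3, n_dims=2, gammas=(0.1, 1.0, 10.0), n_iters=10):
    """Schematic alternation: combine kernels, project, cluster, reweight.

    A hypothetical illustration of the joint kernel learning / dimensionality
    reduction / clustering loop; not the paper's exact algorithm.
    """
    n = X.shape[0]
    kernels = [rbf_kernel(X, gamma=g) for g in gammas]  # candidate kernels (assumed RBF family)
    weights = np.ones(len(kernels)) / len(kernels)      # uniform initial kernel weights
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)

    for _ in range(n_iters):
        # Step 1: form the current combined kernel and center it in feature space.
        K = sum(w * Km for w, Km in zip(weights, kernels))
        H = np.eye(n) - np.ones((n, n)) / n
        Kc = H @ K @ H
        # Step 2: linear projection in feature space; the top eigenvectors of the
        # centered kernel serve as a trace-maximization surrogate here.
        vals, vecs = np.linalg.eigh(Kc)
        Z = vecs[:, -n_dims:] * np.sqrt(np.maximum(vals[-n_dims:], 0))
        # Step 3: cluster in the low-dimensional embedding.
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
        # Step 4: reweight kernels by within-cluster alignment (a heuristic
        # stand-in for the paper's kernel-learning update).
        F = np.eye(n_clusters)[labels]  # one-hot cluster indicator matrix
        align = np.array([np.trace(F.T @ Km @ F) / np.trace(Km) for Km in kernels])
        weights = align / align.sum()
    return labels, weights

# Usage: labels, weights = naml_sketch(X, n_clusters=3)
```

Each pass through the loop plays the role of one EM-style iteration: clustering with the metric fixed, then updating the kernel and projection with the clustering fixed.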