Abstract
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE---though capable of generating highly nonlinear embeddings---are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance---both successes and failures---and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
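Since the paper's full pseudocode does not appear in this excerpt, the following is a minimal sketch of the three steps the abstract summarizes (find neighbors, solve for locally linear reconstruction weights, embed via an eigenvalue problem), assuming NumPy/SciPy. The function name `lle` and its parameters are illustrative, not from the paper, and a dense eigensolver stands in for the sparse one the abstract alludes to.

```python
import numpy as np
from scipy.linalg import solve, eigh

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Sketch of locally linear embedding; X is an (N, D) array of N points."""
    N = X.shape[0]

    # Step 1: find each point's nearest neighbors (brute force for clarity).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = np.argsort(dists, axis=1)[:, 1:n_neighbors + 1]  # skip self

    # Step 2: solve for weights that best reconstruct each point as a
    # linear combination of its neighbors, constrained to sum to one.
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]                     # center neighbors on x_i
        C = Z @ Z.T                                    # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)   # regularize if singular
        w = solve(C, np.ones(n_neighbors), assume_a='pos')
        W[i, neighbors[i]] = w / w.sum()               # sum-to-one constraint

    # Step 3: the embedding is given by the bottom eigenvectors of
    # M = (I - W)^T (I - W), discarding the constant eigenvector at ~0.
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = eigh(M)                         # ascending eigenvalues
    return eigvecs[:, 1:n_components + 1]
```

For larger data sets, the brute-force neighbor search would be replaced by a tree-based search, and the dense `eigh` call by a sparse shift-invert solver such as `scipy.sparse.linalg.eigsh`, exploiting the sparsity of M that the abstract mentions.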