Think globally, fit locally: unsupervised learning of low dimensional manifolds

Lawrence K. Saul and Sam T. Roweis

Published: 1 December 2003
Skip Abstract Section

Abstract

The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE---though capable of generating highly nonlinear embeddings---are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance---both successes and failures---and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
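In outline, LLE has three steps: find each point's nearest neighbors, compute the weights that best reconstruct each point from its neighbors, and then solve for the low dimensional coordinates that are best reconstructed by those same weights. The following is a minimal NumPy sketch of those steps under stated assumptions, not the authors' reference implementation: data arrive as the rows of a matrix X, neighbors are found by brute-force Euclidean search, the local Gram matrix is stabilized by a small regularizer reg (a common heuristic when the number of neighbors exceeds the input dimensionality, not a value prescribed by the paper), and a dense eigensolver stands in for the sparse eigenvalue problem the paper exploits.

    import numpy as np

    def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
        """Illustrative locally linear embedding; X is (n_samples, n_features)."""
        X = np.asarray(X, dtype=float)
        n = X.shape[0]

        # Step 1: K nearest neighbors by brute-force Euclidean distance.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)
        nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

        # Step 2: reconstruction weights, one regularized least-squares
        # problem per point, constrained to sum to one.
        W = np.zeros((n, n))
        for i in range(n):
            Z = X[nbrs[i]] - X[i]              # neighbors centered on x_i
            G = Z @ Z.T                        # local Gram matrix
            G += reg * np.trace(G) * np.eye(n_neighbors)
            w = np.linalg.solve(G, np.ones(n_neighbors))
            W[i, nbrs[i]] = w / w.sum()        # enforce the sum-to-one constraint

        # Step 3: embed with the bottom eigenvectors of M = (I - W)^T (I - W),
        # discarding the constant eigenvector whose eigenvalue is ~0.
        M = (np.eye(n) - W).T @ (np.eye(n) - W)
        _, eigvecs = np.linalg.eigh(M)
        return eigvecs[:, 1:n_components + 1]

A call such as lle(X, n_neighbors=12, n_components=2) would map points sampled from, say, a three dimensional Swiss roll down to planar coordinates; the function name and defaults here are hypothetical choices for illustration.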

Published in

The Journal of Machine Learning Research, Volume 4 (December 2003), 1486 pages.
ISSN: 1532-4435. EISSN: 1533-7928.
Publisher: JMLR.org