skip to main content
research-article
Free Access

Faster dimension reduction

Published:01 February 2010Publication History
Skip Abstract Section

Abstract

Data represented geometrically in high-dimensional vector spaces can be found in many applications. Images and videos, are often represented by assigning a dimension for every pixel (and time). Text documents may be represented in a vector space where each word in the dictionary incurs a dimension. The need to manipulate such data in huge corpora such as the web and to support various query types gives rise to the question of how to represent the data in a lower-dimensional space to allow more space and time efficient computation. Linear mappings are an attractive approach to this problem because the mapped input can be readily fed into popular algorithms that operate on linear spaces (such as principal-component analysis, PCA) while avoiding the curse of dimensionality.

The fact that such mappings even exist became known in computer science following seminal work by Johnson and Lindenstrauss in the early 1980s. The underlying technique is often called "random projection." The complexity of the mapping itself, essentially the product of a vector with a dense matrix, did not attract much attention until recently. In 2006, we discovered a way to "sparsify" the matrix via a computational version of Heisenberg's Uncertainty Principle. This led to a significant speedup, which also retained the practical simplicity of the standard Johnson-Lindenstrauss projection. We describe the improvement in this article, together with some of its applications.

References

  1. Achlioptas, D. Database-friendly random projections: Johnson--Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66, 4 (2003), 671--687. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Ailon, N., Chazelle, B. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. SIAM J. Comput. 39, 1 (2009), 302--322. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ailon, N., Liberty, E. Fast dimension reduction using rademacher series on dual bch codes. Discrete and Computational Geometry (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Ailon, N., Liberty, E., Singer, A. Dense fast random projections and lean walsh transforms. APPROX-RANDOM, 2008, 512--522. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Alon, N. Problems and results in extremal combinatorics-I. Discrete Math. 273, 1--3 (2003), 31--53.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Alon, N., Spencer, J. The Probabilistic Method. John Wiley, 2nd edition, 2000.Google ScholarGoogle Scholar
  7. Arora, S. Polynomial time approximation schemes for euclidean traveling salesman and other geometric problems. J. ACM 45, 5 (1998), 753--782. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Arya, S., Mount, D.M. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (Austin, TX, United States, 1993), 271--280. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45, 6 (1998), 891--923. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Bern, M.W. Approximate closest-point queries in high dimensions. Inf. Process. Lett. 45, 2 (1993), 95--99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bingham, E., Mannila, H. Random projection in dimensionality reduction: Applications to image and text data. In Knowledge Discovery and Data Mining, 2001, 245--250. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Borodin, A., Ostrovsky, R., Rabani, Y. Lower bounds for high dimensional nearest neighbor search and related problems. In Proceedings of the 31st Annual Symposium on the Theory of Computing (STOC) (1999), 312--321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chan, T.M. Approximate nearest neighbor queries revisited. Discrete Comput. Geometry 20, 3 (1998), 359--373.Google ScholarGoogle ScholarCross RefCross Ref
  14. Christofides, N. Worst-case analysis of a new heuristic for the travelling salesman problem. Technical Report, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, 1976, 388.Google ScholarGoogle Scholar
  15. Clarkson, K.L. An algorithm for approximate closest-point queries. In Proceedings of the 10th Annual ACM Symposium on Computational Geometry (SoCG) (1994), 160--164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Clarkson, K.L. Nearest neighbor queries in metric spaces. Discrete Comput. Geometry 22, 1 (1999), 63--93.Google ScholarGoogle ScholarCross RefCross Ref
  17. DasGupta, S., Gupta, A. An elementary proof of the Johnson--Lindenstrauss lemma. Technical Report, UC Berkeley, 1999, 99--106.Google ScholarGoogle Scholar
  18. Drineas, P., Mahoney, M.W., Muthukrishnan, S. Sampling algorithms for ℓ2 regression and applications. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (Miami, FL, United States, 2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Farach-Colton, M., Indyk, P. Approximate nearest neighbor algorithms for Hausdorff metrics via embeddings. In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (1999), 171--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Frankl, P., Maehara, H. The Johnson--Lindenstrauss lemma and the sphericity of some graphs. J. Comb. Theory Ser. A 44 (1987), 355--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Indyk, P. On approximate nearest neighbors in non-Euclidean spaces. In Proceedings of the 39th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (1998), 148--155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Indyk, P. Dimensionality reduction techniques for proximity problems. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2000), 371--378. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Indyk, P. Nearest neighbors in high-dimensional spaces. In Handbook of Discrete and Computational Geometry. CRC, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  24. Indyk P., Motwani, R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC) (1998), 604--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Johnson, W.B., Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemp. Math. 26 (1984), 189--206.Google ScholarGoogle ScholarCross RefCross Ref
  26. Kleinberg, J.M. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC) (1997), 599--608. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kushilevitz, E., Ostrovsky, R., Rabani Y. Efficient search for approximate nearest neighbor in high dimensional spaces. SIAM J. Comput. 30, 2 (2000), 457--474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Liberty, E., Woolfe, F., Martinsson, P.-G., Rokhlin, V., Tygert, M. Randomized algorithms for the low-rank approximation of matrices. Proc. Natl. Acad. Sci. (PNAS) 104, 51 (2007), 20167--20172.Google ScholarGoogle ScholarCross RefCross Ref
  29. Linial, N., London, E., Rabinovich, Y. The geometry of graphs and some of its algorithmic applications. Combinatorica 15, 2 (1995), 215--245.Google ScholarGoogle ScholarCross RefCross Ref
  30. Mitchell, J.S.B. Guillotine subdivisions approximate polygonal subdivisions: A simple new method for the geometric k-MST problem. SIAM J. Comput. 28, 4 (1999), 1298--1309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Morgenstern, J. Note on a lower bound on the linear complexity of the fast fourier transform. J. ACM 20, 2 (1973), 305--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Morgenstern, J. The linear complexity of computation. J. ACM 22, 2 (1975), 184--194. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Muthukrishnan, S., Sahinalp, S.C. Simple and practical sequence nearest neighbors with block operations. In Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching (CPM) (2002), 262--278. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S. Latent semantic indexing: A probabilistic analysis. In Proceedings of the 17th Annual Symposium of Database Systems (1998), 159--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Rokhlin, V., Tygert, M. A fast randomized algorithm for overdetermined linear least-squares regression. Proceedings of the National Academy of Science (PNAS) 105, 36 (2008), 13212--13217.Google ScholarGoogle ScholarCross RefCross Ref
  36. Sarlós, T. Improved approximation algorithms for large matrices via random projections. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Berkeley, CA, 2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Schulman, L. Clustering for edge-cost minimization. In Proceedings of the 32nd Annual Symposium on Theory of Computing (STOC) (2000), 547--555. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Woolfe, F., Liberty, E., Rokhlin, V., Tygert, M. A fast randomized algorithm for the approximation of matrices. Appl. Comput. Harmon. Anal. 25 (2008), 335--366.Google ScholarGoogle ScholarCross RefCross Ref
  39. Yianilos, P.N. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (1993), 311--321. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Yianilos, P.N. Locally lifting the curse of dimensionality for nearest neighbor search (extended abstract). In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) (2000), 361--370. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Faster dimension reduction

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Communications of the ACM
        Communications of the ACM  Volume 53, Issue 2
        February 2010
        147 pages
        ISSN:0001-0782
        EISSN:1557-7317
        DOI:10.1145/1646353
        Issue’s Table of Contents

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 February 2010

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Popular
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      HTML Format

      View this article in HTML Format .

      View HTML Format