Abstract
Data represented geometrically in high-dimensional vector spaces arises in many applications. Images and videos are often represented by assigning a dimension to every pixel (and, for video, every frame). Text documents may be represented in a vector space in which each dictionary word contributes a dimension. The need to manipulate such data in huge corpora such as the web, and to support various query types, gives rise to the question of how to represent the data in a lower-dimensional space so as to allow more space- and time-efficient computation. Linear mappings are an attractive approach to this problem because the mapped input can be readily fed into popular algorithms that operate on linear spaces (such as principal component analysis, PCA) while avoiding the curse of dimensionality.
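As a small illustrative sketch (not taken from the article), such a linear map can be realized as multiplication by a dense random matrix; the dimensions, seed, and scaling below are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10_000, 500  # original and reduced dimensions (illustrative values)

# One shared random matrix: the same linear map is applied to every point,
# scaled by 1/sqrt(k) so that squared lengths are preserved in expectation.
R = rng.normal(size=(d, k)) / np.sqrt(k)

x = rng.normal(size=d)
y = rng.normal(size=d)

# With high probability the distance between the mapped points is close
# to the original distance, so the ratio below concentrates near 1.
ratio = np.linalg.norm(x @ R - y @ R) / np.linalg.norm(x - y)
```

Because the map is linear, it can be applied to an entire data matrix at once (`X @ R`), and the reduced points can then be fed directly into PCA, clustering, or nearest-neighbor routines.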
The fact that such mappings even exist became known in computer science following seminal work by Johnson and Lindenstrauss in the early 1980s. The underlying technique is often called "random projection." The complexity of the mapping itself, essentially the product of a vector with a dense matrix, attracted little attention until recently. In 2006, we discovered a way to "sparsify" the matrix via a computational version of Heisenberg's uncertainty principle. This leads to a significant speedup while retaining the practical simplicity of the standard Johnson-Lindenstrauss projection. We describe the improvement in this article, together with some of its applications.
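To give a flavor of the preconditioning idea, the sketch below flips random signs and applies a fast Walsh-Hadamard transform to "spread out" the input vector before reducing it; coordinate sampling is used here as a crude, illustrative stand-in for the sparse projection matrix of the actual fast transform, and all dimensions and seeds are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)

def fwht(v):
    """Fast Walsh-Hadamard transform in O(d log d); d must be a power of 2.
    Normalized by 1/sqrt(d) so that vector norms are preserved."""
    v = v.copy()
    d = len(v)
    h = 1
    while h < d:
        for i in range(0, d, h * 2):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v / np.sqrt(d)

def fjlt_sketch(x, k, rng):
    """Illustrative fast projection: random signs (D), Hadamard mixing (H),
    then sampling of k coordinates (a stand-in for a sparse projection)."""
    d = len(x)
    signs = rng.choice([-1.0, 1.0], size=d)
    mixed = fwht(x * signs)                       # spreads out the mass of x
    idx = rng.choice(d, size=k, replace=False)    # keep k mixed coordinates
    return mixed[idx] * np.sqrt(d / k)            # rescale to preserve norms

d, k = 1024, 256
x = rng.normal(size=d)
y = fjlt_sketch(x, k, rng)
ratio = np.linalg.norm(y) / np.linalg.norm(x)  # concentrates near 1
```

The sign-flip plus Hadamard step is the computational analogue of the uncertainty principle mentioned above: after mixing, no single coordinate can carry most of the vector's energy, which is what makes a sparse (here, sampled) projection safe.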