Article

Locality-sensitive hashing scheme based on p-stable distributions

Authors:
Mayur Datar

Stanford University, Stanford, CA

Stanford University, Stanford, CA
View Profile

,
Nicole Immorlica

MIT

MIT
View Profile

,
Piotr Indyk

MIT

MIT
View Profile

,
Vahab S. Mirrokni

MIT

MIT
View Profile

SCG '04: Proceedings of the twentieth annual symposium on Computational geometryJune 2004Pages 253–262https://doi.org/10.1145/997817.997857

Published:08 June 2004Publication History

SCG '04: Proceedings of the twentieth annual symposium on Computational geometry

Pages 253–262

ABSTRACT

We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under l_p norm, based on p-stable distributions.Our scheme improves the running time of the earlier algorithm for the case of the l_p norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neigbhor in O(log n) time for data satisfying certain "bounded growth" condition.Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that the our data structure is up to 40 times faster than kd-tree.

References

C. Aggarwal and D. Keim A. Hinneburg. On the surprising behavior of distance metrics in high dimensional spaces. Proceedings of the International Conference on Database Theory, pages 420--434, 2001.]] Google ScholarDigital Library
S. Arya and D. Mount. Ann: Library for approximate nearest neighbor searching. available at http://www.cs.umd.edu/~mount/ANN/.]]Google Scholar
S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching. Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 573--582, 1994.]] Google ScholarDigital Library
K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbor meaningful? Proceedings of the International Conference on Database Theory, pages 217--235, 1999.]] Google ScholarDigital Library
J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17:419--428, 2001.]]Google ScholarCross Ref
J. Buhler. Provably sensitive indexing strategies for biosequence similarity search. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB02), 2002.]] Google ScholarDigital Library
J. Buhler and M. Tompa. Finding motifs using random projections. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB01), 2001.]] Google ScholarDigital Library
J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. J. Amer. Statist. Assoc., 71:340--344, 1976.]]Google ScholarCross Ref
K. Clarkson. Nearest neighbor queries in metric spaces. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 609--617, 1997.]] Google ScholarDigital Library
E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. Ullman, and C. Yang. Finding interesting associations without support prunning. Proceedings of the 16th International Conference on Data Engineering (ICDE), 2000.]] Google ScholarDigital Library
G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishnan. Fast mining of massive tabular data via approximate distance computations. Proc. 18th International Conference on Data Engineering (ICDE), 2002.]] Google ScholarDigital Library
T. Darrell, P. Indyk, G. Shakhnarovich, and P. Viola. Approximate nearest neighbors methods for learning and vision. NIPS Workshop at http://www.ai.mit.edu/projects/vip/nips03ann, 2003.]]Google Scholar
B. Georgescu, I. Shimshoni, and P. Meer. Mean shift based clustering in high dimensions: A texture classification example. Proceedings of the 9th International Conference on Computer Vision, 2003.]] Google ScholarDigital Library
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), 1999.]] Google ScholarDigital Library
S. Har-Peled. A replacement for voronoi diagrams of near linear size. Proceedings of the Symposium on Foundations of Computer Science, 2001.]] Google ScholarDigital Library
T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB Workshop, 2000.]]Google Scholar
A. Hinneburg, C. C. Aggarwal, and D. A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the International Conference on Very Large Databases (VLDB), pages 506--515, 2000.]] Google ScholarDigital Library
P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. Proceedings of the Symposium on Foundations of Computer Science, 2000.]] Google ScholarDigital Library
P. Indyk and R. Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Proceedings of the Symposium on Theory of Computing, 1998.]] Google ScholarDigital Library
P. Indyk and N. Thaper. Fast color image retrieval via embeddings. Workshop on Statistical and Computational Theories of Vision (at ICCV), 2003.]]Google Scholar
D. Karger and M Ruhl. Finding nearest neighbors in growth-restricted metrics. Proceedings of the Symposium on Theory of Computing, 2002.]] Google ScholarDigital Library
J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, 1997.]] Google ScholarDigital Library
R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2004.]] Google ScholarDigital Library
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. Proceedings of the Thirtieth ACM Symposium on Theory of Computing, pages 614--623, 1998.]] Google ScholarDigital Library
J. P. Nolan. An introduction to stable distributions. available at http://www.cas.american.edu/~jpnolan/chap1.ps.]]Google Scholar
Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of collections of files. Proceedings of the International Conference on Web Information Systems Engineering (WISE), 2002.]] Google ScholarDigital Library
N. Shivakumar. Detecting digital copyright violations on the Internet (Ph.D. thesis). Department of Computer Science, Stanford University, 2000.]] Google ScholarDigital Library
Roger Weber, Hans J. Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. Proceedings of the 24th Int. Conf. Very Large Data Bases (VLDB), 1998.]] Google ScholarDigital Library
C. Yang. Macs: Music audio characteristic sequence indexing for similarity retrieval. Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.]]Google Scholar
P.N. Yiannilos. Locally lifting the curse of dimensionality for nearest neighbor search. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2000.]] Google ScholarDigital Library
V.M. Zolotarev. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society, 1986.]]Google Scholar

Index Terms

Locality-sensitive hashing scheme based on p-stable distributions

Recommendations

Locality-sensitive hashing scheme based on dynamic collision counting
SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Locality-Sensitive Hashing (LSH) and its variants are well-known methods for solving the c-approximate NN Search problem in high-dimensional space. Traditionally, several LSH functions are concatenated to form a "static" compound hash function for ...
Read More
Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing ...
Read More
Data-Dependent Locality Sensitive Hashing
Proceedings of the 15th Pacific-Rim Conference on Advances in Multimedia Information Processing --- PCM 2014 - Volume 8879

Locality sensitive hashing LSH is the most popular algorithm for approximate nearest neighbor ANN search. As LSH partitions vector space uniformly and the distribution of vectors is usually non-uniform, it poorly fits real dataset and has limited ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SCG '04: Proceedings of the twentieth annual symposium on Computational geometry
June 2004
468 pages
ISBN:1581138857
DOI:10.1145/997817
Conference Chairs:
Jack Snoeyink
UNC Chapel Hill
,
Jean-Daniel Boissonnat
INRIA Sophia Antipolis
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 June 2004
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
p-stable distributions
approximate nearest neighbor
locally sensitive hashing
sublinear algorithm
Qualifiers
- Article
Conference

Acceptance Rates
SCG '04 Paper Acceptance Rate49of147submissions,33%Overall Acceptance Rate625of1,685submissions,37%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1,977
  Total Citations
  View Citations
- 7,509
  Total Downloads
- Downloads (Last 12 months)325
- Downloads (Last 6 weeks)38
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Locality-sensitive hashing scheme based on p-stable distributions

SCG '04: Proceedings of the twentieth annual symposium on Computational geometry

ABSTRACT

References

Cited By

Index Terms

Recommendations

Locality-sensitive hashing scheme based on dynamic collision counting

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Data-Dependent Locality Sensitive Hashing

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Locality-sensitive hashing scheme based on p-stable distributions

SCG '04: Proceedings of the twentieth annual symposium on Computational geometry

ABSTRACT

References

Cited By

Index Terms

Recommendations

Locality-sensitive hashing scheme based on dynamic collision counting

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Data-Dependent Locality Sensitive Hashing

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media