skip to main content
10.1145/997817.997857acmconferencesArticle/Chapter ViewAbstractPublication PagessocgConference Proceedingsconference-collections
Article

Locality-sensitive hashing scheme based on p-stable distributions

Published:08 June 2004Publication History

ABSTRACT

We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions.Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neigbhor in O(log n) time for data satisfying certain "bounded growth" condition.Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that the our data structure is up to 40 times faster than kd-tree.

References

  1. C. Aggarwal and D. Keim A. Hinneburg. On the surprising behavior of distance metrics in high dimensional spaces. Proceedings of the International Conference on Database Theory, pages 420--434, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Arya and D. Mount. Ann: Library for approximate nearest neighbor searching. available at http://www.cs.umd.edu/~mount/ANN/.]]Google ScholarGoogle Scholar
  3. S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching. Proceedings of the Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 573--582, 1994.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is nearest neighbor meaningful? Proceedings of the International Conference on Database Theory, pages 217--235, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Buhler. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17:419--428, 2001.]]Google ScholarGoogle ScholarCross RefCross Ref
  6. J. Buhler. Provably sensitive indexing strategies for biosequence similarity search. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB02), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Buhler and M. Tompa. Finding motifs using random projections. Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB01), 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. M. Chambers, C. L. Mallows, and B. W. Stuck. A method for simulating stable random variables. J. Amer. Statist. Assoc., 71:340--344, 1976.]]Google ScholarGoogle ScholarCross RefCross Ref
  9. K. Clarkson. Nearest neighbor queries in metric spaces. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 609--617, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J. Ullman, and C. Yang. Finding interesting associations without support prunning. Proceedings of the 16th International Conference on Data Engineering (ICDE), 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Cormode, P. Indyk, N. Koudas, and S. Muthukrishnan. Fast mining of massive tabular data via approximate distance computations. Proc. 18th International Conference on Data Engineering (ICDE), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Darrell, P. Indyk, G. Shakhnarovich, and P. Viola. Approximate nearest neighbors methods for learning and vision. NIPS Workshop at http://www.ai.mit.edu/projects/vip/nips03ann, 2003.]]Google ScholarGoogle Scholar
  13. B. Georgescu, I. Shimshoni, and P. Meer. Mean shift based clustering in high dimensions: A texture classification example. Proceedings of the 9th International Conference on Computer Vision, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. Proceedings of the 25th International Conference on Very Large Data Bases (VLDB), 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Har-Peled. A replacement for voronoi diagrams of near linear size. Proceedings of the Symposium on Foundations of Computer Science, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. T. Haveliwala, A. Gionis, and P. Indyk. Scalable techniques for clustering the web. WebDB Workshop, 2000.]]Google ScholarGoogle Scholar
  17. A. Hinneburg, C. C. Aggarwal, and D. A. Keim. What is the nearest neighbor in high dimensional spaces? Proceedings of the International Conference on Very Large Databases (VLDB), pages 506--515, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. Proceedings of the Symposium on Foundations of Computer Science, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. P. Indyk and R. Motwani. Approximate nearest neighbor: towards removing the curse of dimensionality. Proceedings of the Symposium on Theory of Computing, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Indyk and N. Thaper. Fast color image retrieval via embeddings. Workshop on Statistical and Computational Theories of Vision (at ICCV), 2003.]]Google ScholarGoogle Scholar
  21. D. Karger and M Ruhl. Finding nearest neighbors in growth-restricted metrics. Proceedings of the Symposium on Theory of Computing, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Krauthgamer and J. R. Lee. Navigating nets: Simple algorithms for proximity search. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. Proceedings of the Thirtieth ACM Symposium on Theory of Computing, pages 614--623, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. P. Nolan. An introduction to stable distributions. available at http://www.cas.american.edu/~jpnolan/chap1.ps.]]Google ScholarGoogle Scholar
  26. Z. Ouyang, N. Memon, T. Suel, and D. Trendafilov. Cluster-based delta compression of collections of files. Proceedings of the International Conference on Web Information Systems Engineering (WISE), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. N. Shivakumar. Detecting digital copyright violations on the Internet (Ph.D. thesis). Department of Computer Science, Stanford University, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Roger Weber, Hans J. Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. Proceedings of the 24th Int. Conf. Very Large Data Bases (VLDB), 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. C. Yang. Macs: Music audio characteristic sequence indexing for similarity retrieval. Proceedings of the Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.]]Google ScholarGoogle Scholar
  30. P.N. Yiannilos. Locally lifting the curse of dimensionality for nearest neighbor search. Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. V.M. Zolotarev. One-Dimensional Stable Distributions. Vol. 65 of Translations of Mathematical Monographs, American Mathematical Society, 1986.]]Google ScholarGoogle Scholar

Index Terms

  1. Locality-sensitive hashing scheme based on p-stable distributions

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SCG '04: Proceedings of the twentieth annual symposium on Computational geometry
              June 2004
              468 pages
              ISBN:1581138857
              DOI:10.1145/997817

              Copyright © 2004 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 8 June 2004

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              SCG '04 Paper Acceptance Rate49of147submissions,33%Overall Acceptance Rate625of1,685submissions,37%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader