- Andoni, A., Indyk, P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51 (2008), 117--122.
- Broder, A.Z. On the resemblance and containment of documents. In The Compression and Complexity of Sequences (Positano, Italy, 1997), 21--29.
- Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M. Min-wise independent permutations. J. Comput. Syst. Sci. 60, 3 (2000), 630--659.
- Broder, A.Z., Glassman, S.C., Manasse, M.S., Zweig, G. Syntactic clustering of the web. In WWW (Santa Clara, CA, 1997), 1157--1166.
- Charikar, M.S. Similarity estimation techniques from rounding algorithms. In STOC (Montreal, Quebec, Canada, 2002), 380--388.
- Cohen, E., Datar, M., Fujiwara, S., Gionis, A., Indyk, P., Motwani, R., Ullman, J.D., Yang, C. Finding interesting associations without support pruning. IEEE Trans. Knowl. Data Eng. 13, 1 (2001), 64--78.
- Fetterly, D., Manasse, M., Najork, M., Wiener, J.L. A large-scale study of the evolution of web pages. In WWW (Budapest, Hungary, 2003), 669--678.
- Forman, G., Eshghi, K., Suermondt, J. Efficient detection of large-scale redundancy in enterprise file systems. SIGOPS Oper. Syst. Rev. 43, 1 (2009), 84--91.
- Gamon, M., Basu, S., Belenko, D., Fisher, D., Hurst, M., König, A.C. Blews: Using blogs to provide context for news articles. In AAAI Conference on Weblogs and Social Media (Redmond, WA, 2008).
- Gionis, A., Gunopulos, D., Koudas, N. Efficient and tunable similar set retrieval. In SIGMOD (Santa Barbara, CA, 2001), 247--258.
- Goemans, M.X., Williamson, D.P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42, 6 (1995), 1115--1145.
- Indyk, P. A small approximately min-wise independent family of hash functions. J. Algorithms 38, 1 (2001), 84--90.
- Indyk, P., Motwani, R. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC (Dallas, TX, 1998), 604--613.
- Itoh, T., Takei, Y., Tarui, J. On the sample size of k-restricted min-wise independent permutations and other k-wise distributions. In STOC (San Diego, CA, 2003), 710--718.
- Kushilevitz, E., Ostrovsky, R., Rabani, Y. Efficient search for approximate nearest neighbor in high dimensional spaces. In STOC (Dallas, TX, 1998), 614--623.
- Li, P., Church, K.W. A sketch algorithm for estimating two-way and multi-way associations. Comput. Linguist. 33, 3 (2007), 305--354 (Preliminary results appeared in HLT/EMNLP 2005).
- Li, P., Church, K.W., Hastie, T.J. One sketch for all: Theory and applications of conditional random sampling. In NIPS (Vancouver, British Columbia, Canada, 2008) (Preliminary results appeared in NIPS 2006).
- Li, P., Hastie, T.J., Church, K.W. Improving random projections using marginal information. In COLT (Pittsburgh, PA, 2006), 635--649.
- Li, P., König, A.C., Gui, W. b-Bit minwise hashing for estimating three-way similarities. In NIPS (Vancouver, British Columbia, Canada, 2010).
- Li, P., Moore, J., König, A.C. b-Bit minwise hashing for large-scale linear SVM. Technical report, 2011. http://www.stat.cornell.edu/~li/b-bit-hashing/HashingSVM.pdf
- Cherkasova, L., Eshghi, K., Morrey III, C.B., Tucek, J., Veitch, A. Applying syntactic similarity algorithms for enterprise information management. In KDD (Paris, France, 2009), 1087--1096.
- Manasse, M., McSherry, F., Talwar, K. Consistent weighted sampling. Technical Report MSR-TR-2010-73, Microsoft Research, 2010.
- Pandey, S., Broder, A., Chierichetti, F., Josifovski, V., Kumar, R., Vassilvitskii, S. Nearest-neighbor caching for content-match applications. In WWW (Madrid, Spain, 2009), 441--450.
- Rajaraman, A., Ullman, J. Mining of Massive Datasets. http://i.stanford.edu/ullman/mmds.html
- Urvoy, T., Chauveau, E., Filoche, P., Lavergne, T. Tracking web spam with html style similarities. ACM Trans. Web 2, 1 (2008), 1--28.