Hashing Techniques: A Survey and Taxonomy

Authors:
Lianhua Chi, IBM Research, Melbourne, Australia
Xingquan Zhu, Florida Atlantic University, Boca Raton, FL; Fudan University, Shanghai, China (ORCID: 0000-0003-4129-9611)

ACM Computing Surveys, Volume 50, Issue 1, Article No. 11, pp. 1–36. https://doi.org/10.1145/3047307

Published: 04 April 2017

Abstract

With the rapid development of information storage and networking technologies, quintillions of bytes of data are generated every day by social networks, business transactions, sensors, and many other sources. These growing data volumes pose significant challenges for traditional data analysis tools in storing, processing, and analyzing extremely large-scale data. For decades, hashing has been one of the most effective tools for compressing data for fast access and analysis, as well as for verifying information integrity. Hashing techniques have evolved from simple randomization approaches to advanced adaptive methods that take locality, structure, label information, and data security into account. This survey reviews existing hashing techniques and organizes them into a taxonomy, providing a comprehensive view of mainstream hashing techniques for different types of data and applications. The taxonomy also examines what is distinctive about each method, so it can serve as a technical reference for understanding the niche of different hashing mechanisms in future development.

Index Terms

Hashing Techniques: A Survey and Taxonomy
1. General and reference
  1. Document types
    1. Surveys and overviews

Published in
ACM Computing Surveys Volume 50, Issue 1
January 2018
588 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3058791
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 April 2017
- Accepted: 1 January 2017
- Revised: 1 December 2016
- Received: 1 March 2016
Author Tags
Hashing
compression
cryptographic hashing
data coding
dimension reduction
Qualifiers
- survey
- Research
- Refereed

Article Metrics
- Total Citations: 92
- Total Downloads: 2,548
- Downloads (Last 12 months): 311
- Downloads (Last 6 weeks): 52