
Hashing Techniques: A Survey and Taxonomy

Published: 04 April 2017

Abstract

With the rapid development of information storage and networking technologies, quintillions of bytes of data are generated every day from social networks, business transactions, sensors, and many other domains. These growing data volumes pose significant challenges for traditional data analysis tools, which must store, process, and analyze such extremely large-scale data. For decades, hashing has been one of the most effective and widely used tools for compressing data to enable fast access and analysis, as well as for verifying information integrity. Hashing techniques have also evolved from simple randomization approaches to advanced adaptive methods that take locality, structure, label information, and data security into account. This survey reviews existing hashing techniques and organizes them into a taxonomy, providing a comprehensive view of mainstream hashing techniques for different types of data and applications. The taxonomy also examines what makes each method distinctive, so it can serve as a technical reference for understanding the niche of each hashing mechanism and for guiding future development.
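The abstract distinguishes hashing for integrity verification from hashing that preserves locality. The Python sketch below is an illustration only, not the survey's own code: it uses the standard library's SHA-256 (`hashlib`) as an example of an integrity hash, and random-hyperplane hashing (the sign of random Gaussian projections) as one example of a randomization-based, locality-sensitive scheme. The function names, the 64-bit code length, and the fixed seed are hypothetical choices made for the example.

```python
import hashlib
import random


def integrity_digest(data: bytes) -> str:
    """Cryptographic hash for integrity checks: any change to the input
    yields an unrelated-looking SHA-256 digest (hex string, 64 chars)."""
    return hashlib.sha256(data).hexdigest()


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    """Randomization step: draw n_bits hyperplane normals with i.i.d.
    standard Gaussian entries. The seed is fixed so codes are reproducible."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(vec: list[float], planes: list[list[float]]) -> list[int]:
    """Locality-sensitive code: bit i is 1 if vec lies on the non-negative
    side of hyperplane i. Vectors at a small angle tend to share most bits."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0.0 else 0 for plane in planes]


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Integrity: a one-character change gives a completely different digest.
    print(integrity_digest(b"hello")[:16])
    print(integrity_digest(b"hellp")[:16])

    # Locality: nearby vectors get similar 64-bit codes, distant ones do not.
    planes = make_hyperplanes(dim=3, n_bits=64, seed=42)
    x, y, z = [1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [-3.0, 1.0, -2.0]
    cx = lsh_code(x, planes)
    print("x vs near y:", hamming(cx, lsh_code(y, planes)))
    print("x vs far z: ", hamming(cx, lsh_code(z, planes)))
```

For random hyperplanes, the chance that a single bit differs between two vectors equals the angle between them divided by π. As a result, the nearly parallel `x` and `y` typically differ in very few of the 64 bits, while `z` (at 120° from `x`) differs in roughly two thirds of them. The SHA-256 digests of `b"hello"` and `b"hellp"`, by contrast, carry no usable similarity signal. This is the trade-off the abstract points to: integrity hashes are designed to destroy locality, while similarity-oriented hashes are designed to keep it.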

  148. Yuliang Zheng, Josef Pieprzyk, and Jennifer Seberry. 1992. HAVAL: A one-way hashing algorithm with variable length of output. In Proceedings of the International Workshop on the Theory and Application of Cryptographic Technology. Springer, 81--104.Google ScholarGoogle Scholar
  149. Xiaofeng Zhu, Zi Huang, Hong Cheng, Jiangtao Cui, and Heng Tao Shen. 2013a. Sparse hashing for fast multimedia search. ACM Transactions on Information Systems (TOIS) 31, 2 (2013), 9.Google ScholarGoogle ScholarDigital LibraryDigital Library
  150. Xiaofeng Zhu, Zi Huang, Heng Tao Shen, and Xin Zhao. 2013b. Linear cross-modal hashing for efficient multimedia search. In Proceedings of the 21st ACM Intl. Conf. on Multimedia. ACM, 143--152. Google ScholarGoogle ScholarDigital LibraryDigital Library

    • Published in

      ACM Computing Surveys Volume 50, Issue 1
      January 2018
      588 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3058791
      • Editor: Sartaj Sahni

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 April 2017
      • Accepted: 1 January 2017
      • Revised: 1 December 2016
      • Received: 1 March 2016
      Published in ACM Computing Surveys (CSUR), Volume 50, Issue 1

      Qualifiers

      • survey
      • Research
      • Refereed
