Abstract
With the rapid development of information storage and networking technologies, quintillions of bytes of data are generated every day from social networks, business transactions, sensors, and many other domains. These growing data volumes pose significant challenges for traditional data analysis tools in storing, processing, and analyzing extremely large-scale data. For decades, hashing has been one of the most effective tools for compressing data for fast access and analysis, as well as for verifying information integrity. Hashing techniques have evolved from simple randomization approaches to advanced adaptive methods that take locality, structure, label information, and data security into account. This survey reviews existing hashing techniques and organizes them into a taxonomy, in order to provide a comprehensive view of mainstream hashing techniques for different types of data and applications. The taxonomy also examines what is unique about each method and can therefore serve as a technical reference for understanding the niche of different hashing mechanisms in future development.