ABSTRACT
With the rapid development of the Internet and multimedia technologies over the last decade, a huge amount of data has become available, from text corpora to collections of online images and videos. Cheap storage and modern database technologies have made it possible to accumulate large-scale datasets. However, the ever-growing size of these datasets makes it harder to find useful information in them. A fundamental computational primitive for dealing with massive multimedia datasets is similarity search. Multimedia similarity search preprocesses a database so that, given a query object, similar objects in the database can be found quickly. Searching for similar objects in a large, high-dimensional dataset is at the heart of many multimedia applications, such as near-duplicate retrieval, multimedia tagging, and recommendation. Because of its significance, considerable effort has been devoted to this topic. The goal of this research is to design efficient hashing methods for large-scale multimedia search. In this paper, we first present the general framework for multimedia similarity search and discuss recent advances in the field. We then describe our contributions to searching large-scale databases for similar multimedia objects both effectively and efficiently. Finally, we discuss future work and draw conclusions.
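As a minimal illustration of the preprocess-then-query pattern described above, the following sketch uses random-hyperplane hashing, a standard locality-sensitive scheme for angular similarity, followed by Hamming-distance ranking. It is not the method proposed in this work; the function names, the number of bits, and the toy data are all assumptions made for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes):
    """Map a real-valued vector to an integer whose bits are signs of projections."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def build_index(database, hyperplanes):
    """Preprocessing step: hash every database vector once."""
    return [hash_code(v, hyperplanes) for v in database]


def search(query, codes, hyperplanes, k=1):
    """Query step: rank database items by Hamming distance to the query code."""
    q = hash_code(query, hyperplanes)
    ranked = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy 4-dimensional "feature vectors"; real multimedia features are far larger.
    database = [
        [1.0, 0.0, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [-1.0, 0.0, 0.0, 0.0],
    ]
    planes = make_hyperplanes(num_bits=16, dim=4)
    codes = build_index(database, planes)
    print(search([1.0, 0.05, 0.0, 0.0], codes, planes, k=2))
```

The sketch ranks all codes linearly for clarity. A practical system would typically bucket codes in hash tables so that only a small candidate set is examined per query.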