DOI: 10.1145/2483574.2483585
research-article

Effective hashing for large-scale multimedia search

Published: 22 June 2013

ABSTRACT

With the rapid development of the Internet and multimedia technologies over the last decade, a huge amount of data has become available, from text corpora to collections of online images and videos. Cheap storage and modern database technologies have made it possible to accumulate large-scale datasets. However, the ever-growing size of these datasets makes it harder to find useful information in them. A fundamental computational primitive for dealing with massive multimedia datasets is similarity search. Multimedia similarity search aims to preprocess a database so that, given a query object, one can quickly find similar objects in the database. Searching for similar objects in a large dataset in a high-dimensional space is at the heart of many multimedia applications, such as near-duplicate retrieval, multimedia tagging, and recommendation. Because of its significance, considerable effort has been devoted to this topic. The goal of this research is to design efficient hashing methods for large-scale multimedia search. In this paper, we first present the general framework for multimedia similarity search and discuss recent progress in the field. We then describe our contributions to searching large-scale databases for similar multimedia objects effectively and efficiently. Finally, we discuss future work and conclude.
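
To make the preprocess-then-query idea in the abstract concrete, the following is a minimal sketch of hashing-based similarity search. It is not the method proposed in this paper: it assumes a generic random-hyperplane (sign of random projection) scheme, and the function names `make_hyperplanes`, `hash_vector`, `build_index`, and `search` are hypothetical, chosen only for illustration. Each feature vector is mapped to a short binary code once, and a query is answered by ranking database codes by Hamming distance to the query code.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vec, hyperplanes):
    """Map a vector to an integer whose bits record the side of each hyperplane."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def build_index(database, hyperplanes):
    """Preprocessing step: hash every database vector once."""
    return [hash_vector(v, hyperplanes) for v in database]


def search(query, codes, hyperplanes, k=1):
    """Return indices of the k database items closest to the query in Hamming space."""
    q = hash_vector(query, hyperplanes)
    ranked = sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy data (illustrative only): three 3-dimensional feature vectors.
    db = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
    planes = make_hyperplanes(num_bits=16, dim=3, seed=1)
    codes = build_index(db, planes)
    print(search([0.9, 0.1, 0.0], codes, planes, k=2))
```

The linear scan over codes keeps the sketch short; a practical system would typically bucket the codes in hash tables so that only a small candidate set is compared against each query.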


      Published in

        SIGMOD'13 PhD Symposium: Proceedings of the 2013 SIGMOD/PODS Ph.D. symposium
        June 2013
        78 pages
        ISBN:9781450321556
        DOI:10.1145/2483574
        Program Chairs: Lei Chen, Xin Luna Dong

        Copyright © 2013 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        SIGMOD'13 PhD Symposium paper acceptance rate: 12 of 26 submissions, 46%. Overall acceptance rate: 40 of 60 submissions, 67%.
