research-article

Effective hashing for large-scale multimedia search

Author:
Jingkuan Song

The University of Queensland, Brisbane, Australia


SIGMOD'13 PhD Symposium: Proceedings of the 2013 SIGMOD/PODS Ph.D. symposium, June 2013, Pages 55–60, https://doi.org/10.1145/2483574.2483585

Published: 22 June 2013


ABSTRACT

With the rapid development of the Internet and multimedia technologies over the last decade, a huge amount of data has become available, from text corpora to collections of online images and videos. Cheap storage and modern database technologies have made it possible to accumulate large-scale datasets. However, the ever-growing size of these datasets makes it harder to find useful information in them. A fundamental computational primitive for dealing with massive multimedia datasets is similarity search. Multimedia similarity search aims to preprocess a database so that, given a query object, one can quickly find similar objects in the database. Searching for similar objects in a large dataset in a high-dimensional space is at the heart of many multimedia applications, such as near-duplicate retrieval, multimedia tagging, and recommendation. Because of its significance, considerable effort has been devoted to this topic. The goal of this research is to design efficient hashing methods for large-scale multimedia search. In this paper, we first present the general framework for multimedia similarity search and discuss the latest progress in the field. We then describe our contributions to searching for similar multimedia objects in large-scale databases effectively and efficiently. Finally, we discuss future work and conclude.
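
The preprocess-then-query pattern of hashing-based similarity search can be illustrated with a minimal Python sketch. It uses sign-of-random-projection hashing with Hamming-distance ranking, a generic technique chosen here only for illustration; it is not the method proposed in the paper. All function names (`make_hyperplanes`, `hash_vector`, `build_index`, `search`) and parameters are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vec, hyperplanes):
    """Map a feature vector to an integer binary code, one sign bit per hyperplane."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vec, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def build_index(database, hyperplanes):
    """Preprocessing step: hash every database object once."""
    return [hash_vector(vec, hyperplanes) for vec in database]


def search(query, index, hyperplanes, k=1):
    """Return positions of the k database codes closest to the query code."""
    q = hash_vector(query, hyperplanes)
    ranked = sorted(range(len(index)), key=lambda i: hamming(q, index[i]))
    return ranked[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=4, seed=1)
    database = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
    index = build_index(database, planes)
    print(search([0.9, 0.1, 0.0, 0.0], index, planes, k=2))
```

In this sketch the database is hashed once, and each query then costs one hash computation plus cheap bitwise comparisons. A linear scan over the codes is used for simplicity; a real system would typically bucket the codes in a hash table.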


Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
SIGMOD'13 PhD Symposium: Proceedings of the 2013 SIGMOD/PODS Ph.D. symposium
June 2013
78 pages
ISBN:9781450321556
DOI:10.1145/2483574
Program Chairs: Lei Chen (Hong Kong University of Science and Technology, China), Xin Luna Dong (Google Inc., USA)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
binary codes
hashing
indexing
multimedia retrieval
Acceptance Rates
SIGMOD'13 PhD Symposium Paper Acceptance Rate: 12 of 26 submissions, 46%. Overall Acceptance Rate: 40 of 60 submissions, 67%.