research-article

Latent semantic sparse hashing for cross-modal similarity search

Authors: Jile Zhou, Guiguang Ding, Yuchen Guo (all Tsinghua University, Beijing, China)

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, July 2014, Pages 415–424. https://doi.org/10.1145/2600428.2609610

Published: 03 July 2014

ABSTRACT

Hashing-based similarity search has attracted considerable attention as an effective and efficient approach to cross-modal retrieval on large-scale multimedia databases containing massive amounts of text and images. The core problem of cross-modal hashing is how to effectively construct correlations between intrinsically heterogeneous multi-modal representations while learning the hash functions. Analogous to Canonical Correlation Analysis (CCA), most existing cross-modal hashing methods embed the heterogeneous data into a joint abstraction space by linear projections. However, these methods fail to bridge the semantic gap effectively, and they do not capture high-level latent semantic information, which has been shown to improve image retrieval performance. To address these challenges, we propose a novel method, Latent Semantic Sparse Hashing (LSSH), which performs cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images and Matrix Factorization to learn the latent concepts from text. The learned latent semantic features are then mapped to a joint abstraction space. Moreover, an iterative strategy is applied to derive optimal solutions efficiently, which helps LSSH explore the correlation between multi-modal representations automatically. Finally, unified hash codes are generated from the high-level abstraction space by quantization. Extensive experiments on three different datasets highlight the advantage of our method in cross-modal scenarios and show that LSSH significantly outperforms several state-of-the-art methods.
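
The abstract names the building blocks but gives no formulas, so the following is only a minimal sketch of how such a pipeline could be wired together, not the authors' algorithm. It assumes that images and texts are stored as columns of paired matrices, that the sparse codes are found with ISTA, that the text factorization is a truncated SVD, that the mapping into the joint space is a ridge regression, and that quantization thresholds each dimension at its training mean. It also fits the steps one after another rather than with the joint iterative strategy the abstract mentions. All function and parameter names (`fit_sketch`, `hash_images`, `hash_texts`, `n_atoms`, `n_bits`, `lam`) are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, B, lam, n_steps=50):
    # ISTA for min_S 0.5*||X - B S||_F^2 + lam*||S||_1 with dictionary B fixed.
    step_bound = np.linalg.norm(B, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    S = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_steps):
        grad = B.T @ (B @ S - X)
        S = soft_threshold(S - grad / step_bound, lam / step_bound)
    return S

def ridge_fit(target, source, eps=1e-3):
    # W minimising ||target - W source||_F^2 + eps*||W||_F^2.
    G = source @ source.T + eps * np.eye(source.shape[0])
    return np.linalg.solve(G, source @ target.T).T

def fit_sketch(X, Y, n_atoms=32, n_bits=8, lam=0.1, n_outer=5, seed=0):
    # X: image features (d_x, n); Y: text features (d_y, n); columns are paired.
    rng = np.random.default_rng(seed)
    # Images: alternate between sparse codes S and dictionary B (X ~ B S).
    B = rng.standard_normal((X.shape[0], n_atoms))
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    for _ in range(n_outer):
        S = sparse_codes(X, B, lam)
        B = ridge_fit(X, S)
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-12)
    S = sparse_codes(X, B, lam)
    # Text: matrix factorization Y ~ U A via truncated SVD (an assumption).
    U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
    U = U[:, :n_bits]
    A = sv[:n_bits, None] * Vt[:n_bits]
    # Joint space: linear map R from image sparse codes to text latent factors.
    R = ridge_fit(A, S)
    # Quantization threshold: per-dimension mean of the training latent factors.
    offset = A.mean(axis=1, keepdims=True)
    return {"B": B, "U": U, "R": R, "offset": offset, "lam": lam}

def hash_images(X, model):
    S = sparse_codes(X, model["B"], model["lam"])
    return (model["R"] @ S > model["offset"]).astype(np.uint8)

def hash_texts(Y, model):
    # U has orthonormal columns, so U.T @ Y gives the latent factors.
    return (model["U"].T @ Y > model["offset"]).astype(np.uint8)
```

With codes of shape `(n_bits, n)` for both modalities, a text query can be compared against image codes by Hamming distance, for example `(image_codes != text_code[:, None]).sum(axis=0)`. How well such codes retrieve matching items depends entirely on the data and on the simplifications above.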

Index Terms

Latent semantic sparse hashing for cross-modal similarity search
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published in
SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN: 9781450322577
DOI: 10.1145/2600428
General Chairs: Shlomo Geva (Queensland University of Technology), Andrew Trotman (University of Dunedin)
Program Chairs: Peter Bruza (Queensland University of Technology), Charles L.A. Clarke (University of Waterloo), Kal Järvelin (University of Tampere)
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
correlation
cross-modal retrieval
hashing
heterogeneous data sources
matrix factorization
sparse coding
Qualifiers
- research-article
Acceptance Rates
SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
Article Metrics
- Total Citations: 319
- Total Downloads: 1,327
- Downloads (last 12 months): 50
- Downloads (last 6 weeks): 6