ABSTRACT
Hashing-based similarity search has attracted considerable attention as an effective and efficient approach to cross-modal retrieval on large-scale multimedia databases containing massive amounts of text and images. The core problem of cross-modal hashing is how to construct correlations between intrinsically heterogeneous multi-modal representations while learning the hash functions. Analogous to Canonical Correlation Analysis (CCA), most existing cross-modal hashing methods embed the heterogeneous data into a joint abstraction space by linear projections. However, these methods fail to bridge the semantic gap effectively and do not capture high-level latent semantic information, which has been shown to improve image retrieval performance. To address these challenges, we propose a novel method, Latent Semantic Sparse Hashing (LSSH), which performs cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images and Matrix Factorization to learn the latent concepts from text. The learned latent semantic features are then mapped to a joint abstraction space. Moreover, an iterative strategy is applied to derive optimal solutions efficiently, which helps LSSH explore the correlation between multi-modal representations automatically. Finally, unified hash codes are generated from the high-level abstraction space by quantization. Extensive experiments on three datasets highlight the advantage of our method in cross-modal scenarios and show that LSSH significantly outperforms several state-of-the-art methods.
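The pipeline outlined in the abstract can be illustrated with a small NumPy sketch. This is a simplified toy under stated assumptions, not the paper's actual objective or optimizer: the function name `toy_lssh_sketch`, the ISTA solver for the sparse codes, the ridge-regression updates, the mean-threshold quantizer, and all parameter names and defaults are hypothetical choices made for illustration. Image features `X` and text features `Y` are assumed to be column-aligned pairs.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, B, lam, n_iter=50):
    # ISTA for min_S 0.5*||X - B S||_F^2 + lam*||S||_1 (assumed solver).
    step = 1.0 / max(np.linalg.norm(B, 2) ** 2, 1e-12)
    S = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = B.T @ (B @ S - X)
        S = soft_threshold(S - step * grad, lam * step)
    return S

def ridge_solve(A, T, reg):
    # Returns W minimizing ||T - W A||_F^2 + reg*||W||_F^2.
    k = A.shape[0]
    return np.linalg.solve(A @ A.T + reg * np.eye(k), A @ T.T).T

def toy_lssh_sketch(X, Y, n_atoms=32, n_bits=16, lam=0.1, reg=1e-2,
                    n_outer=5, seed=0):
    # X: image features (d_img, n); Y: text features (d_txt, n).
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[0], n_atoms))
    B /= np.linalg.norm(B, axis=0, keepdims=True)
    A = rng.standard_normal((n_bits, X.shape[1]))  # shared latent space
    for _ in range(n_outer):
        S = sparse_codes(X, B, lam)        # sparse coding of images
        B = ridge_solve(S, X, reg)         # dictionary update, X ~ B S
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-12)
        U = ridge_solve(A, Y, reg)         # text factorization, Y ~ U A
        R = ridge_solve(S, A, reg)         # image-to-latent map, A ~ R S
        # Closed-form A for ||Y - U A||^2 + ||A - R S||^2 + reg*||A||^2.
        A = np.linalg.solve(U.T @ U + (1.0 + reg) * np.eye(n_bits),
                            U.T @ Y + R @ S)
    # Quantize each latent dimension at its mean to get binary codes.
    codes = (A > A.mean(axis=1, keepdims=True)).astype(np.uint8)
    return codes, B, U, R
```

Calling `toy_lssh_sketch(X, Y, n_bits=16)` on paired feature matrices returns a `(16, n)` array of 0/1 codes, one column per image-text pair. A query from a single modality would additionally need to be projected into the latent space (for example, an image via `R` applied to its sparse code) before quantization; that step is omitted here.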
Index Terms
- Latent semantic sparse hashing for cross-modal similarity search