DOI: 10.1145/2600428.2609610
research-article

Latent semantic sparse hashing for cross-modal similarity search

Published: 03 July 2014

ABSTRACT

Hashing-based similarity search has attracted considerable attention for effective and efficient cross-modal retrieval on large-scale multimedia databases containing massive amounts of text and images. The core problem of cross-modal hashing is how to construct correlations between intrinsically heterogeneous multi-modal representations during hash function learning. Analogous to Canonical Correlation Analysis (CCA), most existing cross-modal hashing methods embed the heterogeneous data into a joint abstraction space by linear projections. However, these methods fail to bridge the semantic gap effectively and do not capture high-level latent semantic information, which has been shown to improve image retrieval performance. To address these challenges, we propose a novel method, Latent Semantic Sparse Hashing (LSSH), which performs cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images and Matrix Factorization to learn the latent concepts from text. The learned latent semantic features are then mapped to a joint abstraction space. Moreover, an iterative strategy is applied to derive optimal solutions, which helps LSSH explore the correlation between multi-modal representations efficiently and automatically. Finally, the unified hash codes are generated by quantizing the high-level abstraction space. Extensive experiments on three different datasets highlight the advantage of our method under cross-modal scenarios and show that LSSH significantly outperforms several state-of-the-art methods.
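The abstract names the stages of the pipeline but gives no objective function or update rules, so the following is only a rough sketch of how such stages could fit together, not the authors' algorithm. Everything specific in it is an assumption made for illustration: a fixed random dictionary instead of a learned one, ISTA for the sparse codes, alternating ridge least squares for the text factorization, a one-shot ridge regression from sparse codes into the text latent space instead of the paper's joint iterative optimization, and per-bit mean thresholding as the quantizer. All function names and parameters are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, D, lam=0.1, n_iter=50):
    """ISTA for min_S 0.5 * ||X - D S||_F^2 + lam * ||S||_1 (assumed solver)."""
    lipschitz = np.linalg.norm(D, 2) ** 2 + 1e-12  # squared spectral norm of D
    S = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ S - X)
        S = soft_threshold(S - grad / lipschitz, lam / lipschitz)
    return S

def factorize(Y, k, n_iter=30, reg=1e-3, seed=0):
    """Alternating ridge least squares so that Y is approximately U @ A."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((Y.shape[0], k))
    ridge = reg * np.eye(k)
    for _ in range(n_iter):
        A = np.linalg.solve(U.T @ U + ridge, U.T @ Y)    # latent text concepts
        U = np.linalg.solve(A @ A.T + ridge, A @ Y.T).T  # text basis
    return U, A

def toy_cross_modal_codes(X, Y, n_bits, n_atoms=32, reg=1e-3, seed=0):
    """X: image features (d_img, n); Y: text features (d_txt, n), paired by column."""
    rng = np.random.default_rng(seed)
    # Assumption: random unit-norm dictionary; a real system would learn it.
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    S = sparse_codes(X, D)
    _, A = factorize(Y, n_bits, seed=seed)
    # Ridge regression mapping sparse image codes into the text latent space.
    R = np.linalg.solve(S @ S.T + reg * np.eye(n_atoms), S @ A.T).T
    # Quantize both modalities with the same per-bit threshold.
    threshold = A.mean(axis=1, keepdims=True)
    return (R @ S) > threshold, A > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 40))       # shared latent factors (synthetic)
    X = rng.standard_normal((20, 8)) @ Z   # synthetic "image" features
    Y = rng.standard_normal((15, 8)) @ Z   # synthetic "text" features
    img_codes, txt_codes = toy_cross_modal_codes(X, Y, n_bits=8)
    # Hamming distance from the first text item to every image item.
    print((txt_codes[:, :1] != img_codes).sum(axis=0))
```

Because both modalities end up as binary codes of the same length, a text query can be compared against image codes with Hamming distance, which is what makes cross-modal lookup cheap at scale.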

      • Published in

        SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
        July 2014
        1330 pages
        ISBN:9781450322577
        DOI:10.1145/2600428

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%. Overall acceptance rate: 792 of 3,983 submissions, 20%.
