research-article

Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods

Published: 17 February 2015

Abstract

Multimedia collections are growing in size and diversity faster than ever. Effective and scalable multimedia retrieval systems are therefore critical for giving end users access to these datasets. We are interested in repositories of image/text multimedia objects, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval (CBMIR). We focus on graph-based methods, which have been shown to deliver state-of-the-art performance. In particular, we examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses both approaches. This framework highlights the core features to consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three real-world datasets. From a practical standpoint, our extensive empirical analyses yield insights and guidelines on the use of graph-based methods for multimodal information fusion in CBMIR.
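The abstract names two graph-based scoring schemes without defining them. The Python sketch below illustrates one common way each can be formulated; it is an illustration under stated assumptions, not the article's own definitions. Assumed: a cross-media score that uses the k objects most visually similar to the query as pivots and then propagates through textual similarities; and a random-walk score computed as a personalized PageRank (a swapped-in, generic technique) over a graph whose edges mix visual and textual similarities. The function names `cross_media_scores` and `random_walk_scores`, the pivot count `k`, the mixing weight `alpha`, and the `damping` factor are all hypothetical, and the toy similarity values are made up.

```python
from typing import List

Matrix = List[List[float]]


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k largest scores (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def cross_media_scores(query_visual: List[float], text_sim: Matrix, k: int) -> List[float]:
    """Hypothetical visual-to-textual cross-media score.

    The k objects most visually similar to the query serve as pivots; each
    object d is scored by its textual similarity to the pivots, weighted by
    each pivot's visual similarity to the query.
    """
    pivots = top_k_indices(query_visual, k)
    return [sum(query_visual[j] * text_sim[j][d] for j in pivots)
            for d in range(len(text_sim))]


def random_walk_scores(visual_sim: Matrix, text_sim: Matrix, restart: List[float],
                       alpha: float = 0.5, damping: float = 0.85,
                       iters: int = 100) -> List[float]:
    """Hypothetical personalized random walk over a two-modality graph."""
    n = len(visual_sim)
    # Edge weights: convex combination of the visual and textual graphs.
    weights = [[alpha * visual_sim[i][j] + (1 - alpha) * text_sim[i][j]
                for j in range(n)] for i in range(n)]
    # Row-normalize into transition probabilities (uniform for empty rows).
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    # Restart distribution: the query's similarities, normalized to sum to 1.
    z = sum(restart)
    p0 = [x / z for x in restart]
    r = p0[:]
    # Power iteration: r <- (1 - damping) * p0 + damping * r P
    for _ in range(iters):
        r = [(1 - damping) * p0[j] + damping * sum(r[i] * trans[i][j] for i in range(n))
             for j in range(n)]
    return r


if __name__ == "__main__":
    # Toy collection of three image/text objects (made-up similarities).
    visual = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
    text = [[1.0, 0.1, 0.7], [0.1, 1.0, 0.3], [0.7, 0.3, 1.0]]
    query_visual = [0.9, 0.7, 0.1]
    query_text = [0.2, 0.1, 0.9]
    print(cross_media_scores(query_visual, text, k=2))
    print(random_walk_scores(visual, text, restart=query_text))
```

Both functions score objects by propagating query evidence over similarity graphs. The cross-media score takes a single, truncated hop from the visual graph into the textual graph. The random walk instead iterates over a mixed graph until it converges, so evidence can travel along longer paths. This difference is one example of the kind of design choice that a unifying graph-based view makes explicit.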



    • Published in

      ACM Transactions on Information Systems, Volume 33, Issue 2
      February 2015
      181 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/2737813

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 February 2015
      • Accepted: 1 October 2014
      • Revised: 1 April 2014
      • Received: 1 March 2013


      Qualifiers

      • research-article
      • Research
      • Refereed
