Abstract
Multimedia collections are growing more than ever in size and diversity. Effective multimedia retrieval systems are therefore critical for giving end users scalable access to these datasets. We are interested in repositories of multimedia objects that combine images and text, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval (CBMIR). We focus on graph-based methods, which have been shown to deliver state-of-the-art performance. In particular, we examine two such methods: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses both approaches. This framework highlights the core features one should consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three real-world datasets. From a practical standpoint, our extensive empirical analyses yield insights and guidelines on the use of graph-based methods for multimodal information fusion in content-based multimedia information retrieval.
Index Terms
- Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods