research-article

Unsupervised Visual and Textual Information Fusion in CBMIR Using Graph-Based Methods

Authors:
Julien Ah-Pine (University of Lyon, France)
Gabriela Csurka (Xerox Research Centre Europe, Meylan, France)
Stéphane Clinchant (Xerox Research Centre Europe, Meylan, France)

ACM Transactions on Information Systems, Volume 33, Issue 2, Article No. 9, pp. 1–31. https://doi.org/10.1145/2699668

Published: 17 February 2015

Abstract

Multimedia collections are growing in size and diversity more than ever. Effective multimedia retrieval systems are thus critical for giving end users scalable access to these datasets. We are interested in repositories of image/text multimedia objects, and we study multimodal information fusion techniques in the context of content-based multimedia information retrieval (CBMIR). We focus on graph-based methods, which have been shown to provide state-of-the-art performance. We examine two such methods in particular: cross-media similarities and random-walk-based scores. From a theoretical viewpoint, we propose a unifying graph-based framework that encompasses both approaches. Our proposal highlights the core features one should consider when using a graph-based technique to combine visual and textual information. We compare cross-media and random-walk-based results on three real-world datasets. From a practical standpoint, our extended empirical analyses provide insights and guidelines on the use of graph-based methods for multimodal information fusion in CBMIR.
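
The abstract names two families of graph-based fusion but gives no formulas. As a rough illustration only, and not the authors' formulation, the sketch below shows one common way each idea can be realized over a small collection: a two-step cross-media score (select the query's k visual nearest neighbours, then aggregate their textual similarities) and a random walk with restart over a convex mix of the two similarity graphs. All function names, the `mix` and `damping` parameters, and the toy matrices are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def top_k_filter(row: List[float], k: int) -> List[float]:
    """Keep the k largest entries of a similarity row and zero the rest."""
    keep = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
    return [row[j] if j in keep else 0.0 for j in range(len(row))]


def cross_media_scores(query_visual: List[float], text_sim: Matrix, k: int) -> List[float]:
    """Assumed two-step scheme: take the k visual nearest neighbours of the
    query, then score each object by its textual similarity to them."""
    filtered = top_k_filter(query_visual, k)
    n = len(text_sim)
    return [sum(filtered[i] * text_sim[i][j] for i in range(n)) for j in range(n)]


def row_normalize(m: Matrix) -> Matrix:
    """Turn a non-negative similarity matrix into a row-stochastic one."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row))
    return out


def random_walk_scores(visual_sim: Matrix, text_sim: Matrix, prior: List[float],
                       mix: float = 0.5, damping: float = 0.85,
                       iters: int = 50) -> List[float]:
    """Random walk with restart on a convex mix of both modality graphs.
    `prior` holds initial relevance scores and must have a positive sum."""
    pv, pt = row_normalize(visual_sim), row_normalize(text_sim)
    n = len(prior)
    trans = [[mix * pv[i][j] + (1 - mix) * pt[i][j] for j in range(n)] for i in range(n)]
    total = sum(prior)
    restart = [p / total for p in prior]
    scores = restart[:]
    for _ in range(iters):
        scores = [
            damping * sum(scores[i] * trans[i][j] for i in range(n))
            + (1 - damping) * restart[j]
            for j in range(n)
        ]
    return scores


if __name__ == "__main__":
    # Toy 3-object collection; the values are made up for illustration.
    visual = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
    text = [[1.0, 0.1, 0.7], [0.1, 1.0, 0.3], [0.7, 0.3, 1.0]]
    query_visual = [0.9, 0.6, 0.1]
    print(cross_media_scores(query_visual, text, k=2))
    print(random_walk_scores(visual, text, prior=query_visual))
```

In this sketch the two approaches differ mainly in how far similarity is propagated: the cross-media score takes a single filtered hop from one modality into the other, whereas the random walk iterates over the mixed graph until the scores settle.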

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published in
ACM Transactions on Information Systems Volume 33, Issue 2
February 2015
181 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2737813
Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 February 2015
- Accepted: 1 October 2014
- Revised: 1 April 2014
- Received: 1 March 2013
Author Tags
Content-based multimedia information retrieval
Visual reranking
cross-media similarity
graph-based methods
information fusion
random walk
Qualifiers
- research-article
- Research
- Refereed
