ABSTRACT
Effective multimedia retrieval requires combining the heterogeneous media contained within multimedia objects and the features that can be extracted from them. To this end, we extend a unifying framework, which integrates all well-known weighted, graph-based, and diffusion-based techniques for fusing two modalities (textual and visual similarities), so that it models the fusion of multiple modalities. We also provide a theoretical formula for the optimal number of documents to be initially selected, so that the memory cost with multiple modalities remains the same as with two modalities. Experiments on two test collections with three modalities (similarities based on visual descriptors, visual concepts, and textual concepts) indicate improved effectiveness over bimodal fusion at the same memory complexity.
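Weighted fusion is one of the families of techniques the framework covers. As a hedged illustration only, a generic weighted linear (late) fusion of per-modality similarity scores for one query can be sketched as follows. It assumes every modality scores the same list of candidate documents in the same order and that weights are non-negative; the function names, modality labels, scores, and weights are hypothetical. This is not the paper's unifying framework, its graph-based or diffusion-based variants, or its formula for the number of initially selected documents.

```python
from typing import Dict, List


def weighted_fusion(similarities: Dict[str, List[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Generic late linear fusion: score(d) = sum over m of w_m * s_m(q, d).

    `similarities` maps a (hypothetical) modality name to the query-document
    similarity scores of one shared candidate list. `weights` maps each
    modality to a non-negative weight; weights are normalised to sum to one.
    """
    if set(similarities) != set(weights):
        raise ValueError("every modality needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lengths = {len(scores) for scores in similarities.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must score the same candidate list")
    n_docs = lengths.pop()

    fused = [0.0] * n_docs
    for modality, scores in similarities.items():
        w = weights[modality] / total  # normalised weight of this modality
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused


def rank(scores: List[float]) -> List[int]:
    """Return candidate indices sorted by decreasing fused score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical scores for three candidates under three modalities.
sims = {
    "visual_descriptors": [0.9, 0.2, 0.4],
    "visual_concepts": [0.1, 0.8, 0.5],
    "textual_concepts": [0.3, 0.6, 0.7],
}
w = {"visual_descriptors": 1.0, "visual_concepts": 1.0, "textual_concepts": 2.0}
fused = weighted_fusion(sims, w)
print(rank(fused))  # → [2, 1, 0]
```

Extending from two to three modalities here only adds one more weighted term per document; the graph-based and diffusion-based fusion methods the abstract mentions, and their memory cost, are not modelled by this sketch.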
Index Terms
- Retrieval of Multimedia Objects by Fusing Multiple Modalities