research-article

Semantic Reasoning in Zero Example Video Event Retrieval

Published: 04 October 2017

Abstract

Searching digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and comes without visual example images or videos. Current research in deep neural networks is highly beneficial for the retrieval of high-level events using visual examples, but without examples it is still hard to determine (1) which concepts are useful to pre-train (Vocabulary challenge) and (2) which pre-trained concept detectors are relevant for a certain unseen high-level event (Concept Selection challenge). In this article, we present our Semantic Event Retrieval System, which (1) shows the importance of high-level concepts in a vocabulary for the retrieval of complex and generic high-level events and (2) uses a novel concept selection method (i-w2v) based on semantic embeddings. Our experiments on the international TRECVID Multimedia Event Detection benchmark show that a diverse vocabulary including high-level concepts improves performance on the retrieval of high-level events in videos and that our novel method outperforms a knowledge-based concept selection method.
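
To make the Concept Selection challenge concrete, the following is a minimal sketch of embedding-based concept selection in general: each concept in a vocabulary is ranked by the cosine similarity between its embedding and the embedding of the textual event query. It is not the i-w2v method, whose details the abstract does not give. The three-dimensional vectors, the concept names, and the `select_concepts` helper are all hypothetical and invented for illustration; a real system would presumably use embeddings learned from a large text corpus.

```python
import math

# Toy 3-dimensional "embeddings". The values are invented for illustration
# only and are NOT taken from the article or from any trained model.
TOY_EMBEDDINGS = {
    "parade": [0.9, 0.1, 0.2],
    "marching_band": [0.8, 0.2, 0.1],
    "crowd": [0.7, 0.3, 0.3],
    "car": [0.1, 0.9, 0.2],
    "kitchen": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_concepts(query, vocabulary, embeddings, top_k=2):
    """Hypothetical helper: return the top_k vocabulary concepts whose
    embeddings are most similar to the embedding of the query term."""
    query_vec = embeddings[query]
    scored = [(c, cosine(query_vec, embeddings[c])) for c in vocabulary]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


vocabulary = ["marching_band", "crowd", "car", "kitchen"]
selected = select_concepts("parade", vocabulary, TOY_EMBEDDINGS)
for concept, score in selected:
    print(f"{concept}: {score:.3f}")
```

With these toy vectors the query "parade" selects "marching_band" and "crowd" ahead of "car" and "kitchen". In a retrieval setting, the scores of the selected concept detectors on each video could then be combined to rank videos for the unseen event.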


      • Published in

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 13, Issue 4
        November 2017
        362 pages
        ISSN:1551-6857
        EISSN:1551-6865
        DOI:10.1145/3129737
        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 October 2017
        • Accepted: 1 July 2017
        • Revised: 1 May 2017
        • Received: 1 July 2016

        Qualifiers

        • research-article
        • Research
        • Refereed
