DOI: 10.1145/2733373.2806315
Short Paper · MM Conference Proceedings

Multi-modal & Multi-view & Interactive Benchmark Dataset for Human Action Recognition

Published: 13 October 2015

ABSTRACT

Human action recognition is one of the most active research areas in both the computer vision and machine learning communities. Many methods for human action recognition have been proposed in the literature, and promising results have been achieved on popular datasets. However, comparisons between existing methods are often limited by differences in datasets, experimental settings, feature representations, and so on. In particular, no existing human action dataset allows concurrent analysis of three popular scenarios, namely single-view, cross-view, and cross-domain recognition. In this paper, we introduce a Multi-modal & Multi-view & Interactive (M2I) dataset, designed to evaluate human action recognition performance in multi-view scenarios. The dataset consists of 1760 action samples, covering 9 person-person interaction actions and 13 person-object interaction actions. Moreover, we evaluate three representative methods for single-view, cross-view, and cross-domain human action recognition, respectively, on this dataset under the proposed evaluation protocol. The experiments demonstrate that the dataset is extremely challenging due to large intra-class variation, many similar actions, and significant view differences. This benchmark provides a solid basis for evaluating this task and will benefit related research topics in computer vision and machine learning.
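To make the three evaluation scenarios concrete, the sketch below illustrates how a cross-view split could be constructed for a multi-view dataset such as M2I. This is a minimal Python sketch under assumed conventions: the Sample fields, the view labels ("front", "side"), and the file paths are all hypothetical illustrations, not the paper's actual protocol.

    # Illustrative cross-view evaluation split for a multi-view action dataset.
    # Hypothetical sketch: field names, view labels, and paths are assumptions,
    # not the M2I paper's actual protocol.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Sample:
        action: str  # e.g. a person-person or person-object interaction label
        view: str    # camera viewpoint the clip was recorded from
        path: str    # location of the video clip

    def cross_view_split(samples: List[Sample], source_view: str,
                         target_view: str) -> Tuple[List[Sample], List[Sample]]:
        # Train only on clips from the source view; test on the unseen
        # target view, so the model never observes the test viewpoint.
        train = [s for s in samples if s.view == source_view]
        test = [s for s in samples if s.view == target_view]
        return train, test

    # Usage: train on the front view, evaluate on the side view.
    dataset = [
        Sample("handshake", "front", "front/handshake_01.avi"),
        Sample("handshake", "side", "side/handshake_01.avi"),
    ]
    train_set, test_set = cross_view_split(dataset, "front", "side")

A single-view protocol would instead draw train and test sets from the same view, and a cross-domain protocol would train on one dataset or modality and test on another; only the attribute the split filters on changes.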


Published in

MM '15: Proceedings of the 23rd ACM International Conference on Multimedia
October 2015, 1402 pages
ISBN: 9781450334594
DOI: 10.1145/2733373

        Copyright © 2015 ACM


Publisher
Association for Computing Machinery, New York, NY, United States


Acceptance Rates
MM '15 paper acceptance rate: 56 of 252 submissions, 22%. Overall acceptance rate: 995 of 4,171 submissions, 24%.
