ABSTRACT
Human action recognition is one of the most active research areas in both the computer vision and machine learning communities. Many methods for human action recognition have been proposed in the literature, and promising results have been achieved on popular datasets. However, comparisons between existing methods are often limited by differences in datasets, experimental settings, feature representations, and so on. In particular, no existing human action dataset allows concurrent analysis of three popular scenarios, namely single view, cross view, and cross domain. In this paper, we introduce a Multi-modal & Multi-view & Interactive (M2I) dataset, designed to evaluate the performance of human action recognition under the multi-view scenario. The dataset consists of 1760 action samples, covering 9 person-person interaction actions and 13 person-object interaction actions. Moreover, we evaluate three representative methods for single-view, cross-view, and cross-domain human action recognition on this dataset under the proposed evaluation protocol. The experiments demonstrate that the dataset is extremely challenging due to large intra-class variation, multiple similar actions, and significant view differences. This benchmark provides a solid basis for evaluating this task and will benefit related computer vision and machine learning research.
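The three evaluation scenarios named in the abstract can be sketched as train/test partitions of the samples. The following is a minimal illustrative sketch, not the dataset's official protocol: the view names, domain labels, subject-based single-view split, and record layout are all assumptions made for the example.

```python
def make_protocol_splits(samples):
    """Partition sample records into the three evaluation scenarios.

    Each sample is assumed to be a dict with illustrative keys:
    'domain' ('M2I' or another dataset name), 'view' ('front' or
    'side'), and an integer 'subject' id. These fields are
    hypothetical, not the dataset's actual annotation schema.
    """
    m2i = [s for s in samples if s["domain"] == "M2I"]
    front = [s for s in m2i if s["view"] == "front"]
    side = [s for s in m2i if s["view"] == "side"]
    other = [s for s in samples if s["domain"] != "M2I"]
    return {
        # Single view: train and test within one camera view,
        # split here by subject parity as a stand-in for a
        # cross-subject split.
        "single_view": {
            "train": [s for s in front if s["subject"] % 2 == 0],
            "test": [s for s in front if s["subject"] % 2 == 1],
        },
        # Cross view: train on one view, test on the unseen view.
        "cross_view": {"train": front, "test": side},
        # Cross domain: train on M2I, test on another dataset.
        "cross_domain": {"train": m2i, "test": other},
    }
```

A point of the sketch is that the same pool of labeled samples supports all three comparisons at once, which is exactly the concurrent analysis the abstract argues prior datasets do not allow.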