ABSTRACT
Human action recognition is one of the most active research areas in both the computer vision and machine learning communities. Many methods for human action recognition have been proposed in the literature, and promising results have been achieved on popular datasets. However, comparisons between existing methods are often limited by differences in datasets, experimental settings, feature representations, and so on. In particular, no existing human action dataset allows concurrent analysis of three popular scenarios, namely single view, cross view, and cross domain. In this paper, we introduce a Multi-modal & Multi-view & Interactive (M2I) dataset, designed to evaluate the performance of human action recognition under the multi-view scenario. The dataset consists of 1760 action samples, covering 9 person-person interaction actions and 13 person-object interaction actions. Moreover, we evaluate three representative methods for single-view, cross-view, and cross-domain human action recognition on this dataset under the proposed evaluation protocol. The experiments demonstrate that the dataset is extremely challenging due to large intra-class variation, multiple similar actions, and significant view differences. This benchmark provides a solid basis for evaluating this task and will benefit related computer vision and machine learning research.
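The three evaluation scenarios named in the abstract can be sketched as train/test partitions of the samples. The following is a minimal illustrative sketch, not the dataset's official protocol: the view names, domain labels, subject-based single-view split, and record layout are all assumptions made for the example.

```python
def make_protocol_splits(samples):
    """Partition sample records into the three evaluation scenarios.

    Each sample is assumed to be a dict with illustrative keys:
    'domain' ('M2I' or another dataset name), 'view' ('front' or
    'side'), and an integer 'subject' id. These fields are
    hypothetical, not the dataset's actual annotation schema.
    """
    m2i = [s for s in samples if s["domain"] == "M2I"]
    front = [s for s in m2i if s["view"] == "front"]
    side = [s for s in m2i if s["view"] == "side"]
    other = [s for s in samples if s["domain"] != "M2I"]
    return {
        # Single view: train and test within one camera view,
        # split here by subject parity as a stand-in for a
        # cross-subject split.
        "single_view": {
            "train": [s for s in front if s["subject"] % 2 == 0],
            "test": [s for s in front if s["subject"] % 2 == 1],
        },
        # Cross view: train on one view, test on the unseen view.
        "cross_view": {"train": front, "test": side},
        # Cross domain: train on M2I, test on another dataset.
        "cross_domain": {"train": m2i, "test": other},
    }
```

A point of the sketch is that the same pool of labeled samples supports all three comparisons at once, which is exactly the concurrent analysis the abstract argues prior datasets do not allow.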