ABSTRACT
This paper presents our techniques for the Audio-Video Emotion Recognition sub-challenge of the 2018 Emotion Recognition in the Wild (EmotiW) Challenge, combining acoustic features and facial features in both non-temporal and temporal modes. After multimodal result fusion, our final accuracy on the Acted Facial Expressions in the Wild (AFEW) test set reaches 61.87%, which is 1.53% higher than the best result from the previous year and demonstrates the effectiveness of our methods.
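The abstract does not specify how the multimodal results are fused. One common scheme for this task is late (score-level) fusion: each modality's model outputs a probability distribution over the emotion classes, and the distributions are combined by a weighted average. The sketch below illustrates that idea only; the `fuse_scores` function, the modality weights, and the class ordering are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

# The seven AFEW emotion classes (alphabetical order assumed for illustration).
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def fuse_scores(audio_probs, video_probs, w_audio=0.3, w_video=0.7):
    """Weighted average of per-modality softmax outputs (hypothetical weights)."""
    audio = np.asarray(audio_probs, dtype=float)
    video = np.asarray(video_probs, dtype=float)
    fused = w_audio * audio + w_video * video
    return fused / fused.sum()  # renormalize to a probability distribution

# Example: softmax outputs from an audio model and a video model for one clip.
audio_p = [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10]
video_p = [0.05, 0.05, 0.05, 0.60, 0.15, 0.05, 0.05]
fused = fuse_scores(audio_p, video_p)
print(EMOTIONS[int(np.argmax(fused))])  # clip-level prediction: "happy"
```

In practice the per-modality weights would be tuned on the validation split; score-level fusion of this kind is a standard baseline in EmotiW systems.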
Multi-Feature Based Emotion Recognition for Video Clips