ABSTRACT
This paper presents our techniques for the Audio-Video Emotion Recognition sub-challenge of the 2018 Emotion Recognition in the Wild (EmotiW) Challenge, combining acoustic features and facial features in both non-temporal and temporal modes. After multimodal result fusion, our final accuracy on the Acted Facial Expressions in the Wild (AFEW) test set reaches 61.87%, which is 1.53% higher than the best result from the previous year and demonstrates the effectiveness of our methods.
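The abstract does not specify how the multimodal results are fused. One common scheme for this task is late (score-level) fusion: each modality's model outputs a probability distribution over the emotion classes, and the distributions are combined by a weighted average. The sketch below illustrates that idea only; the `fuse_scores` function, the modality weights, and the class ordering are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

# The seven AFEW emotion classes (alphabetical order assumed for illustration).
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

def fuse_scores(audio_probs, video_probs, w_audio=0.3, w_video=0.7):
    """Weighted average of per-modality softmax outputs (hypothetical weights)."""
    audio = np.asarray(audio_probs, dtype=float)
    video = np.asarray(video_probs, dtype=float)
    fused = w_audio * audio + w_video * video
    return fused / fused.sum()  # renormalize to a probability distribution

# Example: softmax outputs from an audio model and a video model for one clip.
audio_p = [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10]
video_p = [0.05, 0.05, 0.05, 0.60, 0.15, 0.05, 0.05]
fused = fuse_scores(audio_p, video_p)
print(EMOTIONS[int(np.argmax(fused))])  # clip-level prediction: "happy"
```

In practice the per-modality weights would be tuned on the validation split; score-level fusion of this kind is a standard baseline in EmotiW systems.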
Multi-Feature Based Emotion Recognition for Video Clips