Abstract
Affect detection is an important pattern recognition problem that has inspired researchers from several areas. The field is in need of a systematic review due to the recent influx of multimodal (MM) affect detection systems that differ in several respects and sometimes yield incompatible results. This article provides such a survey via a quantitative review and meta-analysis of 90 peer-reviewed MM systems. The review indicated that the state of the art mainly consists of person-dependent models (62.2% of systems) that fuse audio and visual information (55.6%) to detect acted expressions (52.2%) of basic emotions and simple dimensions of arousal and valence (64.5%) with feature-level (38.9%) and decision-level (35.6%) fusion techniques. However, there were also person-independent systems that considered additional modalities to detect nonbasic emotions and complex dimensions using model-level fusion techniques. The meta-analysis revealed that MM systems were consistently (85% of systems) more accurate than their best unimodal counterparts, with an average improvement of 9.83% (median of 6.60%). However, improvements were roughly three times lower when systems were trained on natural (4.59%) rather than acted (12.7%) data. Importantly, MM accuracy could be reliably predicted (cross-validated R² of 0.803) from unimodal accuracies and two system-level factors. Theoretical and applied implications and recommendations are discussed.
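To make the reported effect concrete, the sketch below illustrates one plausible reading of the improvement metric: the relative gain of a multimodal system's accuracy over its best unimodal counterpart. The systems and accuracies are invented placeholders, not data from the review.

```python
# Minimal sketch of the multimodal-over-unimodal effect metric, assuming
# improvement (%) = 100 * (multimodal_acc - best_unimodal_acc) / best_unimodal_acc.
# All numbers below are illustrative placeholders, not values from the review.

from statistics import mean, median

def percent_improvement(multimodal_acc: float, unimodal_accs: list[float]) -> float:
    """Relative gain of the multimodal system over its best unimodal counterpart."""
    best = max(unimodal_accs)
    return 100.0 * (multimodal_acc - best) / best

# Hypothetical systems: (multimodal accuracy, [unimodal accuracies]).
systems = [
    (0.78, [0.70, 0.66]),  # audio + visual
    (0.64, [0.62, 0.55]),  # natural data: typically smaller gains
    (0.85, [0.74, 0.71]),  # acted data: typically larger gains
]

gains = [percent_improvement(mm, uni) for mm, uni in systems]
print(f"mean improvement:   {mean(gains):.2f}%")
print(f"median improvement: {median(gains):.2f}%")
```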
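Similarly, the reported predictability of MM accuracy can be illustrated with a leave-one-out cross-validated linear regression, one standard way to obtain a cross-validated R². The predictors and simulated data below are hypothetical stand-ins; the review's two actual system-level factors are not assumed here.

```python
# Hedged sketch: predicting multimodal accuracy from unimodal accuracies plus
# two binary system-level factors, scored with leave-one-out cross-validation.
# All data and the specific predictors are hypothetical placeholders.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n = 90  # number of systems in the review

# Hypothetical predictors: best and second-best unimodal accuracies, plus two
# binary system-level factors (e.g., acted vs. natural data, fusion family).
best_uni = rng.uniform(0.5, 0.9, n)
second_uni = best_uni - rng.uniform(0.0, 0.15, n)
acted = rng.integers(0, 2, n)           # 1 = acted data, 0 = natural
feature_fusion = rng.integers(0, 2, n)  # 1 = feature-level fusion

X = np.column_stack([best_uni, second_uni, acted, feature_fusion])
# Simulated multimodal accuracy: a baseline gain, a larger gain for acted data,
# and noise; this merely mimics the qualitative pattern described above.
y = best_uni + 0.05 + 0.07 * acted + rng.normal(0.0, 0.02, n)

# Leave-one-out predictions give an honest estimate of out-of-sample fit.
preds = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
print(f"cross-validated R^2: {r2_score(y, preds):.3f}")
```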