
A Review and Meta-Analysis of Multimodal Affect Detection Systems

Published: 17 February 2015

Abstract

Affect detection is an important pattern recognition problem that has inspired researchers from several areas. The field is in need of a systematic review due to the recent influx of Multimodal (MM) affect detection systems that differ in several respects and sometimes yield incompatible results. This article provides such a survey via a quantitative review and meta-analysis of 90 peer-reviewed MM systems. The review indicated that the state of the art mainly consists of person-dependent models (62.2% of systems) that fuse audio and visual (55.6%) information to detect acted (52.2%) expressions of basic emotions and simple dimensions of arousal and valence (64.5%) with feature- (38.9%) and decision-level (35.6%) fusion techniques. However, there were also person-independent systems that considered additional modalities to detect nonbasic emotions and complex dimensions using model-level fusion techniques. The meta-analysis revealed that MM systems were consistently (85% of systems) more accurate than their best unimodal counterparts, with an average improvement of 9.83% (median of 6.60%). However, improvements were three times lower when systems were trained on natural (4.59%) versus acted data (12.7%). Importantly, MM accuracy could be accurately predicted (cross-validated R² of 0.803) from unimodal accuracies and two system-level factors. Theoretical and applied implications and recommendations are discussed.
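
To make the two headline statistics concrete, the sketch below (not taken from the article) computes a per-system improvement score and a cross-validated R² for predicting multimodal accuracy. It assumes improvement is measured as the relative gain of the MM accuracy over the best unimodal accuracy; the toy data, variable names, and the two "system-level factors" chosen here are illustrative assumptions only.

```python
# Minimal sketch (not the authors' code) of the abstract's two headline quantities.
# All values below are made-up toy data; the two system-level factors are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# One row per surveyed system.
best_unimodal = np.array([0.62, 0.70, 0.55, 0.81, 0.66, 0.74, 0.59, 0.68])  # best single-modality accuracy
multimodal    = np.array([0.68, 0.73, 0.57, 0.86, 0.69, 0.80, 0.60, 0.75])  # fused (MM) accuracy
acted_data    = np.array([1, 0, 1, 1, 0, 1, 0, 1])                          # hypothetical factor: acted (1) vs. natural (0)
n_modalities  = np.array([2, 2, 3, 2, 3, 2, 2, 3])                          # hypothetical factor: number of modalities

# (1) Percentage improvement of the MM system over its best unimodal counterpart,
#     summarized across systems by mean and median.
improvement = 100.0 * (multimodal - best_unimodal) / best_unimodal
print(f"mean improvement: {improvement.mean():.2f}%, median: {np.median(improvement):.2f}%")

# (2) Cross-validated R^2 for predicting MM accuracy from the best unimodal accuracy
#     plus two system-level factors (leave-one-out, scored on pooled predictions).
X = np.column_stack([best_unimodal, acted_data, n_modalities])
pred = cross_val_predict(LinearRegression(), X, multimodal, cv=LeaveOneOut())
print(f"cross-validated R^2: {r2_score(multimodal, pred):.3f}")
```

The actual meta-analysis aggregates these quantities across the 90 reviewed systems, and the article's own choice of predictors and cross-validation scheme is what yields the reported 9.83% mean improvement and the R² of 0.803; the sketch only mirrors the general idea.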



    • Published in

      ACM Computing Surveys, Volume 47, Issue 3 (April 2015), 602 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/2737799
      • Editor: Sartaj Sahni

      Copyright © 2015 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 February 2015
      • Accepted: 1 September 2014
      • Revised: 1 April 2014
      • Received: 1 June 2013
Published in ACM Computing Surveys Volume 47, Issue 3


      Qualifiers

      • survey
      • Research
      • Refereed
