ABSTRACT
This paper examines how linguistic feedback expressions, prosody, and head gestures (i.e. head movements and facial expressions) relate to one another in a collection of eight video-recorded Danish map-task dialogues. The study shows that in these data, prosodic features and head gestures significantly improve the automatic classification of dialogue act labels for linguistic feedback expressions.
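The classification setup the abstract describes can be illustrated with a toy sketch. Everything below is hypothetical: the feature labels (a pitch-contour tag, a head-gesture tag, the lexical form) and the dialogue-act inventory are invented for illustration and are not the paper's actual features or data. A minimal hand-rolled Naive Bayes classifier shows the general idea of how gesture and prosody features can disambiguate feedback expressions that share the same word:

```python
from collections import Counter, defaultdict

# Hypothetical training examples: (lexical, pitch, head_gesture) -> dialogue act.
# Danish "ja" ("yes") is deliberately ambiguous between acts; the prosodic and
# gestural features are what separate the readings.
train = [
    (("ja", "rising", "nod"), "Accept"),
    (("ja", "rising", "nod"), "Accept"),
    (("ja", "falling", "none"), "Acknowledge"),
    (("ja", "falling", "shake"), "Reject"),
    (("nej", "falling", "shake"), "Reject"),
]

def train_nb(data, alpha=1.0):
    """Return a classifier using per-class priors and add-one-smoothed
    per-feature likelihoods (multinomial Naive Bayes over categories)."""
    priors = Counter(label for _, label in data)
    likes = defaultdict(Counter)  # (feature_index, label) -> value counts
    for feats, label in data:
        for i, v in enumerate(feats):
            likes[(i, label)][v] += 1
    # Per-feature vocabularies, needed for the smoothing denominator.
    vocab = [{f[i] for f, _ in data} for i in range(len(data[0][0]))]

    def classify(feats):
        best, best_p = None, 0.0
        for label, n in priors.items():
            p = n / len(data)  # class prior
            for i, v in enumerate(feats):
                c = likes[(i, label)]
                p *= (c[v] + alpha) / (n + alpha * len(vocab[i]))
            if p > best_p:
                best, best_p = label, p
        return best

    return classify

classify = train_nb(train)
print(classify(("ja", "rising", "nod")))     # -> Accept
print(classify(("ja", "falling", "shake")))  # -> Reject
```

Both test utterances contain the same word "ja"; only the prosodic and head-gesture features tip the classifier toward different acts, which is the effect the study measures (the paper itself reports results with standard toolkit classifiers rather than this sketch).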