ABSTRACT
This paper addresses the automatic recognition of head movements in videos of face-to-face dyadic conversations. We present an approach in which head-movement recognition is cast as a multimodal frame classification problem based on visual and acoustic features. The visual features are velocity, acceleration, and jerk values associated with head movements, while the acoustic ones are pitch and intensity measurements from the co-occurring speech. We present the results obtained by training and testing a number of classifiers on manually annotated data from two conversations. The best-performing classifier, a Multilayer Perceptron trained on all the features, achieves 0.75 accuracy and outperforms the mono-modal baseline classifier.
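The setup described above — per-frame feature vectors combining visual (velocity, acceleration, jerk) and acoustic (pitch, intensity) measurements, fed to a Multilayer Perceptron — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the feature values and labels here are synthetic placeholders, and scikit-learn's `MLPClassifier` stands in for whatever implementation was actually used.

```python
# Sketch of multimodal frame classification for head-movement recognition.
# Assumption: each video frame is represented by a 5-dimensional vector of
# [velocity, acceleration, jerk, pitch, intensity]; data here is synthetic.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_frames = 1000

# Synthetic per-frame features: 3 visual + 2 acoustic dimensions.
X = rng.normal(size=(n_frames, 5))
# Binary frame labels: 1 = frame belongs to a head movement, 0 = none.
# The labels are weakly tied to the velocity feature so the classifier
# has a learnable signal.
y = (X[:, 0] + 0.5 * rng.normal(size=n_frames) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Multilayer Perceptron trained on the full (visual + acoustic) feature set.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"frame accuracy: {acc:.2f}")
```

A mono-modal baseline, as in the paper's comparison, would be trained the same way but with `X[:, :3]` (visual only) or `X[:, 3:]` (acoustic only).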