Detecting head movements in video-recorded dyadic conversations

Published: 16 October 2018
DOI: 10.1145/3281151.3281152

ABSTRACT

This paper addresses the automatic recognition of head movements in videos of face-to-face dyadic conversations. We present an approach in which head movement recognition is cast as a multimodal frame classification problem based on visual and acoustic features. The visual features are velocity, acceleration, and jerk values associated with head movements, while the acoustic ones are pitch and intensity measurements from the co-occurring speech. We present the results obtained by training and testing a number of classifiers on manually annotated data from two conversations. The best performing classifier, a Multilayer Perceptron trained on all the features, achieves an accuracy of 0.75 and outperforms the mono-modal baseline classifier.
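To make the frame-classification setup concrete, here is a minimal sketch of how such a model might be trained, assuming the per-frame features (velocity, acceleration, and jerk from head tracking; pitch and intensity from the co-occurring speech) have already been extracted, e.g. with OpenCV and Praat. The stand-in data, feature ordering, and MLP hyperparameters below are illustrative assumptions, not the paper's actual configuration.

# Sketch: frame-level head-movement classification with an MLP.
# Stand-in random data replaces the extracted per-frame features;
# layout and hyperparameters are assumptions, not the authors' setup.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# One row per video frame: [velocity, acceleration, jerk, pitch, intensity].
X = rng.normal(size=(5000, 5))
# Binary frame labels: 1 = head movement, 0 = no movement.
y = rng.integers(0, 2, size=5000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Multilayer Perceptron trained on all (visual + acoustic) features.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

print("frame accuracy:", accuracy_score(y_test, clf.predict(X_test)))

Restricting X to the three visual columns (or the two acoustic ones) would yield the kind of mono-modal baseline against which the multimodal classifier is compared.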

Published in

ICMI '18: Proceedings of the 20th International Conference on Multimodal Interaction: Adjunct
October 2018, 62 pages
ISBN: 9781450360029
DOI: 10.1145/3281151

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States
