ABSTRACT
"Free-standing conversational groups" are what we call the elementary building blocks of social interactions formed in settings when people are standing and congregate in groups. The automatic detection, analysis, and tracking of such structural conversational units captured on camera poses many interesting challenges for the research community. First, although delineating these formations is strongly linked to other behavioral cues such as head and body poses, finding methods that successfully describe and exploit these links is not obvious. Second, the use of visual data is crucial, but when analyzing crowded scenes, one must account for occlusions and low-resolution images. In this regard, the use of other sensing technologies such as wearable devices can facilitate the analysis of social interactions by complementing the visual information. Yet the exploitation of multiple modalities poses other challenges in terms of data synchronization, calibration, and fusion. In this chapter, we discuss recent advances in multimodal social scene analysis, in particular for the detection of conversational groups or F-formations [Kendon 1990]. More precisely, a multimodal joint head and body pose estimator is described and compared to other recent approaches for head and body pose estimation and F-formation detection. Experimental results on the recently published SALSA dataset are reported, they evidence the long road toward a fully automated high-precision social scene analysis framework.
Index Terms
- Multimodal analysis of free-standing conversational groups