ABSTRACT
The automated extraction of semantically meaningful information from multi-modal data is becoming increasingly necessary as the volume of data captured for archival grows. One novel multi-modal labelling task that has received relatively little attention is the automatic estimation of the most dominant person in a group meeting. In this paper, we present a framework for detecting dominance in group meetings using different audio and video cues, and we show that even a simple model for dominance estimation yields promising results.
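To illustrate the kind of simple model the abstract refers to, the sketch below ranks meeting participants by accumulated speaking activity, a basic audio cue commonly used for dominance estimation. This is a hypothetical illustration, not the paper's actual model; the function name, data layout, and participant labels are all assumptions.

```python
# Hypothetical sketch of a minimal dominance estimator: the participant
# with the most accumulated speaking activity is taken as most dominant.
# This is NOT the paper's model, only an illustration of the idea.

def estimate_dominant(speaking_activity):
    """speaking_activity maps a participant id to a list of per-frame
    binary speaking labels (1 = speaking, 0 = silent)."""
    totals = {pid: sum(frames) for pid, frames in speaking_activity.items()}
    # Return the participant with the largest total speaking time.
    return max(totals, key=totals.get)

# Toy example with three participants and six audio frames each.
activity = {
    "A": [1, 1, 0, 1, 1, 1],
    "B": [0, 0, 1, 0, 0, 0],
    "C": [1, 0, 0, 0, 1, 0],
}
print(estimate_dominant(activity))  # "A" speaks in the most frames
```

In practice such a speaking-time cue would be combined with video cues (e.g. motion or visual attention), but it shows how little machinery a first-pass dominance estimate requires.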
Index Terms
- Using audio and video features to classify the most dominant person in a group meeting