ABSTRACT
The automated extraction of semantically meaningful information from multi-modal data is becoming increasingly necessary as the volume of data captured for archival grows. One novel multi-modal labelling task that has received relatively little attention is the automatic estimation of the most dominant person in a group meeting. In this paper, we present a framework for detecting dominance in group meetings using different audio and video cues, and we show that even a simple model for dominance estimation yields promising results.
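To illustrate the kind of simple model the abstract refers to, the sketch below ranks meeting participants by accumulated speaking activity, a basic audio cue commonly used for dominance estimation. This is a hypothetical illustration, not the paper's actual model; the function name, data layout, and participant labels are all assumptions.

```python
# Hypothetical sketch of a minimal dominance estimator: the participant
# with the most accumulated speaking activity is taken as most dominant.
# This is NOT the paper's model, only an illustration of the idea.

def estimate_dominant(speaking_activity):
    """speaking_activity maps a participant id to a list of per-frame
    binary speaking labels (1 = speaking, 0 = silent)."""
    totals = {pid: sum(frames) for pid, frames in speaking_activity.items()}
    # Return the participant with the largest total speaking time.
    return max(totals, key=totals.get)

# Toy example with three participants and six audio frames each.
activity = {
    "A": [1, 1, 0, 1, 1, 1],
    "B": [0, 0, 1, 0, 0, 0],
    "C": [1, 0, 0, 0, 1, 0],
}
print(estimate_dominant(activity))  # "A" speaks in the most frames
```

In practice such a speaking-time cue would be combined with video cues (e.g. motion or visual attention), but it shows how little machinery a first-pass dominance estimate requires.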
Index Terms
- Using audio and video features to classify the most dominant person in a group meeting