ABSTRACT
Many individuals exhibit unconscious body movements called mannerisms while speaking. These repetitive movements often distract the audience when they are not relevant to the verbal context. We present an intelligent interface that automatically extracts human gestures using the Microsoft Kinect to make speakers aware of their mannerisms. We use a sparsity-based algorithm, Shift Invariant Sparse Coding, to automatically extract the patterns of body movements. These patterns are displayed in an interface with a subtle question-and-answer-based feedback scheme that draws attention to the speaker's body language. Our formal evaluation with 27 participants shows that users became aware of their body language after using the system. In addition, when independent observers annotated the accuracy of the algorithm for every extracted pattern, we found that the patterns extracted by our algorithm are significantly (p<0.001) more accurate than random selection. This provides strong evidence that the algorithm is able to extract human-interpretable body movement patterns. An interactive demo of AutoManner is available at http://tinyurl.com/AutoManner.
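To give a sense of the pattern-extraction idea, the sketch below shows a greatly simplified, greedy shift-invariant decomposition of a 1-D signal (e.g., one Kinect joint coordinate over time). This is an illustrative matching-pursuit-style variant written for this summary, not the paper's actual Shift Invariant Sparse Coding implementation; the function name and dictionary are hypothetical. The core idea is the same: a short movement pattern (a dictionary atom) may recur at arbitrary time shifts, and the decomposition finds where and how strongly it occurs.

```python
import numpy as np

def shift_invariant_matching_pursuit(x, atoms, n_iter=10):
    """Greedy sketch of a shift-invariant sparse decomposition.

    x      : 1-D signal (e.g., one Kinect joint coordinate over time)
    atoms  : list of short unit-norm 1-D patterns (dictionary atoms)
    Returns a list of (atom_index, shift, coefficient) activations.
    """
    residual = x.astype(float).copy()
    activations = []
    for _ in range(n_iter):
        best = None  # (score, atom_index, shift)
        for k, d in enumerate(atoms):
            # Cross-correlate the atom with the residual at every valid
            # shift; the largest |score| marks the best-matching position.
            scores = np.correlate(residual, d, mode="valid")
            t = int(np.argmax(np.abs(scores)))
            if best is None or abs(scores[t]) > abs(best[0]):
                best = (scores[t], k, t)
        coef, k, t = best
        d = atoms[k]
        # Subtract the matched occurrence so later iterations find others.
        residual[t:t + len(d)] -= coef * d
        activations.append((k, t, coef))
    return activations

# Usage: plant one atom at two known shifts and recover both occurrences.
atom = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
atom /= np.linalg.norm(atom)
signal = np.zeros(100)
signal[20:25] += 3.0 * atom
signal[60:65] += 2.0 * atom
found = shift_invariant_matching_pursuit(signal, [atom], n_iter=2)
```

In the full method, the atoms themselves are also learned from the recordings (the paper cites Mørup et al.'s SISC formulation and FISTA-style optimization), so the repeated "mannerism" templates emerge from the data rather than being supplied by hand.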
AutoManner: An Automated Interface for Making Public Speakers Aware of Their Mannerisms