ABSTRACT
Many individuals exhibit unconscious body movements called mannerisms while speaking. These repetitive movements often distract the audience when they are not relevant to the verbal context. We present an intelligent interface that automatically extracts human gestures using the Microsoft Kinect to make speakers aware of their mannerisms. We use a sparsity-based algorithm, Shift Invariant Sparse Coding, to automatically extract the patterns of body movements. These patterns are displayed in an interface with a subtle question-and-answer-based feedback scheme that draws attention to the speaker's body language. Our formal evaluation with 27 participants shows that users became aware of their body language after using the system. In addition, when independent observers annotated the accuracy of the algorithm for every extracted pattern, we found that the patterns extracted by our algorithm are significantly (p<0.001) more accurate than random selection. This provides strong evidence that the algorithm is able to extract human-interpretable body movement patterns. An interactive demo of AutoManner is available at http://tinyurl.com/AutoManner.
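To give a sense of the pattern-extraction idea, the sketch below shows a greatly simplified, greedy shift-invariant decomposition of a 1-D signal (e.g., one Kinect joint coordinate over time). This is an illustrative matching-pursuit-style variant written for this summary, not the paper's actual Shift Invariant Sparse Coding implementation; the function name and dictionary are hypothetical. The core idea is the same: a short movement pattern (a dictionary atom) may recur at arbitrary time shifts, and the decomposition finds where and how strongly it occurs.

```python
import numpy as np

def shift_invariant_matching_pursuit(x, atoms, n_iter=10):
    """Greedy sketch of a shift-invariant sparse decomposition.

    x      : 1-D signal (e.g., one Kinect joint coordinate over time)
    atoms  : list of short unit-norm 1-D patterns (dictionary atoms)
    Returns a list of (atom_index, shift, coefficient) activations.
    """
    residual = x.astype(float).copy()
    activations = []
    for _ in range(n_iter):
        best = None  # (score, atom_index, shift)
        for k, d in enumerate(atoms):
            # Cross-correlate the atom with the residual at every valid
            # shift; the largest |score| marks the best-matching position.
            scores = np.correlate(residual, d, mode="valid")
            t = int(np.argmax(np.abs(scores)))
            if best is None or abs(scores[t]) > abs(best[0]):
                best = (scores[t], k, t)
        coef, k, t = best
        d = atoms[k]
        # Subtract the matched occurrence so later iterations find others.
        residual[t:t + len(d)] -= coef * d
        activations.append((k, t, coef))
    return activations

# Usage: plant one atom at two known shifts and recover both occurrences.
atom = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
atom /= np.linalg.norm(atom)
signal = np.zeros(100)
signal[20:25] += 3.0 * atom
signal[60:65] += 2.0 * atom
found = shift_invariant_matching_pursuit(signal, [atom], n_iter=2)
```

In the full method, the atoms themselves are also learned from the recordings (the paper cites Mørup et al.'s SISC formulation and FISTA-style optimization), so the repeated "mannerism" templates emerge from the data rather than being supplied by hand.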
AutoManner: An Automated Interface for Making Public Speakers Aware of Their Mannerisms