ABSTRACT
We explore methods for managing conversational engagement in open-world, physically situated dialog systems. We investigate a self-supervised methodology for constructing forecasting models that anticipate when participants are about to terminate their interactions with a situated system. We study how these models can guide a disengagement policy that uses linguistic hesitation actions, such as filled and unfilled pauses, when uncertainty arises about the continuation of engagement. The hesitations allow additional time for sensing and inference, and convey the system's uncertainty. We report results from a study of the proposed approach with a directions-giving robot deployed in the wild.
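The hesitation-based disengagement policy described above can be sketched as a simple thresholding rule over the forecast model's output. The class name, method names, and threshold values below are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class HesitationPolicy:
    """Hypothetical sketch of a threshold policy over a disengagement forecast.

    Threshold values are assumed for illustration only.
    """
    disengage_threshold: float = 0.8  # assumed: forecast confidence to close out
    hesitate_threshold: float = 0.5   # assumed: lower bound of the uncertain band

    def act(self, p_disengage: float) -> str:
        """Map the forecast probability of imminent disengagement to an action."""
        # High confidence the participant is leaving: close the interaction.
        if p_disengage >= self.disengage_threshold:
            return "disengage"
        # Uncertain band: produce a hesitation (e.g., "um...") to buy
        # additional time for sensing and inference, while signaling
        # the system's own uncertainty to the participant.
        if p_disengage >= self.hesitate_threshold:
            return "hesitate"
        # Otherwise, continue the interaction normally.
        return "continue"
```

On each perception update, the system would re-evaluate the forecast and call `act`; repeated "hesitate" outcomes give the sensors more evidence before committing to a disengagement.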
Managing Human-Robot Engagement with Forecasts and... um... Hesitations