Towards a Multimedia Knowledge-Based Agent with Social Competence and Human Interaction Capabilities

ABSTRACT
We present work in progress on an intelligent embodied conversation agent for the basic care and healthcare domain. In contrast to most existing agents, the presented agent is designed to have the linguistic, cultural, social, and emotional competence needed to interact with elderly people and migrants. It is composed of an ontology-based, reasoning-driven dialogue manager; modules for multimodal communication analysis and generation; and a search engine that retrieves from the web the multimedia background content needed to conduct a conversation on a given topic.
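The component architecture described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of how the three parts might be wired together; all class and method names (`MultimodalAnalyzer`, `ContentRetriever`, `DialogueManager`, and so on) are assumptions for the sketch, not the project's actual API.

```python
# Illustrative sketch only: stand-ins for the agent's three main components.
from dataclasses import dataclass, field


@dataclass
class MultimodalAnalyzer:
    """Stand-in for the multimodal communication analysis module."""

    def analyze(self, utterance: str) -> dict:
        # A real module would fuse speech, facial expression, and gesture;
        # here we only wrap the text with a placeholder emotion label.
        return {"text": utterance, "emotion": "neutral"}


@dataclass
class ContentRetriever:
    """Stand-in for the web search engine for multimedia background content."""

    index: dict = field(default_factory=dict)

    def retrieve(self, topic: str) -> list:
        # A real engine would query the web; this one looks up a toy index.
        return self.index.get(topic, [])


@dataclass
class DialogueManager:
    """Stand-in for the ontology-based, reasoning-driven dialogue manager."""

    retriever: ContentRetriever

    def respond(self, analysis: dict) -> str:
        topic = analysis["text"].lower()
        hits = self.retriever.retrieve(topic)
        if hits:
            return f"Here is some background on {topic}: {hits[0]}"
        return f"Tell me more about {topic}."


# Wiring the components into a single turn of conversation.
retriever = ContentRetriever(index={"medication": ["Take pills with water."]})
agent = DialogueManager(retriever=retriever)
analysis = MultimodalAnalyzer().analyze("medication")
print(agent.respond(analysis))
```

In the actual system, the dialogue manager's decisions would be driven by ontology reasoning over the user model and dialogue state rather than by the simple lookup shown here.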