DOI: 10.1145/2927006.2927011

Towards a Multimedia Knowledge-Based Agent with Social Competence and Human Interaction Capabilities

Published: 6 June 2016

ABSTRACT

We present work in progress on an intelligent embodied conversational agent for the basic care and healthcare domain. In contrast to most existing agents, the presented agent is designed to have the linguistic, cultural, social, and emotional competence needed to interact with elderly people and migrants. It is composed of an ontology-based, reasoning-driven dialogue manager; multimodal communication analysis and generation modules; and a search engine that retrieves from the web the multimedia background content needed to conduct a conversation on a given topic.
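To make the described pipeline concrete (analysis, ontology-driven dialogue management, web retrieval, generation), the following is a minimal Python sketch of how such components could be wired together. All class and method names are illustrative assumptions introduced here, not the authors' actual implementation.

```python
# Hypothetical sketch of the agent pipeline from the abstract.
# None of these names come from the paper; they only illustrate the data flow.

from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Running context the dialogue manager reasons over."""
    topic: str
    history: list = field(default_factory=list)


class MultimodalAnalyzer:
    """Stand-in for the multimodal communication analysis module."""
    def analyze(self, user_input: str) -> dict:
        # A real module would fuse speech, language, and facial cues here.
        return {"text": user_input, "intent": "ask_info"}


class OntologyDialogueManager:
    """Stand-in for the ontology-based, reasoning-driven dialogue manager."""
    def __init__(self, ontology: dict):
        self.ontology = ontology

    def decide(self, state: DialogueState, analysis: dict) -> dict:
        # Toy "reasoning": look the topic up in a dict acting as the ontology.
        facts = self.ontology.get(state.topic, [])
        return {"act": "inform", "facts": facts}


class WebContentRetriever:
    """Stand-in for the multimedia background-content search engine."""
    def retrieve(self, topic: str) -> list:
        # A real module would query the web; here we return a placeholder.
        return [f"multimedia item about {topic}"]


class MultimodalGenerator:
    """Stand-in for the multimodal communication generation module."""
    def generate(self, decision: dict, media: list) -> str:
        facts = "; ".join(decision["facts"]) or "let me look that up"
        return f"[speech+avatar] {facts} (showing: {media[0]})"


def run_turn(user_input: str, topic: str) -> str:
    ontology = {"medication": ["take the blue pill after breakfast"]}
    state = DialogueState(topic=topic)
    analysis = MultimodalAnalyzer().analyze(user_input)
    decision = OntologyDialogueManager(ontology).decide(state, analysis)
    media = WebContentRetriever().retrieve(topic)
    return MultimodalGenerator().generate(decision, media)


if __name__ == "__main__":
    print(run_turn("When do I take my medication?", "medication"))
```

The point of the sketch is the separation of concerns implied by the abstract: the dialogue manager only reasons over the ontology and dialogue state, while perception, retrieval, and rendering live in independent modules that can be swapped out.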


Published in
          MARMI '16: Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction
          June 2016
          46 pages
ISBN: 9781450343626
DOI: 10.1145/2927006

          Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Qualifiers

research-article

          Acceptance Rates

MARMI '16 Paper Acceptance Rate: 6 of 7 submissions, 86%. Overall Acceptance Rate: 6 of 7 submissions, 86%.
