Towards a Multimedia Knowledge-Based Agent with Social Competence and Human Interaction Capabilities

ABSTRACT
We present work in progress on an intelligent embodied conversation agent for the basic care and healthcare domain. In contrast to most existing agents, the presented agent is designed to have the linguistic, cultural, social, and emotional competence needed to interact with elderly people and migrants. It is composed of an ontology-based, reasoning-driven dialogue manager; modules for multimodal communication analysis and generation; and a search engine that retrieves from the web the multimedia background content needed to conduct a conversation on a given topic.
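The component architecture described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of how the three parts might be wired together; all class and method names (`MultimodalAnalyzer`, `ContentRetriever`, `DialogueManager`, and so on) are assumptions for the sketch, not the project's actual API.

```python
# Illustrative sketch only: stand-ins for the agent's three main components.
from dataclasses import dataclass, field


@dataclass
class MultimodalAnalyzer:
    """Stand-in for the multimodal communication analysis module."""

    def analyze(self, utterance: str) -> dict:
        # A real module would fuse speech, facial expression, and gesture;
        # here we only wrap the text with a placeholder emotion label.
        return {"text": utterance, "emotion": "neutral"}


@dataclass
class ContentRetriever:
    """Stand-in for the web search engine for multimedia background content."""

    index: dict = field(default_factory=dict)

    def retrieve(self, topic: str) -> list:
        # A real engine would query the web; this one looks up a toy index.
        return self.index.get(topic, [])


@dataclass
class DialogueManager:
    """Stand-in for the ontology-based, reasoning-driven dialogue manager."""

    retriever: ContentRetriever

    def respond(self, analysis: dict) -> str:
        topic = analysis["text"].lower()
        hits = self.retriever.retrieve(topic)
        if hits:
            return f"Here is some background on {topic}: {hits[0]}"
        return f"Tell me more about {topic}."


# Wiring the components into a single turn of conversation.
retriever = ContentRetriever(index={"medication": ["Take pills with water."]})
agent = DialogueManager(retriever=retriever)
analysis = MultimodalAnalyzer().analyze("medication")
print(agent.respond(analysis))
```

In the actual system, the dialogue manager's decisions would be driven by ontology reasoning over the user model and dialogue state rather than by the simple lookup shown here.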