research-article

Siri, Echo and Performance: You have to Suffer Darling

Authors:
Matthew P. Aylett, CereProc Ltd., Edinburgh, United Kingdom
Benjamin R. Cowan, University College Dublin, Dublin, Ireland
Leigh Clark, University College Dublin, Dublin, Ireland

CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, May 2019, Paper No.: alt08, Pages 1–10, https://doi.org/10.1145/3290607.3310422

Published: 02 May 2019

ABSTRACT

Don't ignore this because it's about speech technology. VUIs (voice user interfaces) won a best paper at CHI 2018. Did that get your attention? Good. Siri, Ivona, Google Home, and most speech synthesis systems have voices which are based on imitating a neutral citation style of speech and making it sound natural. But, in the real world, darling, people have to act, to perform! In this paper we will talk about speech synthesis as performance, why the uncanny valley is a bankrupt concept, and how academics can escape from studying corporate speech technology as if it's been bestowed by God.

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Natural language interfaces

Published in
CHI EA '19: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems
May 2019
3673 pages
ISBN:9781450359719
DOI:10.1145/3290607
General Chairs:
Stephen Brewster, University of Glasgow, Scotland, UK
Geraldine Fitzpatrick, TU Wien, Austria
Program Chairs:
Anna Cox, University College London, UK
Vassilis Kostakos, University of Melbourne, Australia
Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
emotion
personal assistants
personality
speech synthesis
voice interaction
Acceptance Rates
Overall Acceptance Rate: 6,164 of 23,696 submissions, 26%