ABSTRACT
Don't ignore this because it's about speech technology. VUIs (voice user interfaces) won a best paper award at CHI 2018. Did that get your attention? Good. Siri, Ivona, Google Home, and most speech synthesis systems have voices that are based on imitating a neutral citation style of speech and making it sound natural. But in the real world, darling, people have to act, to perform! In this paper we talk about speech synthesis as performance, why the uncanny valley is a bankrupt concept, and how academics can escape from studying corporate speech technology as if it had been bestowed by God.
Siri, Echo and Performance: You have to Suffer Darling. CHI'19 Extended Abstracts, May 4--9, 2019, Glasgow, Scotland, UK.