AlterEgo: A Personalized Wearable Silent Speech Interface

ABSTRACT
We present a wearable interface that allows a user to silently converse with a computing device without voice or discernible movement, thereby enabling the user to communicate with devices, AI assistants, applications, or other people in a silent, concealed, and seamless manner. A user's intention to speak, and the internal speech itself, are characterized by neuromuscular signals in the internal speech articulators; the AlterEgo system captures these signals to reconstruct the speech. We use this to build a natural language user interface in which users silently communicate in natural language and receive aural output (e.g., through bone conduction headphones), yielding a discreet, bi-directional interface with a computing device and a seamless form of intelligence augmentation. The paper describes the architecture, design, implementation, and operation of the entire system. We demonstrate the robustness of the system through user studies and report a median word accuracy of 92%.
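The reported figure is a median of per-user word accuracies. As a rough illustration only (not the authors' evaluation code), word accuracy for a recognized transcript is commonly derived from the word-level edit distance against a reference transcript, with the median then taken across users:

```python
from statistics import median

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)

def median_word_accuracy(pairs_per_user):
    """pairs_per_user: one list of (reference, hypothesis) pairs per user."""
    accuracies = []
    for pairs in pairs_per_user:
        errors = [word_error_rate(ref, hyp) for ref, hyp in pairs]
        accuracies.append(1.0 - sum(errors) / len(errors))
    return median(accuracies)
```

The function names and the averaging scheme here are assumptions for illustration; the paper's actual evaluation protocol may differ (e.g., in how insertions are counted or how utterances are weighted per user).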