DOI: 10.1145/3172944.3172977

research-article

AlterEgo: A Personalized Wearable Silent Speech Interface

Published: 5 March 2018

ABSTRACT

We present a wearable interface that allows a user to converse silently with a computing device, without voicing or any discernible movement, thereby enabling the user to communicate with devices, AI assistants, applications, or other people in a silent, concealed, and seamless manner. A user's intention to speak and internal speech are characterized by neuromuscular signals from the internal speech articulators, which the AlterEgo system captures to reconstruct this speech. We use this to build a natural language user interface in which users silently communicate in natural language and receive aural output (e.g., via bone-conduction headphones), enabling a discreet, bi-directional interface with a computing device and providing a seamless form of intelligence augmentation. The paper describes the architecture, design, implementation, and operation of the entire system. We demonstrate the robustness of the system through user studies and report a 92% median word accuracy.
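The signal-processing idea behind such an interface can be illustrated with a deliberately simplified sketch. This is not the paper's actual pipeline, which uses multi-electrode neuromuscular recordings and a neural network classifier; all function names, signals, and templates below are hypothetical. The sketch rectifies a raw EMG-like signal, smooths it into an envelope feature, and matches that envelope against per-word templates:

```python
# Illustrative (hypothetical) silent-speech classification sketch:
# rectify an EMG-like signal, extract a moving-average envelope,
# and classify by nearest-template matching. Synthetic data only.

def envelope(signal, window=4):
    """Rectify the signal and smooth it with a trailing moving average."""
    rectified = [abs(x) for x in signal]
    smoothed = []
    for i in range(len(rectified)):
        chunk = rectified[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def classify(signal, templates, window=4):
    """Return the template word whose envelope is closest (L2) to the input's."""
    feat = envelope(signal, window)
    best_word, best_dist = None, float("inf")
    for word, tmpl in templates.items():
        tmpl_feat = envelope(tmpl, window)
        n = min(len(feat), len(tmpl_feat))
        dist = sum((feat[i] - tmpl_feat[i]) ** 2 for i in range(n)) ** 0.5
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word

# Synthetic per-word signal templates for two "words".
templates = {
    "yes": [0, 1, -2, 3, -1, 0, 0, 0],
    "no":  [0, 0, 0, 1, -1, 2, -2, 1],
}

# A noisy variant of the "yes" template should still classify as "yes".
noisy_yes = [0, 1.1, -1.9, 2.8, -1.2, 0.1, 0, 0]
print(classify(noisy_yes, templates))
```

A real system would additionally segment the continuous signal stream into word-sized windows and train a discriminative model rather than matching hand-made templates, but the feature-then-classify structure is the same.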


Published in:

IUI '18: Proceedings of the 23rd International Conference on Intelligent User Interfaces
March 2018, 698 pages
ISBN: 9781450349451
DOI: 10.1145/3172944
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

IUI '18 paper acceptance rate: 43 of 299 submissions, 14%. Overall acceptance rate: 746 of 2,811 submissions, 27%.
