ABSTRACT
This paper describes the results of our experiments in building speaker-adaptive recognizers for talkers with spastic dysarthria. We study two modifications: (a) MAP adaptation of speaker-independent systems trained on normal speech, and (b) a transition probability matrix that is a linear interpolation between fully ergodic and (exclusively) left-to-right structures, for both speaker-dependent and speaker-adapted systems. The experiments indicate that (1) for speaker-dependent systems, left-to-right HMMs have a lower word error rate than transition-interpolated HMMs; (2) adapting all parameters other than transition probabilities yields the highest recognition accuracy, compared with adapting any subset of these parameters or adapting all parameters including transition probabilities; (3) performing both transition interpolation and adaptation gives a higher word error rate than performing adaptation alone; and (4) dysarthria severity is not a sufficient indicator of the relative performance of speaker-dependent and speaker-adapted systems.
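The two modifications named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract states only that the transition matrix is a linear interpolation between ergodic and left-to-right structures, so the uniform ergodic matrix, the 0.5/0.5 self-loop and advance probabilities, and the names `weight` and `tau` are all assumptions made for illustration. The mean update follows the standard MAP form for a single Gaussian mean, with `tau` acting as the prior weight.

```python
def left_to_right(n):
    """Strict left-to-right matrix: stay in the state or advance by one.
    The 0.5/0.5 split is an illustrative assumption."""
    rows = []
    for i in range(n):
        row = [0.0] * n
        if i == n - 1:
            row[i] = 1.0              # last state can only self-loop
        else:
            row[i] = 0.5              # self-loop
            row[i + 1] = 0.5          # advance to the next state
        rows.append(row)
    return rows


def ergodic(n):
    """Fully connected matrix; uniform rows are an illustrative assumption."""
    return [[1.0 / n] * n for _ in range(n)]


def interpolate(a_ergodic, a_ltr, weight):
    """Return weight * ergodic + (1 - weight) * left-to-right.
    `weight` is a hypothetical name; 0 gives pure left-to-right."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [weight * e + (1.0 - weight) * l for e, l in zip(row_e, row_l)]
        for row_e, row_l in zip(a_ergodic, a_ltr)
    ]


def map_mean(prior_mean, frames, posteriors, tau):
    """MAP estimate of one Gaussian mean.
    prior_mean: mean from the speaker-independent model
    frames:     adaptation feature vectors from the target speaker
    posteriors: occupation probability of this Gaussian for each frame
    tau:        prior weight (hypothetical name); larger stays nearer the prior
    """
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    return [
        (tau * prior_mean[d] + weighted[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

Because both input matrices are row-stochastic, every interpolated row still sums to one, and any `weight` above zero permits backward and skip transitions that a strict left-to-right model forbids. In `map_mean`, a Gaussian that receives no adaptation frames keeps its speaker-independent mean.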