State-transition interpolation and MAP adaptation for HMM-based dysarthric speech recognition

Authors:
Harsh Vardhan Sharma, Beckman Institute, Urbana, IL
Mark Hasegawa-Johnson, Beckman Institute, Urbana, IL
SLPAT '10: Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies, June 2010, Pages 72–79

Published: 05 June 2010

ABSTRACT

This paper describes the results of our experiments in building speaker-adaptive recognizers for talkers with spastic dysarthria. We study two modifications: (a) MAP adaptation of speaker-independent systems trained on normal speech, and (b) a transition probability matrix that is a linear interpolation between fully ergodic and (exclusively) left-to-right structures, for both speaker-dependent and speaker-adapted systems. The experiments indicate that (1) for speaker-dependent systems, left-to-right HMMs have a lower word error rate than transition-interpolated HMMs; (2) adapting all parameters other than transition probabilities yields higher recognition accuracy than adapting any subset of these parameters or adapting all parameters including transition probabilities; (3) performing both transition interpolation and adaptation gives a higher word error rate than performing adaptation alone; and (4) dysarthria severity is not a sufficient indicator of the relative performance of speaker-dependent and speaker-adapted systems.
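
The two modifications named in the abstract can be illustrated with a minimal sketch. The abstract does not give the interpolation weight, the initial left-to-right probabilities, or the adaptation settings, so everything below is an assumption made for illustration: the function names, the weight `lam` (with `lam = 0` meaning purely left-to-right and `lam = 1` meaning fully ergodic), the 0.5 self-loop probability, and the prior weight `tau` are all hypothetical. The mean update shown is the standard MAP estimate for a Gaussian mean with a conjugate prior (Gauvain and Lee), which may differ in detail from the configuration the authors actually used.

```python
def left_to_right(n):
    """Left-to-right transitions: each state either stays or moves one step on.
    The 0.5/0.5 split is an illustrative assumption, not taken from the paper."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        a[i][i] = 0.5
        a[i][i + 1] = 0.5
    a[n - 1][n - 1] = 1.0  # final state only loops on itself
    return a


def ergodic(n):
    """Fully ergodic transitions: every state can reach every state (uniform)."""
    return [[1.0 / n] * n for _ in range(n)]


def interpolate_transitions(a_ltr, a_erg, lam):
    """Linear interpolation lam * ergodic + (1 - lam) * left-to-right.
    A convex combination of row-stochastic matrices is again row-stochastic."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * e + (1.0 - lam) * l for l, e in zip(row_l, row_e)]
        for row_l, row_e in zip(a_ltr, a_erg)
    ]


def map_adapt_mean(mu_prior, frames, gamma, tau):
    """MAP update of one Gaussian mean.
    mu_prior: speaker-independent mean; frames: adaptation feature vectors;
    gamma: per-frame occupation probabilities for this Gaussian;
    tau: prior weight (larger tau keeps the estimate closer to mu_prior)."""
    occupancy = sum(gamma)
    dim = len(mu_prior)
    weighted_sum = [
        sum(g * x[d] for g, x in zip(gamma, frames)) for d in range(dim)
    ]
    return [
        (tau * mu_prior[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


# Usage: a 3-state model, one quarter of the way from left-to-right to ergodic.
A = interpolate_transitions(left_to_right(3), ergodic(3), 0.25)
for row in A:
    print(row)
```

With little adaptation data the occupancy is small and the MAP mean stays near the speaker-independent prior; with more data it moves toward the speaker's own sample mean. Leaving the transition matrix out of the adapted parameter set, as finding (2) suggests, would correspond to applying `map_adapt_mean` (and analogous updates for the other output-distribution parameters) while keeping the transition matrix fixed.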

Published in
SLPAT '10: Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies
June 2010, 119 pages
Program Chairs: Melanie Fried-Oken (Oregon Health & Science University), Kathleen F. McCoy (University of Delaware), Brian Roark (Oregon Health & Science University)
Publisher: Association for Computational Linguistics, United States