DOI: 10.5555/1867750.1867763
research-article

State-transition interpolation and MAP adaptation for HMM-based dysarthric speech recognition

Published: 05 June 2010

ABSTRACT

This paper describes the results of our experiments in building speaker-adaptive recognizers for talkers with spastic dysarthria. We study two modifications: (a) maximum a posteriori (MAP) adaptation of speaker-independent systems trained on normal speech, and (b) a transition probability matrix that is a linear interpolation between fully ergodic and (exclusively) left-to-right structures, for both speaker-dependent and speaker-adapted systems. The experiments indicate that (1) for speaker-dependent systems, left-to-right HMMs have a lower word error rate than transition-interpolated HMMs; (2) adapting all parameters other than transition probabilities yields higher recognition accuracy than adapting any subset of these parameters or adapting all parameters including transition probabilities; (3) performing both transition interpolation and adaptation gives a higher word error rate than performing adaptation alone; and (4) dysarthria severity is not a sufficient indicator of the relative performance of speaker-dependent and speaker-adapted systems.
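
The two modifications named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the interpolation weight `alpha`, the `self_loop` probability, the uniform ergodic matrix, the prior weight `tau`, and the restriction to a single Gaussian mean are all assumptions made for illustration. The mean update follows the standard MAP form in which the prior mean is weighted by `tau` and the adaptation frames by their occupancy.

```python
import numpy as np


def left_to_right_transitions(n_states, self_loop=0.5):
    """Strict left-to-right matrix: each state may stay or move one step forward.

    The self_loop value is an illustrative assumption.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop
        A[i, i + 1] = 1.0 - self_loop
    A[-1, -1] = 1.0  # final state only loops on itself
    return A


def ergodic_transitions(n_states):
    """Fully ergodic matrix; uniform probabilities are an assumption."""
    return np.full((n_states, n_states), 1.0 / n_states)


def interpolate_transitions(n_states, alpha, self_loop=0.5):
    """Linear interpolation between ergodic (alpha=1) and left-to-right (alpha=0).

    A convex combination of two row-stochastic matrices is row-stochastic,
    so no renormalization is needed.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ergodic = ergodic_transitions(n_states)
    ltr = left_to_right_transitions(n_states, self_loop)
    return alpha * ergodic + (1.0 - alpha) * ltr


def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean from adaptation data.

    prior_mean: (D,) speaker-independent mean
    frames:     (T, D) adaptation feature vectors
    posteriors: (T,) occupation probabilities of this Gaussian
    tau:        prior weight (hypothetical value)
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    occupancy = posteriors.sum()
    weighted_sum = posteriors @ frames
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)
```

With `alpha = 0` the sketch reduces to the left-to-right structure, and with little adaptation data (small occupancy relative to `tau`) the adapted mean stays close to the speaker-independent prior.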


Published in

SLPAT '10: Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies
June 2010, 119 pages
Publisher: Association for Computational Linguistics, United States
