Error-responsive feedback mechanisms for speech recognizers
Publisher:
  • Carnegie Mellon University
  • Schenley Park Pittsburgh, PA
  • United States
ISBN: 978-0-591-91865-6
Order Number: AAI9838206
Pages: 287
Abstract

This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal interfaces, multi-media databases, and other interesting applications. I improve on the current approach to predicting and analyzing error behavior, which relies solely on measuring word error rate.
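
For readers unfamiliar with the metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below is an illustrative assumption, not code from the thesis; the function name `word_error_rate` and the whitespace tokenization are hypothetical choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and not the thesis's scoring procedure.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A single number of this kind says how many words were wrong but not where or why, which is the gap the confidence annotation and blame assignment work described in the abstract is meant to address.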

The speech recognizer's functionality is extended to include confidence annotations, which are "meta-level" markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.

The measuring and modeling techniques used for confidence annotation are extended to create a blame assignment system for utterances whose actual transcripts are known. Errors are classified into a set of categories, some of which are directly useful in automatic adaptation schemes while others are more suited for human interpretation.

This classification approach is enhanced when used in conjunction with a visual error analysis tool that was developed during the thesis project.

Cited By

  1. Marge M and Rudnicky A (2019). Miscommunication Detection and Recovery in Situated Human–Robot Dialogue, ACM Transactions on Interactive Intelligent Systems, 9:1, (1-40), Online publication date: 31-Mar-2019.
  2. Hermansky H Dealing with unexpected words in automatic recognition of speech Proceedings of the 14th international conference on Text, speech and dialogue, (1-15)
  3. Blackwood G, de Gispert A and Byrne W Fluency constraints for minimum Bayes-risk decoding of statistical machine translation lattices Proceedings of the 23rd International Conference on Computational Linguistics, (71-79)
  4. Cerňak M Diagnostics for debugging speech recognition systems Proceedings of the 13th international conference on Text, speech and dialogue, (251-258)
  5. Hirsimäki T and Kurimo M Analysing recognition errors in unlimited-vocabulary speech recognition Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, (193-196)
  6. Pervouchine V, Li H and Lin B Transliteration alignment Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, (136-144)
  7. Zhou L, Shi Y, Zhang D and Sears A (2006). Discovering Cues to Error Detection in Speech Recognition Output, Journal of Management Information Systems, 22:4, (237-270), Online publication date: 1-Apr-2006.
  8. Shi Y and Zhou L Error detection using linguistic features Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, (41-48)
  9. Stemmer G, Steidl S, Nöth E, Niemann H and Batliner A Comparison and Combination of Confidence Measures Proceedings of the 5th International Conference on Text, Speech and Dialogue, (181-188)
  10. Zechner K and Waibel A Using chunk based partial parsing of spontaneous speech in unrestricted domains for reducing word error rate in speech recognition Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2, (1453-1459)
Contributors
  • Carnegie Mellon University