A historical perspective of speech recognition

Authors:
Xuedong Huang, Microsoft Corp., Redmond, WA
James Baker, Dragon Systems, Newton, MA
Raj Reddy, Moza Bint Nasser University

Communications of the ACM, Volume 57, Issue 1, January 2014, pp. 94–103. https://doi.org/10.1145/2500887

Published: 1 January 2014

Abstract

What do we know now that we did not know 40 years ago?

Index Terms

A historical perspective of speech recognition
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval

Published in
Communications of the ACM Volume 57, Issue 1
January 2014
107 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/2541883
Editor:
Moshe Y. Vardi
Association for Computing Machinery, New York, NY
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]