
A historical perspective of speech recognition

Published: 01 January 2014

Abstract

What do we know now that we did not know 40 years ago?


Published in

Communications of the ACM, Volume 57, Issue 1
January 2014
107 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2541883
Editor: Moshe Y. Vardi

Copyright © 2014 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Popular, Refereed
