- Bahl, L. et al. Maximum mutual information estimation of HMM parameters. In Proceedings of ICASSP (1986), 49--52.
- Baker, J. Stochastic modeling for ASR. Speech Recognition. D.R. Reddy, ed. Academic Press, 1975.
- Baum, L. Statistical Estimation for Probabilistic Functions of a Markov Process. Inequalities III (1972), 1--8.
- Chen, X., et al. Pipelined back-propagation for context-dependent deep neural networks. In Proceedings of Interspeech, 2012.
- Dahl, G., et al. Context-dependent pre-trained deep neural networks for LVSR. In IEEE Trans. ASLP 20, 1 (2012), 30--42.
- Davis, S. et al. Comparison of parametric representations. IEEE Trans ASSP 28, 4 (1980), 357--366.
- Dean, J. et al. Large scale distributed deep networks. In Proceedings of NIPS (Lake Tahoe, NV, 2012).
- Dempster, et al. Maximum likelihood from incomplete data via the EM algorithm. JRSS 39, 1 (1977), 1--38.
- De Mori, R. Spoken Dialogue with Computers. Academic Press, 1998.
- Deng, L. and Huang, X. Challenges in adopting speech recognition. Commun. ACM 47, 1 (Jan. 2004), 69--75.
- Deng, L. et al. Binary coding of speech spectrograms using a deep auto-encoder. In Proceedings of Interspeech, 2010.
- Fiscus, J. Recognizer output voting error reduction (ROVER). In Proceedings of IEEE ASRU Workshop (1997), 347--354.
- He, X., et al. Discriminative learning in sequential pattern recognition. IEEE Signal Processing 25, 5 (2008), 14--36.
- Hinton, G., et al. Deep neural networks for acoustic modeling in SR. IEEE Signal Processing 29, 11 (2012).
- Huang, X., Acero, A., and Hon, H. Spoken Language Processing. Prentice Hall, Upper Saddle River, NJ, 2001.
- Huang, X. et al. MiPad: A multimodal interaction prototype. In Proceedings of ICASSP (Salt Lake City, UT, 2001).
- Huang, J. et al. Cross-language knowledge transfer using multilingual DNN. In Proceedings of ICASSP (2013), 7304--7308.
- Hwang, M., and Huang, X. Shared-distribution HMMs for speech. IEEE Trans S&AP 1, 4 (1993), 414--420.
- Jelinek, F. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA, 1997.
- Jelinek, F. Continuous speech recognition by statistical methods. In Proceedings of the IEEE 64, 4 (1976), 532--557.
- Katagiri, S. et al. Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method. In Proceedings of the IEEE 86, 11 (1998), 2345--2373.
- Kingsbury, B. et al. Scalable minimum Bayes risk training of deep neural network acoustic models. In Proceedings of Interspeech 2012.
- Klatt, D.H. Review of the ARPA speech understanding project. JASA 62, 6 (1977), 1345--1366.
- Lee, C. and Huo, Q. On adaptive decision rules and decision parameters adaptation for ASR. In Proceedings of the IEEE 88, 8 (2000), 1241--1269.
- Lee, K. ASR: The Development of the Sphinx Recognition System. Springer-Verlag, 1988.
- Lowerre, B. The Harpy Speech Recognition System. Ph.D. Thesis (1976). Carnegie Mellon University.
- Mikolov, T. et al. Extensions of recurrent neural network language model. In Proceedings of ICASSP (2011), 5528--5531.
- Mohri, M. et al. Weighted finite state transducers in speech recognition. Computer Speech & Language 16 (2002), 69--88.
- Morgan, N. et al. Continuous speech recognition using multilayer perceptrons with Hidden Markov Models. In Proceedings of ICASSP (1990).
- Pieraccini, R. et al. A speech understanding system based on statistical representation. In Proceedings of ICASSP (1992), 193--196.
- Potter, R., Kopp, G. and Green, H. Visible Speech. Van Nostrand, New York, NY, 1947.
- Price, P. Evaluation of spoken language systems: The ATIS domain. In Proceedings of the DARPA Workshop (Hidden Valley, PA, 1990).
- Rabiner, L. and Juang, B. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ, 1993.
- Reddy, R. Speech recognition by machine: A review. In Proceedings of the IEEE 64, 4 (1976), 501--531; http://www.rr.cs.cmu.edu/sr.pdf.
- Seneff, S. Tina: A NL system for spoken language application. Computational Linguistics 18, 1 (1992), 61--86.
- Tur, G., and De Mori, R. SLU: Systems for Extracting Semantic Information from Speech. Wiley, U.K., 2011.
- Yan, Z., Huo, Q., and Xu, J. A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. In Proceedings of Interspeech (2013).
- Yao, K. et al. Recurrent neural networks for language understanding. In Proceedings of Interspeech (2013), 104--108.
- Yu, D. et al. Feature learning in DNN---Studies on speech recognition tasks. ICLR (2013).
- Waibel, A. Phone recognition using time-delay neural networks. IEEE Trans. on ASSP 37, 3 (1989), 328--339.
- Ward, W. et al. Recent improvements in the CMU SUS. In Proceedings of ARPA Human Language Technology (1994), 213--216.
- Williams, J. and Young, S. Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language 21, 2 (2007), 393--422.
- Zue, V. The use of speech knowledge in speech recognition. In Proceedings of the IEEE 73, 11 (1985), 1602--1615.