Spoken dialogue technology: enabling the conversational user interface

Published: 01 March 2002

Abstract

Spoken dialogue systems allow users to interact with computer-based applications such as databases and expert systems by using natural spoken language. The origins of spoken dialogue systems can be traced back to Artificial Intelligence research in the 1950s concerned with developing conversational interfaces. However, it is only within the last decade or so, with major advances in speech technology, that large-scale working systems have been developed and, in some cases, introduced into commercial environments. As a result, many major telecommunications and software companies have become aware of the potential for spoken dialogue technology to provide solutions in newly developing areas such as computer-telephony integration. Voice portals, which provide a speech-based interface between a telephone user and Web-based services, are the most recent application of spoken dialogue technology. This article describes the main components of the technology (speech recognition, language understanding, dialogue management, communication with an external source such as a database, language generation, and speech synthesis) and shows how these component technologies can be integrated into a spoken dialogue system. The article describes in detail the methods that have been adopted in some well-known dialogue systems, explores different system architectures, considers issues of specification, design, and evaluation, reviews some currently available dialogue development toolkits, and outlines prospects for future development.
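As a rough illustration of how the components named in the abstract might be chained together for a single dialogue turn, the following Python sketch wires up one placeholder function per component. It is not taken from the article: every name (`DialogueState`, `recognize_speech`, `understand`, `manage_dialogue`, `generate`, `synthesize`, `turn`), the toy `TIMETABLE`, and the keyword-spotting logic are assumptions made for illustration, and the "audio" is plain text standing in for a real speech signal.

```python
from dataclasses import dataclass, field

# Hypothetical external source (stands in for a database); values are invented.
TIMETABLE = {("belfast", "dublin"): "10:30"}
CITIES = {"belfast", "dublin"}
PROMPTS = {
    "origin": "Where are you travelling from?",
    "destination": "Where are you travelling to?",
}


@dataclass
class DialogueState:
    """Slots filled so far in the conversation (assumed slot-filling design)."""
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: str) -> str:
    # Placeholder for speech recognition: the "audio" is already text here.
    return audio.lower().strip()


def understand(text: str) -> dict:
    # Placeholder for language understanding: simple keyword spotting.
    frame = {}
    words = text.split()
    for prev, word in zip(words, words[1:]):
        if prev == "from" and word in CITIES:
            frame["origin"] = word
        elif prev == "to" and word in CITIES:
            frame["destination"] = word
    return frame


def manage_dialogue(state: DialogueState, frame: dict) -> dict:
    # Dialogue management: merge new information, then ask for what is
    # missing or query the external source once all slots are filled.
    state.slots.update(frame)
    for slot in ("origin", "destination"):
        if slot not in state.slots:
            return {"act": "request", "slot": slot}
    key = (state.slots["origin"], state.slots["destination"])
    departure = TIMETABLE.get(key)
    if departure is None:
        return {"act": "inform_none"}
    return {"act": "inform", "departure": departure, **state.slots}


def generate(act: dict) -> str:
    # Placeholder for language generation: fixed templates.
    if act["act"] == "request":
        return PROMPTS[act["slot"]]
    if act["act"] == "inform":
        return (f"The next train from {act['origin'].title()} to "
                f"{act['destination'].title()} leaves at {act['departure']}.")
    return "Sorry, I could not find a matching train."


def synthesize(text: str) -> str:
    # Placeholder for speech synthesis: returns the text unchanged.
    return text


def turn(state: DialogueState, audio: str) -> str:
    """Run one user turn through all components in sequence."""
    frame = understand(recognize_speech(audio))
    return synthesize(generate(manage_dialogue(state, frame)))
```

A two-turn exchange under these assumptions would call `turn(state, "I want to go to Dublin")` and then `turn(state, "from Belfast")` on the same `DialogueState`; the first call prompts for the missing origin and the second answers from the toy timetable. Real systems replace each placeholder with a substantial component of its own.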


Reviews

D.C. Charles Hair

A comprehensive survey of current research and development in the area of spoken dialogue technology is presented in this paper. Included as components of that technology are speech recognition; language understanding; dialogue management; communication with external sources such as databases; language generation; and speech synthesis. The paper emphasizes recent advances both within spoken dialogue technology and in underlying computer technology. There is a focus on surveying underlying technologies rather than existing spoken dialogue systems.

Section 1 of the paper gives an overview and an introduction to the subject. Section 2 is used to define the subject and to set forth a classification system. Three examples are given in section 3 to illustrate control strategies used in spoken dialogue systems. In section 4, there is an overview of the components used in spoken dialogue systems. It is noted that spoken dialogue systems are complicated systems that need to integrate these components, where each component itself is a major area of research interest. Section 5 describes methods of dialogue control. Section 6 reviews current trends in the specification, design, and evaluation of spoken dialogue systems. Section 7 describes toolkits that can be used in developing these complex systems. Finally, section 8 discusses future directions. The paper also includes a comprehensive list of related Web sites, and an extensive reference list.

The technology surveyed is important and interesting. The paper itself is well organized and clearly written. This work should be invaluable to anyone with an interest in spoken dialogue systems.

Online Computing Reviews Service
