ABSTRACT
Dialogue systems help a variety of real-world applications interact with humans in an intelligent, natural way. Within dialogue systems, the task of dialogue generation is to produce an utterance given the previous utterances as context. Among the many approaches to dialogue generation, end-to-end neural generation models have received increasing attention. These models can generate natural-sounding sentences with a unified neural encoder-decoder network structure: the network sequentially encodes each word of the input context and then generates the response word by word, deterministically, during decoding. However, existing approaches are still challenged by a lack of variation and a limited ability to capture long-term dependencies between utterances. In this paper, we propose a novel hierarchical variational memory network (HVMN), which adds a hierarchical structure and a variational memory network to a neural encoder-decoder network. By emulating human-to-human dialogues, our proposed method captures both high-level abstract variations and long-term memories during dialogue tracking, which enables random access to relevant dialogue histories. Extensive experiments on three large real-world datasets show that our proposed model significantly improves over state-of-the-art baselines for dialogue generation.
Index Terms
- Hierarchical Variational Memory Network for Dialogue Generation