Hierarchical Variational Memory Network for Dialogue Generation

Authors:
Hongshen Chen, JD.com, Beijing, China
Zhaochun Ren, JD.com, Beijing, China
Jiliang Tang, Michigan State University, East Lansing, MI, USA
Yihong Eric Zhao, JD.com, Beijing, CA, China
Dawei Yin, JD.com, Beijing, China

WWW '18: Proceedings of the 2018 World Wide Web Conference, April 2018, Pages 1653–1662
https://doi.org/10.1145/3178876.3186077

Published: 10 April 2018

ABSTRACT

Dialogue systems enable a wide range of real-world applications to interact with humans in an intelligent and natural way. In dialogue systems, the task of dialogue generation is to generate an utterance given the previous utterances as context. Among the many dialogue generation approaches, end-to-end neural generation models have received increasing attention. These models can generate natural-sounding sentences with a unified neural encoder-decoder network: the encoder reads each word of the input context in sequence, and the decoder generates the response deterministically, word by word. However, a lack of variation and a limited ability to capture long-term dependencies between utterances remain challenges for existing approaches. In this paper, we propose a novel hierarchical variational memory network (HVMN), which adds a hierarchical structure and a variational memory network to a neural encoder-decoder network. By emulating human-to-human dialogue, our method captures both high-level abstract variations and long-term memories during dialogue tracking, which enables random access to relevant parts of the dialogue history. Extensive experiments on three large real-world datasets show that our model significantly outperforms state-of-the-art baselines for dialogue generation.
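The abstract contrasts deterministic word-by-word decoding with a variational memory that gives random access to the dialogue history. The paper's equations are not on this page, so what follows is only a minimal Python sketch of the two generic ingredients the abstract names, under stated assumptions: a soft-attention read over one vector per past utterance (in the style of end-to-end memory networks), and a diagonal Gaussian latent variable sampled with the reparameterization trick and regularized toward a standard normal prior (as in variational autoencoders). It is not the HVMN architecture. The names `memory_read`, `sample_latent`, and `gaussian_kl`, the toy vectors, and the way the read vector conditions the latent mean are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def memory_read(query: Sequence[float],
                memory: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Soft-attention read over per-utterance memory slots (hypothetical layout).

    Every slot is scored against the query in one step, however many turns
    ago it was written. Returns the attention-weighted read vector and the
    attention weights.
    """
    weights = softmax([dot(query, slot) for slot in memory])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(len(query))]
    return read, weights


def sample_latent(mu: Sequence[float], log_var: Sequence[float],
                  rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + exp(log_var / 2) * eps, eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), the standard VAE regularizer."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy memory: one 2-d vector per earlier utterance (made-up numbers).
    memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [2.0, 0.0]  # stand-in for the current context state
    read, weights = memory_read(query, memory)
    # Hypothetical conditioning: context plus memory read gives the latent
    # mean; log-variance is fixed at 0 only to keep the example short.
    mu = [q + r for q, r in zip(query, read)]
    log_var = [0.0] * len(mu)
    z = sample_latent(mu, log_var, rng)
    print("attention weights:", [round(w, 3) for w in weights])
    print("sampled z:", z, "KL:", gaussian_kl(mu, log_var))
```

Because attention scores every slot in a single step, the oldest utterance is as reachable as the newest, which is one way to read "random access" in the abstract. The sampled `z` is where randomness enters generation, in contrast to a purely deterministic decoder.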

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Discourse, dialogue and pragmatics

Published in
WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018
2000 pages
ISBN: 9781450356398
General Chairs: Pierre-Antoine Champin (Université Claude Bernard Lyon 1, France), Fabien Gandon (Inria, Université Côte d'Azur, CNRS, I3S, France), Lionel Médini (Université Claude Bernard Lyon 1, France)
Program Chairs: Mounia Lalmas (Spotify, UK), Panagiotis G. Ipeirotis (New York University, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
dialogue generation
hierarchical variational memory network
recurrent encoder-decoder model
Qualifiers
- research-article
Acceptance Rates
WWW '18 Paper Acceptance Rate: 170 of 1,155 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%