ABSTRACT
Dialogue Act Recognition (DAR) is a challenging problem in dialogue interpretation: it aims to attach semantic labels to utterances and thereby characterize the speaker's intention. Existing approaches formulate DAR as anything from multi-class classification to structured prediction, but they suffer from reliance on handcrafted feature extensions and from difficulty capturing attentive contextual dependencies. In this paper, we tackle DAR by enriching the structured dependencies of a Conditional Random Field (CRF) without abandoning end-to-end training. We incorporate hierarchical semantic inference with a memory mechanism into utterance modeling at multiple levels. We then apply a structured attention network to the linear-chain CRF to dynamically separate the utterances into cliques. Extensive experiments on two primary benchmark datasets, Switchboard Dialogue Act (SWDA) and Meeting Recorder Dialogue Act (MRDA), show that our method achieves better performance than other state-of-the-art solutions to the problem.
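To make the linear-chain CRF view of dialogue act labeling concrete, the following is a minimal sketch of Viterbi decoding over per-utterance label scores. It is not the paper's model: the hierarchical encoder, memory mechanism, and structured attention are omitted, and the function name `viterbi_decode`, the three-label inventory, and all score values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][k]   : score of label k for utterance t (assumed to come
                        from some utterance encoder, not shown here)
    transitions[j][k] : score of moving from label j to label k
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # best[k] = best score of any label path ending in label k so far
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for k in range(num_labels):
            # previous label that maximizes the path score into label k
            prev = max(range(num_labels),
                       key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final label
    last = max(range(num_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical labels: 0 = statement, 1 = question, 2 = answer
emissions = [[2.0, 0.5, 0.0],
             [0.2, 1.5, 0.1],
             [1.0, 0.0, 0.9]]
# Only the question -> answer transition is rewarded in this toy example
transitions = [[0.0, 0.0, 0.0],
               [0.0, 0.0, 2.0],
               [0.0, 0.0, 0.0]]
no_transitions = [[0.0] * 3 for _ in range(3)]

print(viterbi_decode(emissions, transitions))     # [0, 1, 2]
print(viterbi_decode(emissions, no_transitions))  # [0, 1, 0]
```

With all transition scores set to zero, each utterance is labeled independently and the third one is tagged as a statement. Rewarding the question-to-answer transition flips it to an answer, which is the kind of label dependency a structured predictor can exploit and an utterance-by-utterance classifier cannot.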
Index Terms
- Dialogue Act Recognition via CRF-Attentive Structured Network