Dialogue Act Recognition via CRF-Attentive Structured Network

Authors:
Zheqian Chen, Zhejiang University, Hangzhou, China
Rongqin Yang, Zhejiang University, Hangzhou, China
Zhou Zhao, Hangzhou, China
Deng Cai, Alibaba-Zhejiang University Joint Institute of Frontier Technologies, Hangzhou, China
Xiaofei He, Fabu Inc., Hangzhou, China
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, June 2018, Pages 225–234
https://doi.org/10.1145/3209978.3209997

Published: 27 June 2018

ABSTRACT

Dialogue Act Recognition (DAR) is a challenging problem in dialogue interpretation that aims to associate semantic labels with utterances and characterize the speaker's intention. Existing approaches formulate DAR as anything from multi-class classification to structured prediction, and they suffer from handcrafted feature extensions and attentive contextual dependencies. In this paper, we tackle DAR from the viewpoint of extending richer Conditional Random Field (CRF) structured dependencies without abandoning end-to-end training. We incorporate hierarchical semantic inference with a memory mechanism to model utterances at multiple levels. We then apply a structured attention network to the linear-chain CRF to dynamically separate the utterances into cliques. Extensive experiments on two primary benchmark datasets, Switchboard Dialogue Act (SWDA) and Meeting Recorder Dialogue Act (MRDA), show that our method achieves better performance than other state-of-the-art solutions to the problem.
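To make the role of the linear-chain CRF concrete, the following is a minimal sketch of Viterbi decoding over a short dialogue. It is not the authors' model: the label set, the per-utterance emission scores, and the label-to-label transition scores are all invented for illustration, and in a trained system they would come from a learned utterance encoder and CRF layer. The sketch only shows why structured prediction can differ from classifying each utterance independently.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][k]   : score of label k for utterance t
    transitions[j][k] : score of label j being followed by label k
    """
    num_labels = len(transitions)
    # best[k] = score of the best path that ends in label k at the current step
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for k in range(num_labels):
            # Best previous label to transition from when landing on label k
            prev = max(range(num_labels),
                       key=lambda j: best[j] + transitions[j][k])
            new_best.append(best[prev] + transitions[prev][k] + emissions[t][k])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final label to recover the path
    last = max(range(num_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical dialogue-act labels and scores (assumptions, not from the paper)
LABELS = ["question", "answer", "statement"]
emissions = [
    [2.0, 0.1, 0.5],  # utterance 0: clearly a question
    [0.3, 0.9, 1.0],  # utterance 1: ambiguous between answer and statement
    [0.2, 0.4, 1.5],  # utterance 2: likely a statement
]
transitions = [
    [-1.0, 2.0, 0.0],  # from question: an answer is strongly favored next
    [0.0, -0.5, 0.5],  # from answer
    [0.3, -1.0, 0.5],  # from statement
]
decoded = [LABELS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

In this toy input the second utterance would be tagged "statement" if each utterance were classified on its own, since 1.0 exceeds 0.9, but the transition score from "question" to "answer" makes the sequence-level decoder choose "answer" instead.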

Index Terms

Dialogue Act Recognition via CRF-Attentive Structured Network
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI theory, concepts and models
    2. Interaction paradigms
      1. Natural language interfaces
  2. Interaction design
    1. Interaction design process and methods
      1. Contextual design
    2. Interaction design theory, concepts and paradigms

Published in
SIGIR '18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval
June 2018
1509 pages
ISBN: 9781450356572
DOI: 10.1145/3209978
General Chairs:
Kevyn Collins-Thompson, University of Michigan, United States
Qiaozhu Mei, University of Michigan, United States
Program Chairs:
Brian Davison, Lehigh University, United States
Yiqun Liu, Tsinghua University, China
Emine Yilmaz, University College London, United Kingdom
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
conditional random field
dialogue act recognition
structured attention network
Qualifiers
- research-article
Acceptance Rates
SIGIR '18 Paper Acceptance Rate: 86 of 409 submissions, 21%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
Article Metrics
- Total Citations: 28
- Total Downloads: 715
- Downloads (Last 12 months): 24
- Downloads (Last 6 weeks): 2