A comprehensive comparative evaluation of RST-based summarization methods

Authors:
Vinícius Rodrigues Uzêda, Universidade de São Paulo, Brazil
Thiago Alexandre Salgueiro Pardo, Universidade de São Paulo, Brazil
Maria Das Graças Volpe Nunes, Universidade de São Paulo, Brazil

ACM Transactions on Speech and Language Processing, Volume 6, Issue 4, Article No. 4, pp. 1–20. https://doi.org/10.1145/1767756.1767757

Published: 18 May 2010

Abstract

Motivated by governmental, commercial, and academic interests, and driven by the growing amount of information available, mainly online, the area of automatic text summarization has seen an increasing number of research efforts and products, which has led to a very large number of summarization methods. In this paper, we present a comprehensive comparative evaluation of the main automatic text summarization methods based on Rhetorical Structure Theory (RST), which are claimed to be among the best. We compare our results to those of superficial summarizers, which belong to a paradigm with severe limitations, and to hybrid methods that combine RST and superficial methods. We also test voting systems and machine learning techniques trained on RST features. We run experiments for English and Brazilian Portuguese and compare the results obtained using manually and automatically parsed texts. Our results systematically show that all RST methods have comparable overall performance and that they outperform most of the superficial methods. Machine learning techniques achieved high accuracy in classifying the text segments worth including in the summary, but were not able to produce more informative summaries than the regular RST methods.

Index Terms

A comprehensive comparative evaluation of RST-based summarization methods
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Discourse, dialogue and pragmatics

Published in

ACM Transactions on Speech and Language Processing Volume 6, Issue 4
May 2010
20 pages
ISSN:1550-4875
EISSN:1550-4883
DOI:10.1145/1767756

Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 May 2010
- Accepted: 1 March 2010
- Revised: 1 October 2009
- Received: 1 May 2009
Author Tags
Text summarization
rhetorical structure theory
Qualifiers
- research-article
- Research
- Refereed