Abstract
Motivated by governmental, commercial, and academic interests, and by the growing amount of information available, mainly online, the field of automatic text summarization has seen an increasing number of studies and products, which has led to a very large number of summarization methods. In this paper, we present a comprehensive comparative evaluation of the main automatic text summarization methods based on Rhetorical Structure Theory (RST), which are claimed to be among the best. We compare their results with those of superficial summarizers, which belong to a paradigm with severe limitations, and with those of hybrid methods that combine RST and superficial methods. We also test voting systems and machine learning techniques trained on RST features. We run experiments for English and Brazilian Portuguese and compare the results obtained with manually and automatically parsed texts. Our results systematically show that all RST methods have comparable overall performance and that they outperform most of the superficial methods. Machine learning techniques achieved high accuracy in classifying the text segments worth including in the summary, but were not able to produce more informative summaries than the regular RST methods.