research-article

A comprehensive comparative evaluation of RST-based summarization methods

Published: 18 May 2010

Abstract

Driven by governmental, commercial, and academic interests, and by the growing amount of information available, mainly online, the field of automatic text summarization has seen a growing body of research and products, which has produced a very large number of summarization methods. In this paper, we present a comprehensive comparative evaluation of the main automatic text summarization methods based on Rhetorical Structure Theory (RST), which are claimed to be among the best. We compare our results to superficial summarizers, which belong to a paradigm with severe limitations, and to hybrid methods that combine RST and superficial methods. We also test voting systems and machine learning techniques trained on RST features. We run experiments for English and Brazilian Portuguese and compare the results obtained with manually and automatically parsed texts. Our results systematically show that all RST methods have comparable overall performance and that they outperform most of the superficial methods. Machine learning techniques achieved high accuracy in classifying text segments worth including in the summary, but were not able to produce more informative summaries than the regular RST methods.

    • Published in

      ACM Transactions on Speech and Language Processing, Volume 6, Issue 4
      May 2010
      20 pages
      ISSN:1550-4875
      EISSN:1550-4883
      DOI:10.1145/1767756

      Copyright © 2010 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 May 2010
      • Accepted: 1 March 2010
      • Revised: 1 October 2009
      • Received: 1 May 2009

      Qualifiers

      • research-article
      • Research
      • Refereed