DOI: 10.1145/1321440.1321454
research-article

Developing learning strategies for topic-based summarization

Published: 06 November 2007

ABSTRACT

Most current, well-performing topic-based summarization systems are built on the extractive framework. They score sentences from their associated features, with feature weights that are either assigned manually or tuned experimentally. In this paper, we discuss how to develop learning strategies that obtain the optimal feature weights automatically, so that a sound score can be assigned to any sentence characterized by a set of features. The two fundamental issues are the training data and the learning model. To save costly manual annotation time and effort, we construct the training data by labeling each sentence with a "true" score calculated from human summaries. A Support Vector Regression (SVR) model is then used to learn how the "true" score of a sentence relates to its features. Once this relation has been modeled mathematically, SVR can predict an "estimated" score for any given sentence. Evaluations with the ROUGE-2 criterion on the DUC 2006 and DUC 2005 document sets demonstrate the competitiveness and adaptability of the proposed approaches.
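The training-data step described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so the version below is an assumption: it takes the "true" score of a sentence to be the fraction of its word bigrams that also occur in a human summary, averaged over the available human summaries. The function names (`bigrams`, `true_score`, `label_sentences`) and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> Set[Bigram]:
    """Return the set of lowercase word bigrams in `text` (whitespace tokens)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def true_score(sentence: str, human_summaries: List[str]) -> float:
    """Assumed "true" score: share of the sentence's bigrams found in a
    human summary, averaged over all human summaries."""
    sent_bigrams = bigrams(sentence)
    if not sent_bigrams or not human_summaries:
        return 0.0
    overlaps = [
        len(sent_bigrams & bigrams(summary)) / len(sent_bigrams)
        for summary in human_summaries
    ]
    return sum(overlaps) / len(overlaps)


def label_sentences(
    sentences: List[str], human_summaries: List[str]
) -> List[Tuple[str, float]]:
    """Pair every candidate sentence with its assumed "true" score."""
    return [(s, true_score(s, human_summaries)) for s in sentences]


# Toy data, invented for illustration only.
summaries = [
    "the storm closed the airport",
    "heavy rain closed the airport on monday",
]
sentences = [
    "the storm closed the airport on monday",
    "officials held a press conference",
]
labeled = label_sentences(sentences, summaries)
for sentence, score in labeled:
    print(f"{score:.3f}  {sentence}")
```

In a setup like the one the abstract describes, each sentence would then be replaced by its feature vector, and the resulting (features, score) pairs would serve as training data for a regression model such as SVR.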

Published in

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States