ABSTRACT
Most state-of-the-art topic-based summarization systems are built on the extractive framework: they score sentences according to a set of associated features, with the feature weights assigned manually or tuned experimentally. In this paper, we discuss how to develop learning strategies that obtain the optimal feature weights automatically, so that a sentence characterized by a set of features can be assigned a sound score. The two fundamental issues are the training data and the learning model. To save costly manual annotation time and effort, we construct the training data by labeling each sentence with a "true" score calculated from human summaries. A Support Vector Regression (SVR) model is then trained to relate the "true" score of a sentence to its features. Once this relation has been modeled mathematically, SVR can predict an "estimated" score for any given sentence. Evaluations under the ROUGE-2 criterion on the DUC 2006 and DUC 2005 document sets demonstrate the competitiveness and adaptability of the proposed approaches.
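The two-stage pipeline the abstract describes — label each sentence with a "true" score derived from human summaries, then regress that score onto the sentence's features — can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the feature columns, the toy data, the ROUGE-2-style bigram-recall labeling function, and the use of scikit-learn's SVR (the paper uses LIBSVM) are all assumptions introduced here.

```python
# Illustrative sketch of SVR-based sentence scoring for extractive
# summarization. Features, data, and library choice are assumptions,
# not the paper's actual setup.
from collections import Counter

import numpy as np
from sklearn.svm import SVR


def bigram_recall(sentence_tokens, reference_tokens):
    """ROUGE-2-style bigram recall of one sentence against a human
    reference summary; a plausible stand-in for the "true" score label."""
    sent_bg = Counter(zip(sentence_tokens, sentence_tokens[1:]))
    ref_bg = Counter(zip(reference_tokens, reference_tokens[1:]))
    if not ref_bg:
        return 0.0
    overlap = sum(min(c, ref_bg[b]) for b, c in sent_bg.items())
    return overlap / sum(ref_bg.values())


# Toy feature matrix: one row per sentence. Columns (hypothetical):
# [position in document, term-frequency score, query overlap].
X_train = np.array([
    [0.0, 0.9, 0.8],   # lead sentence, high tf, high query overlap
    [0.1, 0.7, 0.6],
    [0.5, 0.3, 0.2],
    [0.9, 0.1, 0.0],   # late sentence, low tf, no overlap
])
# "True" scores, as if computed by bigram_recall against human summaries.
y_train = np.array([0.95, 0.70, 0.30, 0.05])

# Fit the regression model relating features to the "true" score.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X_train, y_train)

# Predict "estimated" scores for unseen sentences and rank them;
# the top-ranked sentences would be extracted into the summary.
X_new = np.array([
    [0.05, 0.8, 0.7],  # resembles the high-scoring training sentences
    [0.80, 0.2, 0.1],  # resembles the low-scoring ones
])
scores = model.predict(X_new)
ranked = np.argsort(scores)[::-1]  # indices, best sentence first
```

In a full system, the selection step would also enforce the summary length budget and penalize redundancy among the chosen sentences.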
REFERENCES
- Ani Nenkova and Lucy Vanderwende. The Impact of Frequency on Summarization. Microsoft Research Technical Report MSR-TR-2005-101, 2005.
- Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chin-Yew Lin and Eduard Hovy. Manual and Automatic Evaluation of Summaries. In Document Understanding Conference 2002. http://duc.nist.gov, 2002.
- Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA, 1998.
- Deepak Ravichandran and Eduard Hovy. Learning Surface Text Patterns for a Question Answering System. In Proceedings of the 40th Annual Meeting of the ACL, pages 41--47, 2002.
- Dragomir R. Radev, Jahna Otterbacher, Hong Qi, and Daniel Tam. MEAD ReDUCs: Michigan at DUC 2003. In Document Understanding Conference 2003. http://duc.nist.gov, 2003.
- Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, pages 168--175, 2002.
- Hoa Trang Dang. Overview of DUC 2005. In Document Understanding Conference 2005. http://duc.nist.gov, 2005.
- Hoa Trang Dang. Overview of DUC 2006. In Document Understanding Conference 2006. http://duc.nist.gov, 2006.
- John M. Conroy, Judith D. Schlesinger, and Dianne P. O'Leary. Topic-Focused Multi-document Summarization Using an Approximate Oracle Score. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 152--159, 2006.
- Julian M. Kupiec, Jan Pedersen, and Francine Chen. A Trainable Document Summarizer. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 68--73, 1995.
- Liang Zhou and Eduard Hovy. A Web-trained Extraction Summarization System. In Proceedings of HLT-NAACL 2003, pages 205--211, 2003.
- Lin Zhao, Lide Wu, and Xuanjing Huang. Fudan University at DUC 2005. In Document Understanding Conference 2005. http://duc.nist.gov, 2005.
- Satanjeev Banerjee and Ted Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing-02), pages 136--145, 2002.
- Bernhard Schölkopf, Peter L. Bartlett, Alex Smola, and Robert C. Williamson. Support Vector Regression with Automatic Accuracy Control. In Proceedings of the 8th International Conference on Artificial Neural Networks, pages 111--116, 1998.
- Seeger Fisher and Brian Roark. Query-Focused Summarization by Supervised Sentence Ranking and Skewed Word Distributions. In Document Understanding Conference 2006. http://duc.nist.gov, 2006.
- Steve R. Gunn. Support Vector Machines for Classification and Regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, 1998.
- Tsutomu Hirao and Hideki Isozaki. Extracting Important Sentences with Support Vector Machines. In Proceedings of the 19th International Conference on Computational Linguistics, pages 342--348, 2002.
- Vladimir Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.