ABSTRACT
Most state-of-the-art topic-based summarization systems are built on the extractive framework: they score sentences according to a set of associated features, with the feature weights assigned manually or tuned experimentally. In this paper, we discuss how to develop learning strategies that obtain the optimal feature weights automatically, so that a sentence characterized by a set of features can be assigned a sound score. The two fundamental issues are the training data and the learning model. To save costly manual annotation time and effort, we construct the training data by labeling each sentence with a "true" score calculated from human summaries. A Support Vector Regression (SVR) model is then trained to relate the "true" score of a sentence to its features. Once this relation has been modeled mathematically, SVR can predict an "estimated" score for any given sentence. Evaluations under the ROUGE-2 criterion on the DUC 2006 and DUC 2005 document sets demonstrate the competitiveness and adaptability of the proposed approaches.
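The two-stage pipeline the abstract describes — label each sentence with a "true" score derived from human summaries, then regress that score onto the sentence's features — can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the feature columns, the toy data, the ROUGE-2-style bigram-recall labeling function, and the use of scikit-learn's SVR (the paper uses LIBSVM) are all assumptions introduced here.

```python
# Illustrative sketch of SVR-based sentence scoring for extractive
# summarization. Features, data, and library choice are assumptions,
# not the paper's actual setup.
from collections import Counter

import numpy as np
from sklearn.svm import SVR


def bigram_recall(sentence_tokens, reference_tokens):
    """ROUGE-2-style bigram recall of one sentence against a human
    reference summary; a plausible stand-in for the "true" score label."""
    sent_bg = Counter(zip(sentence_tokens, sentence_tokens[1:]))
    ref_bg = Counter(zip(reference_tokens, reference_tokens[1:]))
    if not ref_bg:
        return 0.0
    overlap = sum(min(c, ref_bg[b]) for b, c in sent_bg.items())
    return overlap / sum(ref_bg.values())


# Toy feature matrix: one row per sentence. Columns (hypothetical):
# [position in document, term-frequency score, query overlap].
X_train = np.array([
    [0.0, 0.9, 0.8],   # lead sentence, high tf, high query overlap
    [0.1, 0.7, 0.6],
    [0.5, 0.3, 0.2],
    [0.9, 0.1, 0.0],   # late sentence, low tf, no overlap
])
# "True" scores, as if computed by bigram_recall against human summaries.
y_train = np.array([0.95, 0.70, 0.30, 0.05])

# Fit the regression model relating features to the "true" score.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X_train, y_train)

# Predict "estimated" scores for unseen sentences and rank them;
# the top-ranked sentences would be extracted into the summary.
X_new = np.array([
    [0.05, 0.8, 0.7],  # resembles the high-scoring training sentences
    [0.80, 0.2, 0.1],  # resembles the low-scoring ones
])
scores = model.predict(X_new)
ranked = np.argsort(scores)[::-1]  # indices, best sentence first
```

In a full system, the selection step would also enforce the summary length budget and penalize redundancy among the chosen sentences.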
REFERENCES
- Ani Nenkova and Lucy Vanderwende. The Impact of Frequency on Summarization. Microsoft Research Technical Report MSR-TR-2005-101, 2005.
- Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chin-Yew Lin and Eduard Hovy. Manual and Automatic Evaluation of Summaries. In Document Understanding Conference 2002. http://duc.nist.gov, 2002.
- Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA, 1998.
- Deepak Ravichandran and Eduard Hovy. Learning Surface Text Patterns for a Question Answering System. In Proceedings of the 40th Annual Meeting of the ACL, pages 41--47, 2002.
- Dragomir R. Radev, Jahna Otterbacher, Hong Qi, and Daniel Tam. MEAD ReDUCs: Michigan at DUC 2003. In Document Understanding Conference 2003. http://duc.nist.gov, 2003.
- Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, pages 168--175, 2002.
- Hoa Trang Dang. Overview of DUC 2005. In Document Understanding Conference 2005. http://duc.nist.gov, 2005.
- Hoa Trang Dang. Overview of DUC 2006. In Document Understanding Conference 2006. http://duc.nist.gov, 2006.
- John M. Conroy, Judith D. Schlesinger, and Dianne P. O'Leary. Topic-Focused Multi-document Summarization Using an Approximate Oracle Score. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 152--159, 2006.
- Julian M. Kupiec, Jan Pedersen, and Francine Chen. A Trainable Document Summarizer. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 68--73, 1995.
- Liang Zhou and Eduard Hovy. A Web-trained Extraction Summarization System. In Proceedings of HLT-NAACL 2003, pages 205--211, 2003.
- Lin Zhao, Lide Wu, and Xuanjing Huang. Fudan University at DUC 2005. In Document Understanding Conference 2005. http://duc.nist.gov, 2005.
- Satanjeev Banerjee and Ted Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing (CICLing-02), pages 136--145, 2002.
- Bernhard Schölkopf, Peter L. Bartlett, Alex Smola, and Robert C. Williamson. Support Vector Regression with Automatic Accuracy Control. In Proceedings of the 8th International Conference on Artificial Neural Networks, pages 111--116, 1998.
- Seeger Fisher and Brian Roark. Query-Focused Summarization by Supervised Sentence Ranking and Skewed Word Distributions. In Document Understanding Conference 2006. http://duc.nist.gov, 2006.
- Steve R. Gunn. Support Vector Machines for Classification and Regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, 1998.
- Tsutomu Hirao and Hideki Isozaki. Extracting Important Sentences with Support Vector Machines. In Proceedings of the 19th International Conference on Computational Linguistics, pages 342--348, 2002.
- Vladimir Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.