ABSTRACT
Scoring document sentences against human-written abstract summaries is a central task in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem: we build a generative model for pattern discovery and a regression model for inference. First, we score the sentences in each document cluster according to their latent characteristics, using a hierarchical topic model. Then, using these scores as training targets, we fit a regression model on the lexical and structural characteristics of the sentences, and apply it to score the sentences of new documents and form a summary. Our system advances the current state of the art, improving ROUGE scores by roughly 7%. Manual quality evaluations indicate that the generated summaries are less redundant and more coherent.
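The two-step pipeline described above can be illustrated with a minimal sketch. This is not the paper's implementation: it substitutes a flat LDA model (via scikit-learn) for the hierarchical topic model, uses a maximum topic weight as a stand-in salience score, and invents two toy lexical features (sentence length and mean word length); all sentence data and feature choices here are hypothetical.

```python
# Hedged sketch of the abstract's two-step pipeline:
# (1) score training sentences with a topic model (plain LDA here, as a
#     stand-in for the hierarchical model), (2) fit a regression model on
#     lexical/structural features, (3) rank unseen sentences by prediction.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVR

train_sents = [
    "the storm damaged coastal towns",
    "officials reported power outages",
    "rescue teams searched the area",
    "the weather improved by friday",
]
new_sents = [
    "the hurricane caused widespread damage",
    "markets were calm on monday",
]

# Step 1: latent-topic scores for training sentences. The max topic
# weight is a crude salience proxy, standing in for the paper's
# hierarchical-topic-model scores.
vec = CountVectorizer()
X_train = vec.fit_transform(train_sents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X_train)          # per-sentence topic mixture
topic_scores = theta.max(axis=1)            # strongest topic weight

# Step 2: regress simple surface features onto those scores.
def features(sents):
    # Toy features: word count and mean word length (assumptions).
    return np.array(
        [[len(s.split()), sum(map(len, s.split())) / len(s.split())]
         for s in sents]
    )

reg = SVR(kernel="rbf").fit(features(train_sents), topic_scores)

# Step 3: score unseen sentences and rank them for the summary.
pred = reg.predict(features(new_sents))
ranking = sorted(zip(new_sents, pred), key=lambda p: -p[1])
for sent, score in ranking:
    print(f"{score:.3f}  {sent}")
```

In the paper itself, the generative model supplies training targets only once; at inference time only the cheap regression over surface features is needed, which is the point of the two-step design.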