research-article

Free Access

A class of submodular functions for document summarization

Authors:
Hui Lin

University of Washington, Seattle, WA

University of Washington, Seattle, WA
View Profile

,
Jeff Bilmes

University of Washington, Seattle, WA

University of Washington, Seattle, WA
View Profile

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1June 2011Pages 510–520

Published:19 June 2011Publication History

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

Pages 510–520

ABSTRACT

We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.

References

F. Bach. 2010. Structured sparsity-inducing norms through submodular functions. Advances in Neural Information Processing Systems.Google Scholar
J. Carbonell and J. Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of SIGIR. Google ScholarDigital Library
A. Celikyilmaz and D. Hakkani-tür. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 815--824, Uppsala, Sweden, July. Association for Computational Linguistics. Google ScholarDigital Library
H. Daumé, J. Langford, and D. Marcu. 2009. Search-based structured prediction. Machine learning, 75(3):297--325. Google ScholarDigital Library
H. Daumé III and D. Marcu. 2006. Bayesian query-focused summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, page 312. Google ScholarDigital Library
C. Fellbaum. 1998. WordNet: An electronic lexical database. The MIT press.Google Scholar
E. Filatova and V. Hatzivassiloglou. 2004. Event-based extractive summarization. In Proceedings of ACL Workshop on Summarization, volume 111.Google Scholar
A. Haghighi and L. Vanderwende. 2009. Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 362--370, Boulder, Colorado, June. Association for Computational Linguistics. Google ScholarDigital Library
J. Jagarlamudi, P. Pingali, and V. Varma. 2006. Query independent sentence scoring approach to DUC 2006. In DUC 2006.Google Scholar
S. Jegelka and J. A. Bilmes. 2011. Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June. Google ScholarDigital Library
D. Kempe, J. Kleinberg, and E. Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th Conference on SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Google ScholarDigital Library
S. Khuller, A. Moss, and J. Naor. 1999. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39--45. Google ScholarDigital Library
V. Kolmogorov and R. Zabin. 2004. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):147--159. Google ScholarDigital Library
A. Krause and C. Guestrin. 2005. Near-optimal nonmyopic value of information in graphical models. In Proc. of Uncertainty in AI.Google Scholar
A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta. 2008. Robust submodular observation selection. Journal of Machine Learning Research, 9:2761--2801.Google Scholar
H. Lin and J. Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In North American chapter of the Association for Computational Linguistics/Human Language Technology Conference (NAACL/HLT-2010), Los Angeles, CA, June. Google ScholarDigital Library
H. Lin and J. Bilmes. 2011. Word alignment via submod-ular maximization over matroids. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), Portland, OR, June. Google ScholarDigital Library
H. Lin, J. Bilmes, and S. Xie. 2009. Graph-based submod-ular selection for extractive summarization. In Proc. IEEE Automatic Speech Recognition and Understanding (ASRU), Merano, Italy, December.Google Scholar
C.-Y. Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop.Google Scholar
L. Lovász. 1983. Submodular functions and convexity. Mathematical programming-The state of the art, (eds. A. Bachem, M. Grotschel and B. Korte) Springer, pages 235--257.Google Scholar
M. Minoux. 1978. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pages 234--243.Google ScholarCross Ref
M. Narasimhan and J. Bilmes. 2005. A submodular-supermodular procedure with applications to discriminative structure learning. In Proc. Conf. Uncertainty in Artifical Intelligence, Edinburgh, Scotland, July. Morgan Kaufmann Publishers.Google Scholar
M. Narasimhan and J. Bilmes. 2007. Local search for balanced submodular clusterings. In Twentieth International Joint Conference on Artificial Intelligence (IJ-CAI07), Hyderabad, India, January. Google ScholarDigital Library
H. Narayanan. 1997. Submodular functions and electrical networks. North-Holland.Google Scholar
G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265--294.Google ScholarDigital Library
P. Pingali, K. Rahul, and V. Varma. 2007. IIIT Hyderabad at DUC 2007. Proceedings of DUC 2007.Google Scholar
V. Qazvinian, D. R. Radev, and A. Ozgür. 2010. Citation Summarization Through Keyphrase Extraction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 895--903. Google ScholarDigital Library
A. Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In EMNLP, volume 1, pages 133--142.Google Scholar
K. Riedhammer, B. Favre, and D. Hakkani-Tür. 2010. Long story short-Global unsupervised models for keyphrase based meeting summarization. Speech Communication. Google ScholarDigital Library
C. Shen and T. Li. 2010. Multi-document summarization via the minimum dominating set. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 984--992, Beijing, China, August. Coling 2010 Organizing Committee. Google ScholarDigital Library
M. Sviridenko. 2004. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41--43. Google ScholarDigital Library
H. Takamura and M. Okumura. 2009. Text summarization model based on maximum coverage problem and its variant. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 781--789. Association for Computational Linguistics. Google ScholarDigital Library
K. Toutanova, C. Brockett, M. Gamon, J. Jagarlamudi, H. Suzuki, and L. Vanderwende. 2007. The PYTHY summarization system: Microsoft research at DUC 2007. In the proceedings of Document Understanding Conference.Google Scholar
D. Wang, S. Zhu, T. Li, and Y. Gong. 2009. Multi-document summarization using sentence-based topic models. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 297--300, Suntec, Singapore, August. Association for Computational Linguistics. Google ScholarDigital Library
L. A. Wolsey. 1982. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385--393.Google ScholarCross Ref
P. Zhao, G. Rocha, and B. Yu. 2009. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468--3497.Google Scholar

Index Terms

A class of submodular functions for document summarization
1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources

Recommendations

Latent Dirichlet learning for document summarization
ICASSP '09: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Automatic summarization is developed to extract the representative contents or sentences from a large corpus of documents. This paper presents a new hierarchical representation of words, sentences and documents in a corpus, and infers the Dirichlet ...
Read More
Multi-document summarisation using feature distribution analysis

Recently, opinion documents have been growing rapidly in an environment where anyone can express an opinion on the internet or SNS. This situation requires an automatic summarisation technique in order to understand the contents of large-scale opinion ...
Read More
Latent dirichlet allocation based multi-document summarization
AND '08: Proceedings of the second workshop on Analytics for noisy unstructured text data

Extraction based Multi-Document Summarization Algorithms consist of choosing sentences from the documents using some weighting mechanism and combining them into a summary. In this article we use Latent Dirichlet Allocation to capture the events being ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
June 2011
1696 pages
ISBN:9781932432879
General Chair:
Dekang Lin
Google
Sponsors
In-Cooperation
Publisher
Association for Computational Linguistics
United States
Publication History
- Published: 19 June 2011
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate240of768submissions,31%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 123
  Total Citations
  View Citations
- 1,696
  Total Downloads
- Downloads (Last 12 months)70
- Downloads (Last 6 weeks)24
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A class of submodular functions for document summarization

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

ABSTRACT

References

Cited By

Index Terms

Recommendations

Latent Dirichlet learning for document summarization

Multi-document summarisation using feature distribution analysis

Latent dirichlet allocation based multi-document summarization

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A class of submodular functions for document summarization

HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1

ABSTRACT

References

Cited By

Index Terms

Recommendations

Latent Dirichlet learning for document summarization

Multi-document summarisation using feature distribution analysis

Latent dirichlet allocation based multi-document summarization

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media