skip to main content
10.5555/2002472.2002537dlproceedingsArticle/Chapter ViewAbstractPublication PageshltConference Proceedingsconference-collections
research-article
Free Access

A class of submodular functions for document summarization

Published:19 June 2011Publication History

ABSTRACT

We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.

References

  1. F. Bach. 2010. Structured sparsity-inducing norms through submodular functions. Advances in Neural Information Processing Systems.Google ScholarGoogle Scholar
  2. J. Carbonell and J. Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of SIGIR. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Celikyilmaz and D. Hakkani-tür. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 815--824, Uppsala, Sweden, July. Association for Computational Linguistics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. H. Daumé, J. Langford, and D. Marcu. 2009. Search-based structured prediction. Machine learning, 75(3):297--325. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. H. Daumé III and D. Marcu. 2006. Bayesian query-focused summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, page 312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Fellbaum. 1998. WordNet: An electronic lexical database. The MIT press.Google ScholarGoogle Scholar
  7. E. Filatova and V. Hatzivassiloglou. 2004. Event-based extractive summarization. In Proceedings of ACL Workshop on Summarization, volume 111.Google ScholarGoogle Scholar
  8. A. Haghighi and L. Vanderwende. 2009. Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 362--370, Boulder, Colorado, June. Association for Computational Linguistics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Jagarlamudi, P. Pingali, and V. Varma. 2006. Query independent sentence scoring approach to DUC 2006. In DUC 2006.Google ScholarGoogle Scholar
  10. S. Jegelka and J. A. Bilmes. 2011. Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Kempe, J. Kleinberg, and E. Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th Conference on SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Khuller, A. Moss, and J. Naor. 1999. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39--45. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. V. Kolmogorov and R. Zabin. 2004. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2):147--159. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Krause and C. Guestrin. 2005. Near-optimal nonmyopic value of information in graphical models. In Proc. of Uncertainty in AI.Google ScholarGoogle Scholar
  15. A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta. 2008. Robust submodular observation selection. Journal of Machine Learning Research, 9:2761--2801.Google ScholarGoogle Scholar
  16. H. Lin and J. Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In North American chapter of the Association for Computational Linguistics/Human Language Technology Conference (NAACL/HLT-2010), Los Angeles, CA, June. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. Lin and J. Bilmes. 2011. Word alignment via submod-ular maximization over matroids. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), Portland, OR, June. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Lin, J. Bilmes, and S. Xie. 2009. Graph-based submod-ular selection for extractive summarization. In Proc. IEEE Automatic Speech Recognition and Understanding (ASRU), Merano, Italy, December.Google ScholarGoogle Scholar
  19. C.-Y. Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop.Google ScholarGoogle Scholar
  20. L. Lovász. 1983. Submodular functions and convexity. Mathematical programming-The state of the art, (eds. A. Bachem, M. Grotschel and B. Korte) Springer, pages 235--257.Google ScholarGoogle Scholar
  21. M. Minoux. 1978. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pages 234--243.Google ScholarGoogle ScholarCross RefCross Ref
  22. M. Narasimhan and J. Bilmes. 2005. A submodular-supermodular procedure with applications to discriminative structure learning. In Proc. Conf. Uncertainty in Artifical Intelligence, Edinburgh, Scotland, July. Morgan Kaufmann Publishers.Google ScholarGoogle Scholar
  23. M. Narasimhan and J. Bilmes. 2007. Local search for balanced submodular clusterings. In Twentieth International Joint Conference on Artificial Intelligence (IJ-CAI07), Hyderabad, India, January. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Narayanan. 1997. Submodular functions and electrical networks. North-Holland.Google ScholarGoogle Scholar
  25. G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265--294.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Pingali, K. Rahul, and V. Varma. 2007. IIIT Hyderabad at DUC 2007. Proceedings of DUC 2007.Google ScholarGoogle Scholar
  27. V. Qazvinian, D. R. Radev, and A. Ozgür. 2010. Citation Summarization Through Keyphrase Extraction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 895--903. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In EMNLP, volume 1, pages 133--142.Google ScholarGoogle Scholar
  29. K. Riedhammer, B. Favre, and D. Hakkani-Tür. 2010. Long story short-Global unsupervised models for keyphrase based meeting summarization. Speech Communication. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. C. Shen and T. Li. 2010. Multi-document summarization via the minimum dominating set. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 984--992, Beijing, China, August. Coling 2010 Organizing Committee. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Sviridenko. 2004. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41--43. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. H. Takamura and M. Okumura. 2009. Text summarization model based on maximum coverage problem and its variant. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 781--789. Association for Computational Linguistics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Toutanova, C. Brockett, M. Gamon, J. Jagarlamudi, H. Suzuki, and L. Vanderwende. 2007. The PYTHY summarization system: Microsoft research at DUC 2007. In the proceedings of Document Understanding Conference.Google ScholarGoogle Scholar
  34. D. Wang, S. Zhu, T. Li, and Y. Gong. 2009. Multi-document summarization using sentence-based topic models. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 297--300, Suntec, Singapore, August. Association for Computational Linguistics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. L. A. Wolsey. 1982. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385--393.Google ScholarGoogle ScholarCross RefCross Ref
  36. P. Zhao, G. Rocha, and B. Yu. 2009. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468--3497.Google ScholarGoogle Scholar

Index Terms

  1. A class of submodular functions for document summarization

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image DL Hosted proceedings
        HLT '11: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
        June 2011
        1696 pages
        ISBN:9781932432879

        Publisher

        Association for Computational Linguistics

        United States

        Publication History

        • Published: 19 June 2011

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate240of768submissions,31%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader