DOI: 10.5555/1858681.1858765
research-article

A hybrid hierarchical model for multi-document summarization

Published: 11 July 2010

ABSTRACT

Scoring the sentences of documents against abstract summaries written by humans is an important step in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem: building a generative model for pattern discovery and a regression model for inference. First, we calculate scores for sentences in document clusters based on their latent characteristics, using a hierarchical topic model. Then, using these scores, we train a regression model on the lexical and structural characteristics of the sentences, and use that model to score the sentences of new documents and form a summary. Our system advances the current state of the art, improving ROUGE scores by ~7%. Manual quality evaluations indicate that the generated summaries are less redundant and more coherent.
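The two-step formulation in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract derives the training scores from a hierarchical topic model, whereas the sketch below substitutes plain word overlap with the human summary as a stand-in, and it fits an ordinary least-squares linear model because the abstract does not name the regression model. All function names, the four features, and the toy data are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]


def proxy_score(sentence, summary_vocab):
    """Step 1 stand-in: share of sentence tokens that occur in the human summary.

    ASSUMPTION: the paper obtains this score from a hierarchical topic model;
    word overlap is used here only to keep the sketch short.
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(t in summary_vocab for t in tokens) / len(tokens)


def features(sentences):
    """Hypothetical lexical and structural features, each scaled to [0, 1]."""
    counts = Counter(t for s in sentences for t in tokenize(s))
    total = max(sum(counts.values()), 1)
    rows = []
    for position, s in enumerate(sentences):
        tokens = tokenize(s)
        n = max(len(tokens), 1)
        mean_freq = sum(counts[t] for t in tokens) / (n * total)  # lexical
        # bias term, word frequency, position in cluster, capped length
        rows.append([1.0, mean_freq, 1.0 / (1 + position), min(n, 30) / 30.0])
    return rows


def predict(w, x):
    """Linear model output for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression by batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def train(clusters):
    """Step 2: regress the step-1 scores on the sentence features."""
    X, y = [], []
    for sentences, human_summary in clusters:
        vocab = set(tokenize(human_summary))
        X.extend(features(sentences))
        y.extend(proxy_score(s, vocab) for s in sentences)
    return fit_linear(X, y)


def summarize(sentences, w, k=1):
    """Score unseen sentences with the trained model and keep the top k."""
    scores = [predict(w, x) for x in features(sentences)]
    ranked = sorted(zip(scores, sentences), key=lambda p: p[0], reverse=True)
    return [s for _, s in ranked[:k]]


# Toy training cluster (invented data, for illustration only).
clusters = [(
    ["The storm closed the coastal highway.",
     "Officials reopened it on Monday.",
     "A local bakery sold out of bread."],
    "A storm closed the highway until officials reopened it.",
)]
weights = train(clusters)
new_docs = ["Heavy rain flooded the city center.", "Crews pumped water overnight."]
print(summarize(new_docs, weights, k=1))
```

The point of the split is that human summaries are needed only at training time: step 1 uses them to produce target scores, while step 2 learns to predict those scores from features that can be computed on any new document cluster.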


Published in

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
July 2010
1618 pages
Program Chair: Jan Hajič

Publisher

Association for Computational Linguistics, United States


Acceptance Rates

Overall Acceptance Rate: 85 of 443 submissions, 19%
