DOI: 10.1145/2644866.2644869
research-article

Truncation: all the news that fits we'll print

Published: 16 September 2014

ABSTRACT

A news article generally contains a high-level overview of the facts early on, followed by paragraphs of more detailed information. This structure allows copy editors to truncate the later paragraphs of an article to satisfy space limitations without losing critical information. Existing approaches to this problem of automatic multi-article layout focus exclusively on maximizing content and aesthetics. However, no existing algorithm can determine how "good" a truncation point is based on semantic content or article readability. Disregarding the semantic information within an article can lead either to overly aggressive cutting, which eliminates key content and may confuse the reader, or to an overly generous truncation point, which leaves in superfluous content and makes automatic layout more difficult. This is one of the remaining challenges on the path from manual layouts to fully automated processes with high-quality output. In this work, we present a new semantic-focused approach to rating the quality of a truncation point. We built models based on results from an extensive user study on over 700 news articles. Further results show that existing techniques over-cut content. We demonstrate the layout impact through a second evaluation that implements our models in the first layout approach to integrate both layout and semantic quality. The primary contribution of this work is the demonstration that semantic-based modeling is critical for high-quality automated document synthesis within a real-world context.

      • Published in

        DocEng '14: Proceedings of the 2014 ACM symposium on Document engineering
        September 2014
        226 pages
        ISBN:9781450329491
        DOI:10.1145/2644866

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        DocEng '14 Paper Acceptance Rate: 15 of 41 submissions, 37%
        Overall Acceptance Rate: 178 of 537 submissions, 33%