Cross-language headline generation for Hindi

Published: 01 September 2003

Abstract

This paper presents new approaches to headline generation for English newspaper texts, with an eye toward the production of document surrogates for document selection in cross-language information retrieval. This task is difficult because the user must make decisions about relevance based on (often poor) translations of retrieved documents. To facilitate the decision-making process we need translations that can be assessed rapidly and accurately; our approach is to provide an English headline for the non-English document. We describe two approaches to headline generation and their application to the recent DARPA TIDES-2003 Surprise Language Exercise for Hindi. For comparison, we also implemented an alternative method for surrogate generation: a system that produces topic lists for (Hindi) articles. We present the results of a series of experiments comparing each of these approaches. We demonstrate in both automatic and human evaluations that our linguistically motivated approach outperforms two other surrogate-generation methods: a statistical system and a topic discovery system.

Published in

ACM Transactions on Asian Language Information Processing, Volume 2, Issue 3
September 2003
132 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/979872

Copyright © 2003 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
