article

Cross-language headline generation for Hindi

Authors:
Bonnie Dorr, University of Maryland, College Park, MD
David Zajic, University of Maryland, College Park, MD
Richard Schwartz, BBN Technologies, Columbia, MD
ACM Transactions on Asian Language Information Processing, Volume 2, Issue 3, pp. 270–289. https://doi.org/10.1145/979872.979878

Published: 01 September 2003

Abstract

This paper presents new approaches to headline generation for English newspaper texts, with an eye toward the production of document surrogates for document selection in cross-language information retrieval. This task is difficult because the user must make decisions about relevance based on (often poor) translations of retrieved documents. To facilitate the decision-making process we need translations that can be assessed rapidly and accurately; our approach is to provide an English headline for the non-English document. We describe two approaches to headline generation and their application to the recent DARPA TIDES-2003 Surprise Language Exercise for Hindi. For comparison, we also implemented an alternative method for surrogate generation: a system that produces topic lists for (Hindi) articles. We present the results of a series of experiments comparing each of these approaches. We demonstrate in both automatic and human evaluations that our linguistically motivated approach outperforms two other surrogate-generation methods: a statistical system and a topic discovery system.

Index Terms

Cross-language headline generation for Hindi
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals

Published in

ACM Transactions on Asian Language Information Processing Volume 2, Issue 3
September 2003
132 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/979872

Copyright © 2003 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Article Metrics
- Total Citations: 36
- Total Downloads: 650
- Downloads (last 12 months): 2
- Downloads (last 6 weeks): 0