ABSTRACT
Dividing sentences into chunks of words is a useful preprocessing step for parsing, information extraction, and information retrieval. Ramshaw and Marcus (1995) introduced a "convenient" data representation for chunking by converting it to a tagging task. In this paper we examine seven different data representations for the problem of recognizing noun phrase chunks. We show that the choice of data representation has a minor influence on chunking performance. However, equipped with the most suitable data representation, our memory-based learning chunker was able to improve on the best published chunking results for a standard data set.
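To make the idea of "chunking as tagging" concrete, the following minimal sketch converts noun phrase spans into per-token tags. It illustrates only two tagging schemes commonly called IOB1 and IOB2; the abstract does not list the seven representations, so the scheme names, the function `chunks_to_iob`, and the example sentence are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def chunks_to_iob(n_tokens: int,
                  chunks: List[Tuple[int, int]],
                  scheme: str = "IOB1") -> List[str]:
    """Turn chunk spans into one tag per token (hypothetical helper).

    chunks: sorted, non-overlapping (start, end) spans, end exclusive.
    IOB1: tokens inside a chunk get "I"; "B" is used only for the first
          token of a chunk that directly follows another chunk.
    IOB2: the first token of every chunk gets "B".
    """
    if scheme not in ("IOB1", "IOB2"):
        raise ValueError(f"unknown scheme: {scheme}")

    tags = ["O"] * n_tokens          # default: outside any chunk
    prev_end = None                  # end index of the previous chunk
    for start, end in chunks:
        for i in range(start, end):
            tags[i] = "I"
        adjacent = prev_end == start
        if scheme == "IOB2" or adjacent:
            tags[start] = "B"
        prev_end = end
    return tags


tokens = ["He", "gave", "the", "boy", "a", "book"]
np_chunks = [(0, 1), (2, 4), (4, 6)]  # [He] gave [the boy] [a book]

for scheme in ("IOB1", "IOB2"):
    tags = chunks_to_iob(len(tokens), np_chunks, scheme)
    print(scheme, list(zip(tokens, tags)))
```

Under these assumptions the two schemes differ only in where "B" appears, which is the kind of representational choice whose effect on performance the paper measures.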
- Steven Abney. 1991. Parsing by chunks. In Principle-Based Parsing. Kluwer Academic Publishers.
- Shlomo Argamon, Ido Dagan, and Yuval Krymolowski. 1998. A memory-based approach to learning shallow natural language patterns. In Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL '98).
- Claire Cardie and David Pierce. 1998. Error-driven pruning of treebank grammars for base noun phrase identification. In Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL '98).
- Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 1998. TiMBL: Tilburg Memory Based Learner - version 1.0 - Reference Guide. ILK, Tilburg University, The Netherlands. http://ilk.kub.nl/~ilk/papers/ilk9803.ps.gz.
- Walter Daelemans, Antal van den Bosch, and Jakub Zavrel. 1999. Forgetting exceptions is harmful in language learning. Machine Learning, 11.
- Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the Third ACL Workshop on Very Large Corpora.
- Adwait Ratnaparkhi. 1998. Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, Computer and Information Science, University of Pennsylvania.
- Jorn Veenstra. 1998. Fast np chunking using memory-based learning techniques. In BENELEARN-98: Proceedings of the Eighth Belgian-Dutch Conference on Machine Learning. ATO-DLO, Wageningen, report 352.
- Representing text chunks