ABSTRACT
We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested.
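To make the idea of a lexico-syntactic pattern concrete, the following is a minimal sketch of matching one such pattern and reading hyponym/hypernym pairs off the match. The abstract does not list the patterns themselves, so the "X such as A, B and C" construction used here is an illustrative assumption. The names `SUCH_AS` and `extract_hyponym_pairs` are hypothetical, and noun phrases are crudely approximated as single words.

```python
import re

# Illustrative pattern only (an assumption, not taken from the abstract):
#   "<hypernym> such as <item>, <item> and <item>"
# Noun phrases are approximated as single alphabetic words; a real system
# would need proper noun-phrase recognition.
SUCH_AS = re.compile(
    r"\b(?P<hypernym>[a-z]+)\s*,?\s+such\s+as\s+"
    r"(?P<items>[a-z]+"
    r"(?:\s*,\s*(?!(?:and|or)\b)[a-z]+)*"      # further comma-separated items
    r"(?:\s*,?\s*(?:and|or)\s+[a-z]+)?)",      # optional final "and/or" item
    re.IGNORECASE,
)

# Separators between list items: ", and ", ", " or " and ".
ITEM_SEPARATOR = r"\s*,\s*(?:and|or)\s+|\s*,\s*|\s+(?:and|or)\s+"


def extract_hyponym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the single toy pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hypernym").lower()
        items = re.split(ITEM_SEPARATOR, match.group("items"), flags=re.IGNORECASE)
        for item in items:
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "He plays instruments such as guitars, violins and cellos."
    for hyponym, hypernym in extract_hyponym_pairs(sentence):
        print(f"{hyponym} is a kind of {hypernym}")
```

A single regular expression over raw words is far weaker than the approach the abstract describes; it only shows how a fixed surface pattern can yield candidate relation pairs without pre-encoded knowledge.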
- Automatic acquisition of hyponyms from large text corpora