This thesis presents an automatic, incremental lexical acquisition mechanism that uses the context of example sentences to guide inference of the meanings of unknown words. The goal of this line of research is to allow a Natural Language Processing (NLP) system to cope with words that it does not know--not just to gloss over them, but to try to infer what they mean. The environment within which this system operates is epitomized by the information extraction task: from virtually unconstrained text, elicit certain information that is deemed interesting. The knowledge acquisition bottleneck inherent in this task imposes constraints on the type of knowledge available for lexical inference. The main objective in this work is to infer as much information as possible about unknown words from context without requiring special-purpose knowledge. This was accomplished by extending the underlying NLP system to search its domain-specific concept representation for an appropriate concept to denote the meaning of the unknown word. The learning method is incremental, so every time the system encounters an example of an unfamiliar word, it adjusts its hypotheses. The basic system evolved through several stages to improve its inferences. Several variations on the basic system were then made to capture especially difficult aspects of the acquisition task and to take advantage of discourse context. The approach was tested in two different domains. Target words were removed from the lexica and sentences containing them were processed by the system. The results were evaluated using measures taken from the field of Information Retrieval.
When humans learn language, they are faced with a similar task: from a set of examples of a word's use, they must infer what that word means and how it is used. Not only is the task similar, but many of the behaviors and difficulties that the computational acquisition mechanism has encountered have also been described in the psycholinguistic literature. Although the system was not intended as a cognitive model, these parallels indicate strong constraints from the task itself, and therefore lend credence to viewing the system as a cognitive model.