This dissertation presents work on the semantic disambiguation of WordNet glosses and on improving the performance of a Question Answering system using lexical chains derived from WordNet relations.
The glosses were preprocessed before disambiguation. Preprocessing consists of separating definitions from examples, tokenization, part-of-speech tagging, and compound concept identification. Some words were manually disambiguated to create a set of 3196 gold-standard glosses.
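As a rough illustration of the preprocessing step, the Python sketch below splits a gloss written in the usual WordNet database style (definition first, examples in double quotes, separated by semicolons) into its definition and examples. It then tokenizes the definition with a simple regular expression and joins compound concepts by greedy longest match against a small lexicon. The compound lexicon, the tokenizer pattern, and the function names are illustrative assumptions, not the dissertation's implementation. Part-of-speech tagging is omitted because it would require a trained tagger.

```python
import re

# Examples in WordNet-style glosses are enclosed in double quotes.
QUOTED = re.compile(r'"([^"]*)"')
# Simple tokenizer (an assumption): words, optionally hyphenated, or single punctuation marks.
TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def split_gloss(gloss):
    """Return (definition, examples) for a WordNet-style gloss string."""
    examples = QUOTED.findall(gloss)
    # Drop the quoted examples, then the leftover ";" separators.
    rest = QUOTED.sub("", gloss)
    parts = [p.strip() for p in rest.split(";")]
    definition = "; ".join(p for p in parts if p)
    return definition, examples


def tokenize(text):
    return TOKEN.findall(text)


def join_compounds(tokens, compounds, max_len=3):
    """Greedily merge the longest token sequence found in `compounds`
    (a set of lowercase tuples) into one underscore-joined token."""
    lowered = [t.lower() for t in tokens]
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in compounds:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no compound starts here: keep the single token
            out.append(tokens[i])
            i += 1
    return out


gloss = ('a member of the genus Canis that has been domesticated by man '
         'since prehistoric times; "the dog barked all night"')
definition, examples = split_gloss(gloss)
# Hypothetical one-entry compound lexicon.
tokens = join_compounds(tokenize(definition), {("genus", "canis")})
print(examples)     # ['the dog barked all night']
print(tokens[3:6])  # ['the', 'genus_Canis', 'that']
```

Joining compounds with underscores mirrors how WordNet writes multiword entries, so the merged token can be looked up directly.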
The disambiguation relies on a set of heuristics that use WordNet semantic relations, WordNet glosses, the SemCor corpus, domain labels, and recurring patterns. On the set of 3196 gold-standard glosses, these heuristics achieved a disambiguation precision of 73% excluding monosemous words and 86% including them. An error analysis of these methods was performed to gain insight into the disambiguation process. An SVM classifier trained on SemCor was then added to the set of disambiguation methods, and the methods were combined using several techniques; the best combination was based on C4.5 rules. These enhancements raised precision on the same 3196 gold-standard glosses to 75% excluding monosemous words and 88% including them.
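The two precision figures differ only in whether monosemous words (words with a single WordNet sense, which can be tagged trivially) are counted. The Python sketch below shows that computation. It assumes the usual word sense disambiguation definition of precision: correct decisions divided by decisions made, with abstentions ignored. The `Annotation` record, its fields, and the sample data are hypothetical and not taken from the gold standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    word: str
    num_senses: int           # number of WordNet senses for this word and POS
    predicted: Optional[int]  # sense chosen by the system; None if it abstained
    gold: int                 # sense in the gold-standard annotation


def precision(annotations, include_monosemous):
    """Correct decisions divided by decisions made (abstentions are ignored)."""
    attempted = correct = 0
    for a in annotations:
        if a.predicted is None:
            continue
        if a.num_senses == 1 and not include_monosemous:
            continue  # monosemous words are trivial, so optionally leave them out
        attempted += 1
        if a.predicted == a.gold:
            correct += 1
    return correct / attempted if attempted else 0.0


# Toy data (not from the dissertation's gold standard).
sample = [
    Annotation("w1", 3, 1, 1),
    Annotation("w2", 4, 2, 1),
    Annotation("w3", 2, 1, 1),
    Annotation("w4", 1, 1, 1),     # monosemous
    Annotation("w5", 5, None, 2),  # system abstained
]
print(precision(sample, include_monosemous=False))  # 2 correct of 3 attempted
print(precision(sample, include_monosemous=True))   # 0.75
```

Because monosemous words are almost always tagged correctly, including them pushes precision up. This is why the figure with monosemous words is consistently higher.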
Lexical chains were derived using existing semantic relations from WordNet and its glosses. Using these relations, an algorithm was designed to find semantically related concepts. The algorithm assigns weights to the neighbors of a target concept up to a certain distance, and these weights are used to rank the related concepts. For 33% of the questions in TREC 2001, the answers were formulated with words from the top 20 concepts returned by the algorithm.
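The weighting step can be sketched as a bounded breadth-first expansion from the target concept. Each hop multiplies the current weight by a per-relation factor, and each concept keeps the best weight found on any path. The relation names, their weights, the multiplicative decay, and the toy graph below are assumptions for illustration; the actual weighting scheme is not specified here.

```python
from heapq import nlargest

# Hypothetical per-relation weights; the dissertation's actual values are not given here.
RELATION_WEIGHT = {"hypernym": 0.8, "hyponym": 0.7, "gloss": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed weight for any other relation


def related_concepts(graph, target, max_depth=2, top_k=20):
    """Rank concepts reachable from `target` within `max_depth` hops.

    `graph` maps a concept to a list of (relation, neighbor) pairs.
    A path's weight is the product of its relation weights; each concept
    keeps the best weight found on any path.
    """
    best = {target: 1.0}
    frontier = [(target, 1.0)]
    for _ in range(max_depth):
        next_frontier = []
        for concept, weight in frontier:
            for relation, neighbor in graph.get(concept, []):
                w = weight * RELATION_WEIGHT.get(relation, DEFAULT_WEIGHT)
                if w > best.get(neighbor, 0.0):
                    best[neighbor] = w
                    next_frontier.append((neighbor, w))
        frontier = next_frontier
    # All relation weights are below 1, so the target keeps 1.0; drop it from the ranking.
    del best[target]
    return nlargest(top_k, best.items(), key=lambda item: item[1])


graph = {
    "dog": [("hypernym", "canine"), ("gloss", "bark")],
    "canine": [("hypernym", "carnivore"), ("hyponym", "wolf")],
    "bark": [("gloss", "sound")],
}
top = related_concepts(graph, "dog", max_depth=2, top_k=3)
print([concept for concept, _ in top])  # ['canine', 'carnivore', 'wolf']
```

A multiplicative decay is one simple way to favor close, strongly related neighbors. Under this scheme, a concept two hypernym hops away can still outrank a direct gloss neighbor.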
In a second experiment, two algorithms for propagating verb arguments along lexical chains were used to improve the performance of a Question Answering system: one propagating syntactic constraints and one propagating semantic constraints. Lexical chains were derived between words in the questions and words in the correct answers. For 33 (14.3%) of the factoid questions from TREC 2004, the lexical chains contained only relations that propagate verb arguments. The algorithm propagating verb arguments with syntactic constraints improved the system by 1.9%, and the one propagating arguments with semantic constraints improved it by 2.4%.
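The following is a deliberately simplified Python sketch of the argument-propagation idea. A chain passes a question verb's arguments on to the answer verb only if every link uses a relation type that propagates verb arguments, and an optional callback can veto individual fillers as a stand-in for semantic constraints. The relation set, the `accepts` callback, and the example chain are hypothetical. This does not reproduce how either of the dissertation's algorithms encodes its constraints.

```python
# Hypothetical set of relation types assumed to propagate verb arguments;
# the dissertation's actual inventory is not listed in this abstract.
PROPAGATING = {"synonym", "hypernym", "entailment", "derivation"}


def chain_propagates(chain):
    """True if every (source, relation, target) link uses a propagating relation."""
    return all(relation in PROPAGATING for _, relation, _ in chain)


def propagate_arguments(chain, arguments, accepts=None):
    """Carry a verb's role->filler arguments to the end of a lexical chain.

    Returns None if a link does not propagate arguments, or if the optional
    `accepts(word, role, filler)` check (a stand-in for semantic constraints)
    rejects a filler at some word along the chain.
    """
    if not chain_propagates(chain):
        return None
    if accepts is not None:
        for _, _, word in chain:
            for role, filler in arguments.items():
                if not accepts(word, role, filler):
                    return None
    return dict(arguments)


question_args = {"agent": "?", "patient": "Kennedy"}
chain = [("assassinate", "hypernym", "kill")]
print(propagate_arguments(chain, question_args))
# {'agent': '?', 'patient': 'Kennedy'}
```

With the arguments carried over to "kill", an answer passage such as "X killed Kennedy" can be matched on both the verb and its patient. An unbound agent ("?") marks the slot the answer should fill.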
Index Terms
- Semantic disambiguation of wordnet glosses and lexical chains on extended wordnet