Semantic disambiguation of wordnet glosses and lexical chains on extended wordnet
Publisher:
  • University of Texas at Dallas
  • Computer Science P.O. Box 688 Richardson, TX
  • United States
ISBN:978-0-542-74685-7
Order Number:AAI3224360
Pages: 187
Abstract

This dissertation presents work on the semantic disambiguation of WordNet glosses and on improving the performance of a Question Answering system using lexical chains derived using WordNet relations.

The glosses were preprocessed before disambiguation. This preprocessing step consisted of separating definitions from examples, tokenization, part-of-speech tagging, and compound concept identification. Some words were manually disambiguated to create a set of 3196 gold-standard glosses.
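The first preprocessing step, separating definitions from examples, can be illustrated with a minimal sketch. It assumes the common WordNet gloss layout in which segments are separated by semicolons and example sentences are enclosed in double quotes. The function name `split_gloss` is hypothetical and this is not the dissertation's implementation; in particular, it does not handle semicolons that occur inside a quoted example.

```python
def split_gloss(gloss):
    """Split a WordNet-style gloss into (definition, examples).

    Assumption: segments are separated by ';' and example sentences
    are wrapped in double quotes. Semicolons inside a quoted example
    are not handled by this sketch.
    """
    parts = [p.strip() for p in gloss.split(";")]
    # Unquoted segments belong to the definition.
    definition = "; ".join(p for p in parts if p and not p.startswith('"'))
    # Quoted segments are usage examples; drop the surrounding quotes.
    examples = [p.strip('"') for p in parts if p.startswith('"')]
    return definition, examples


# Illustrative gloss-like string.
sample = ('a motor vehicle with four wheels; usually propelled by an '
          'internal combustion engine; "he needs a car to get to work"')
definition, examples = split_gloss(sample)
print(definition)
print(examples)
```

Tokenization, part-of-speech tagging, and compound identification would then run on the definition and on each example separately.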

The disambiguation relies on a set of heuristics that use WordNet semantic relations, WordNet glosses, the SemCor corpus, domain labels, and recurring patterns. On the set of 3196 gold-standard glosses, these heuristics achieved a disambiguation precision of 73% without monosemous words and 86% with monosemous words. An error analysis of these methods was performed to gain insight into the disambiguation process. An SVM classifier trained on SemCor was then added to the set of disambiguation methods, and the methods were combined using different techniques, the best of which was based on C4.5 rules. These enhancements raised precision on the same 3196 gold-standard glosses to 75% without monosemous words and 88% with monosemous words.
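The two precision figures differ only in which words are scored. The sketch below assumes that precision is the fraction of attempted words that received the correct sense, and that "without monosemous words" means words with a single sense are excluded from scoring. The function name, the tuple layout, and the toy data (including the sense counts) are all hypothetical.

```python
def precision(items, include_monosemous=True):
    """Fraction of attempted words that were assigned the gold sense.

    items: list of (predicted_sense, gold_sense, number_of_senses).
    A predicted_sense of None means the system gave no answer, so the
    word is not counted (assumed definition of precision).
    """
    scored = [
        (pred, gold)
        for pred, gold, n_senses in items
        if pred is not None and (include_monosemous or n_senses > 1)
    ]
    if not scored:
        return 0.0
    return sum(pred == gold for pred, gold in scored) / len(scored)


# Toy data, not taken from the dissertation.
toy = [
    ("bank#1", "bank#1", 10),         # polysemous, correct
    ("run#2", "run#5", 40),           # polysemous, wrong
    ("aardvark#1", "aardvark#1", 1),  # monosemous, trivially correct
    (None, "set#3", 20),              # not attempted
]
print(precision(toy, include_monosemous=True))
print(precision(toy, include_monosemous=False))
```

Because monosemous words are always disambiguated correctly, including them can only raise the figure, which matches the direction of the gap between the reported numbers.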

Lexical chains were derived using existing semantic relations from WordNet and its glosses. Using these relations, an algorithm for finding semantically related concepts was designed. The algorithm assigns weights to the neighbors of a target concept up to a certain distance, and these weights are used to rank the related concepts. It was found that 33% of the questions in TREC 2001 had answers formulated with words from the top 20 concepts returned by the algorithm.
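One way to picture weighting neighbors up to a fixed distance is the sketch below. The abstract does not give the weighting scheme, so the sketch assumes that a path's weight is the product of per-relation weights and that a concept keeps the weight of its best path. The function name, the relation names, the weights, and the toy graph are all hypothetical.

```python
def rank_related(graph, target, relation_weight, max_distance=2):
    """Rank concepts reachable from `target` within `max_distance` links.

    graph: dict mapping a concept to a list of (neighbor, relation).
    relation_weight: dict mapping a relation name to a weight in (0, 1).
    Assumption: path weight is the product of its relation weights,
    and each concept keeps the weight of its best path.
    """
    best = {target: 1.0}
    frontier = {target: 1.0}
    for _ in range(max_distance):
        next_frontier = {}
        for node, weight in frontier.items():
            for neighbor, relation in graph.get(node, []):
                candidate = weight * relation_weight[relation]
                # Keep only improvements, so each concept stores its best path.
                if candidate > best.get(neighbor, 0.0):
                    best[neighbor] = candidate
                    next_frontier[neighbor] = candidate
        frontier = next_frontier
    del best[target]
    # Highest weight first; ties are broken alphabetically.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy graph and weights for illustration only.
toy_graph = {
    "car": [("vehicle", "hypernym"), ("wheel", "meronym"), ("drive", "gloss")],
    "vehicle": [("truck", "hyponym"), ("transport", "hypernym")],
    "wheel": [("tire", "meronym")],
}
toy_weights = {"hypernym": 0.8, "hyponym": 0.7, "meronym": 0.5, "gloss": 0.6}

for concept, weight in rank_related(toy_graph, "car", toy_weights):
    print(f"{concept}: {weight:.2f}")
```

Taking the top-ranked entries of the returned list corresponds to the "top 20 concepts" cutoff mentioned above.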

In a second experiment, two algorithms for propagating verb arguments along lexical chains were used to improve the performance of a Question Answering system: one propagating syntactic constraints and one propagating semantic constraints. Lexical chains were derived between words in the questions and words in the correct answers. It was found that for 33 (14.3%) of the factoid questions from TREC 2004, the lexical chains contained only relations that propagate verb arguments. The algorithm that propagates verb arguments with syntactic constraints improved the system by 1.9%, and the one that propagates arguments with semantic constraints improved it by 2.4%.

Contributors
  • The University of Texas at Dallas
