Broad-coverage hierarchical word sense disambiguation

January 2005

Author:
Massimiliano Ciaramita
Brown University

Director:
Mark Johnson
Brown University

Publisher:

Brown University
Department of Computer Science Box 1910 Providence, RI
United States

ISBN: 978-0-542-12704-5

Order Number: AAI3174590

Pages: 138

Abstract

In naturally occurring language, hearers and readers face large numbers of “ambiguous” words, i.e., words with multiple senses, and “unknown” words, i.e., words they are encountering for the first time and that therefore cannot be in their lexicon. Ambiguous and unknown words seem to cause little difficulty for humans, who infer their syntactic and semantic properties on the fly to resolve ambiguities and who incorporate unknown words into their lexicons. They do, however, pose problems for dictionary-based approaches in natural language processing applications. To use the information contained in a dictionary, it is necessary to associate each word in the text being processed with one of the senses or concepts defined in the dictionary. If the word is ambiguous, the intended sense must be identified among the possible senses of the word; if it is unknown, the word must be assigned to one of the senses defined by the dictionary. The acquisition of unknown words can thus be seen as a disambiguation task in which the possible senses are all the senses listed in the dictionary.

In this thesis we formulate a single unified approach to learning unknown words and performing word sense disambiguation. We focus on nouns, but our method can be generalized to verbs and other syntactic categories. We propose a broad-coverage method that can be applied to any kind of text, and we frame the problem as a pattern classification task: each ambiguous or unknown word is classified as belonging to one of the existing concepts on the basis of morphological, syntactic and semantic properties of the contexts in which it appears. Our system takes as input an existing dictionary, which defines a hierarchy of concepts, and a corpus of textual data, and disambiguates all nouns in the corpus. We demonstrate this by disambiguating all nouns in a 40-million-word collection of newspaper articles. We also present empirical results from experiments with novel multi-level classification techniques, which exploit generalizations that hold at different levels of the concept hierarchy.
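
The abstract's framing (classify each noun into a concept of a hierarchy from features of its context, and exploit generalizations at several levels) can be illustrated with a small sketch. This is not the thesis's actual system: the toy hierarchy, the bag-of-words context features, the training examples, and the perceptron-style update are all assumptions chosen for illustration. A leaf concept is scored by summing the weights of the leaf and of its ancestors, so evidence learned for one concept is shared with its siblings.

```python
from collections import defaultdict

# Hypothetical toy hierarchy: concept -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "bird": "animal",
    "vehicle": "artifact",
    "tool": "artifact",
}
LEAVES = ["dog", "bird", "vehicle", "tool"]

# Hypothetical training data: (context features, gold leaf concept).
TRAIN = [
    (["barked", "leash"], "dog"),
    (["flew", "nest"], "bird"),
    (["drove", "engine"], "vehicle"),
    (["hammered", "handle"], "tool"),
    (["fed", "barked"], "dog"),
    (["fed", "nest"], "bird"),
]


def ancestors(concept):
    """Return the concept followed by its ancestors, excluding the root."""
    path = []
    while PARENT[concept] is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def score(weights, leaf, feats):
    """Sum feature weights over the leaf and every ancestor on its path."""
    return sum(weights[node].get(f, 0.0) for node in ancestors(leaf) for f in feats)


def predict(weights, feats):
    """Pick the highest-scoring leaf; ties go to the earliest leaf in LEAVES."""
    return max(LEAVES, key=lambda leaf: score(weights, leaf, feats))


def train(data, epochs=5):
    """Perceptron-style training with updates at every level of the hierarchy."""
    weights = defaultdict(dict)
    for _ in range(epochs):
        for feats, gold in data:
            pred = predict(weights, feats)
            if pred == gold:
                continue
            gold_path, pred_path = set(ancestors(gold)), set(ancestors(pred))
            # Promote nodes only on the gold path, demote nodes only on the
            # predicted path; shared ancestors are left unchanged.
            for node in gold_path - pred_path:
                for f in feats:
                    weights[node][f] = weights[node].get(f, 0.0) + 1.0
            for node in pred_path - gold_path:
                for f in feats:
                    weights[node][f] = weights[node].get(f, 0.0) - 1.0
    return weights


def coarse(concept):
    """Map a leaf to its top-level category (the ancestor just below the root)."""
    return ancestors(concept)[-1]


weights = train(TRAIN)
```

In this sketch, a context such as `["engine", "handle"]` gives mixed evidence between the two artifact leaves, yet `coarse(predict(weights, ["engine", "handle"]))` still resolves to the shared parent, which is the kind of higher-level generalization the abstract alludes to.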

Contributors

Massimiliano Ciaramita
Google Switzerland GmbH
- Publication Years: 2000 - 2018
- Publication counts: 36
- Citation count: 850
- Available for Download: 28
- Downloads (cumulative): 13,091
- Downloads (12 months): 546
- Downloads (6 weeks): 93
- Average Downloads per Article: 468
- Average Citation per Article: 24
Mark Edward Johnson
Macquarie University
- Publication Years: 1985 - 2018
- Publication counts: 89
- Citation count: 1,483
- Available for Download: 66
- Downloads (cumulative): 23,996
- Downloads (12 months): 1,282
- Downloads (6 weeks): 177
- Average Downloads per Article: 364
- Average Citation per Article: 17

Index Terms

Broad-coverage hierarchical word sense disambiguation
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Discourse, dialogue and pragmatics
      2. Language resources
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
      2. Unsupervised learning
        Cluster analysis
    2. Machine learning approaches
      1. Classification and regression trees

Recommendations

An unsupervised method for word sense disambiguation
Word sense disambiguation (WSD) finds the actual meaning of a word according to its context. This paper presents a novel WSD method to find the correct sense of a word present in a sentence. The proposed method uses both the WordNet ...
A Sense Annotated Corpus for All-Words Urdu Word Sense Disambiguation

Word Sense Disambiguation (WSD) aims to automatically predict the correct sense of a word used in a given context. All human languages exhibit word sense ambiguity, and resolving this ambiguity can be difficult. Standard benchmark resources are required ...
Unsupervised Word-Sense Disambiguation Using Bilingual Comparable Corpora

An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns ...