Automatic retrieval and clustering of similar words

Author:
Dekang Lin

University of Manitoba, Winnipeg, Manitoba, Canada


ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2, August 1998, Pages 768–774, https://doi.org/10.3115/980691.980696

Published: 10 August 1998

ABSTRACT

Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget Thesaurus is.
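
The abstract does not spell out the similarity measure, so the following is only a minimal sketch of one information-theoretic way to score distributional similarity over dependency triples, assuming each word is described by (relation, co-word) features weighted by pointwise mutual information. The toy triples, the relation label `obj-of`, and the names `build_features`, `similarity`, and `thesaurus_entry` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy triples (word, relation, co-word); NOT data from the paper.
TRIPLES = [
    ("beer", "obj-of", "drink"),
    ("beer", "obj-of", "brew"),
    ("wine", "obj-of", "drink"),
    ("wine", "obj-of", "pour"),
    ("car", "obj-of", "drive"),
    ("car", "obj-of", "park"),
    ("truck", "obj-of", "drive"),
    ("truck", "obj-of", "load"),
]


def build_features(triples):
    """Map each word to its (relation, co-word) features with positive PMI."""
    full = Counter(triples)                          # count(w, r, c)
    head = Counter((w, r) for w, r, _ in triples)    # count(w, r, *)
    tail = Counter((r, c) for _, r, c in triples)    # count(*, r, c)
    rel = Counter(r for _, r, _ in triples)          # count(*, r, *)
    feats = {}
    for (w, r, c), n in full.items():
        # Pointwise mutual information between w and c, given relation r.
        mi = math.log(n * rel[r] / (head[(w, r)] * tail[(r, c)]))
        if mi > 0:
            feats.setdefault(w, {})[(r, c)] = mi
    return feats


def similarity(feats, w1, w2):
    """Information carried by shared features over information in all features."""
    f1, f2 = feats.get(w1, {}), feats.get(w2, {})
    shared = f1.keys() & f2.keys()
    numerator = sum(f1[k] + f2[k] for k in shared)
    denominator = sum(f1.values()) + sum(f2.values())
    return numerator / denominator if denominator else 0.0


def thesaurus_entry(feats, word, top_n=3):
    """List the words most similar to `word`, best first, dropping zero scores."""
    scored = [(similarity(feats, word, other), other)
              for other in feats if other != word]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in ranked[:top_n] if score > 0]


FEATS = build_features(TRIPLES)
```

Under these assumptions, `thesaurus_entry(FEATS, "beer")` ranks every other word by its score against "beer"; in the toy data only "wine" shares a feature with it. A thesaurus in this sense is simply such a ranked list computed for every word in the parsed corpus.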


Published in
ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2
August 1998, 768 pages
Program Chairs: Christian Boitet (Université Joseph Fourier, Grenoble, France), Pete Whitelock (Sharp Laboratories of Europe Ltd., Oxford, United Kingdom)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%
