ABSTRACT
Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional patterns of words. The similarity measure allows us to construct a thesaurus from a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget's Thesaurus is.
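The abstract does not spell out the similarity measure or the evaluation procedure. As an illustration only, the sketch below assumes the parsed corpus has been reduced to counts of dependency triples (w, r, w') and uses a mutual-information-based similarity over shared (r, w') features, of the kind used in this line of work. The relation names, the toy counts, and the helpers `build_thesaurus` and `entry_similarity` are hypothetical, and the cosine comparison in `entry_similarity` is just one assumed way to score an automatic entry against a reference entry (for example, one derived from WordNet). It is not the evaluation methodology the paper proposes.

```python
import math
from collections import Counter, defaultdict


def mutual_information(counts):
    """Assumed weighting: I(w,r,w') = log(||w,r,w'|| * ||*,r,*|| / (||w,r,*|| * ||*,r,w'||))."""
    w_r, r_w2, r_all = Counter(), Counter(), Counter()
    for (w, r, w2), c in counts.items():
        w_r[(w, r)] += c      # ||w, r, *||
        r_w2[(r, w2)] += c    # ||*, r, w'||
        r_all[r] += c         # ||*, r, *||
    return {
        (w, r, w2): math.log(c * r_all[r] / (w_r[(w, r)] * r_w2[(r, w2)]))
        for (w, r, w2), c in counts.items()
    }


def features(info):
    """T(w): map each word to its (r, w') features that carry positive information."""
    feats = defaultdict(dict)
    for (w, r, w2), i in info.items():
        if i > 0:
            feats[w][(r, w2)] = i
    return feats


def similarity(f1, f2):
    """Information in shared features divided by total information of both words."""
    shared = f1.keys() & f2.keys()
    num = sum(f1[k] + f2[k] for k in shared)
    den = sum(f1.values()) + sum(f2.values())
    return num / den if den else 0.0


def build_thesaurus(counts, top_n=5):
    """Hypothetical helper: for each word, list its most similar words (score > 0)."""
    feats = features(mutual_information(counts))
    words = sorted(feats)
    thesaurus = {}
    for w in words:
        scored = [(similarity(feats[w], feats[v]), v) for v in words if v != w]
        scored = [(s, v) for s, v in scored if s > 0]
        scored.sort(key=lambda sv: (-sv[0], sv[1]))  # best first, ties by word
        thesaurus[w] = [(v, s) for s, v in scored[:top_n]]
    return thesaurus


def entry_similarity(entry_a, entry_b):
    """Assumed evaluation helper: cosine between two [(word, score), ...] entries."""
    a, b = dict(entry_a), dict(entry_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values()) * sum(y * y for y in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy triple counts (hypothetical), standing in for a parsed corpus.
    counts = Counter({
        ("beer", "obj-of", "drink"): 4, ("wine", "obj-of", "drink"): 4,
        ("car", "obj-of", "drive"): 4, ("truck", "obj-of", "drive"): 4,
        ("beer", "mod", "cold"): 1, ("wine", "mod", "red"): 1,
        ("car", "mod", "red"): 1,
    })
    for word, neighbors in build_thesaurus(counts).items():
        print(word, [(v, round(s, 3)) for v, s in neighbors])
```

On the toy counts, "wine" ends up as the nearest neighbor of "beer" and "truck" as the nearest neighbor of "car". The shared modifier "red" also gives "wine" a weaker link to "car", which shows how a single incidental feature can introduce noise into an automatically built entry.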
- Automatic retrieval and clustering of similar words