DOI: 10.3115/980691.980696

Automatic retrieval and clustering of similar words

Published: 10 August 1998

ABSTRACT

Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget Thesaurus is.
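
The abstract does not spell out the similarity measure, so the following is only a minimal sketch of one information-weighted distributional similarity, assuming the parsed corpus is available as (word, relation, other word) dependency triples. The function names `build_features` and `similarity`, the relation label `obj-of`, and the toy triples are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def build_features(triples):
    """Map each word to {(relation, other_word): information weight}.

    Assumed weighting: log of the observed triple count over the count
    expected if word and other_word were independent given the relation.
    Only features with positive weight are kept.
    """
    c_wrw = Counter(triples)   # count of (w, r, w2)
    c_wr = Counter()           # count of (w, r, *)
    c_rw = Counter()           # count of (*, r, w2)
    c_r = Counter()            # count of (*, r, *)
    for (w, r, w2), n in c_wrw.items():
        c_wr[(w, r)] += n
        c_rw[(r, w2)] += n
        c_r[r] += n

    features = defaultdict(dict)
    for (w, r, w2), n in c_wrw.items():
        info = math.log(n * c_r[r] / (c_wr[(w, r)] * c_rw[(r, w2)]))
        if info > 0:
            features[w][(r, w2)] = info
    return features

def similarity(features, w1, w2):
    """Weight of shared features divided by the total weight of both words."""
    f1, f2 = features.get(w1, {}), features.get(w2, {})
    total = sum(f1.values()) + sum(f2.values())
    if total == 0:
        return 0.0
    shared = f1.keys() & f2.keys()
    return sum(f1[k] + f2[k] for k in shared) / total

# Hypothetical toy triples standing in for parser output.
triples = [
    ("beer", "obj-of", "drink"), ("wine", "obj-of", "drink"),
    ("beer", "obj-of", "brew"), ("wine", "obj-of", "pour"),
    ("car", "obj-of", "drive"), ("car", "obj-of", "park"),
    ("truck", "obj-of", "drive"),
]
features = build_features(triples)
beer_wine = similarity(features, "beer", "wine")
beer_car = similarity(features, "beer", "car")
```

On this toy data `beer` and `wine` share the `drink` context while `beer` and `car` share none, so the first pair scores higher. Ranking all words by such a score against a target word would give a thesaurus-style entry of nearest neighbours.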

Published in: ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 2, August 1998, 768 pages

Publisher: Association for Computational Linguistics, United States


Overall acceptance rate: 85 of 443 submissions (19%)
