
Automatic word sense discrimination

Published: 1 March 1998

Abstract

This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
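The abstract describes the pipeline only at a high level, so here is a minimal Python sketch of the idea. It is not the paper's implementation. In this sketch, word vectors are raw co-occurrence counts over a toy corpus. A context vector is the normalized sum of the word vectors of the words around one occurrence of the ambiguous word, which is the second-order step. Sense clusters come from a small spherical k-means using cosine similarity. The corpus, the window size, the k-means clustering with farthest-first seeding, and all function names (`word_vectors`, `context_vector`, `cluster`, `assign`) are illustrative assumptions. The sketch also leaves out any term weighting or dimensionality reduction the full method may apply.

```python
import math
from collections import Counter, defaultdict


def normalize(v):
    """Scale a sparse vector (dict: word -> weight) to unit length."""
    n = math.sqrt(sum(x * x for x in v.values()))
    return Counter({k: x / n for k, x in v.items()}) if n else Counter()


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors, i.e. their dot product."""
    return sum(x * b.get(k, 0.0) for k, x in a.items())


def word_vectors(corpus, window=3):
    """First-order word vectors: counts of neighbours within +/- `window` tokens."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def context_vector(context, target, vecs):
    """Second-order context vector: normalized sum of the context words' word vectors."""
    total = Counter()
    for w in context:
        if w != target and w in vecs:
            total.update(normalize(vecs[w]))
    return normalize(total)


def cluster(vectors, k, iters=10):
    """Spherical k-means (cosine) with deterministic farthest-first seeding (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    for _ in range(max(1, iters)):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        new_centroids = []
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            # Keep the old centroid if a cluster happens to lose all its members.
            new_centroids.append(normalize(total) if total else centroids[c])
        centroids = new_centroids
    return labels, centroids


def assign(vec, centroids):
    """Assign a new context vector to the nearest sense centroid."""
    return max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))


# Hypothetical toy training corpus (lower-cased, function words already removed).
corpus = [s.split() for s in [
    "deposit money bank account", "bank approved loan interest",
    "interest loan money account", "deposit money account loan",
    "fish swim river bank", "muddy bank river water",
    "water flows river fish", "muddy water shore fish",
]]
vecs = word_vectors(corpus)

# Four occurrences of the ambiguous word "bank" to be discriminated.
contexts = [["bank", "deposit", "money"], ["bank", "loan", "interest"],
            ["river", "bank", "fish"], ["muddy", "bank", "water"]]
labels, centroids = cluster([context_vector(c, "bank", vecs) for c in contexts], k=2)
print(labels)  # [0, 0, 1, 1]

# A new, unseen occurrence is assigned to the nearest induced sense.
new = context_vector(["bank", "account", "approved"], "bank", vecs)
print(assign(new, centroids) == labels[0])  # True
```

The first two contexts of *bank* share no words (*deposit money* vs. *loan interest*). They still land in the same cluster because those words co-occur with similar words in the training corpus, which is the second-order similarity the abstract describes. No sense labels are used at any point.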

Published in

Computational Linguistics, Volume 24, Issue 1 (Special issue on word sense disambiguation), March 1998, 179 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press, Cambridge, MA, United States
