Automatic word sense discrimination

Author:
Hinrich Schütze

Xerox Palo Alto Research Center

Computational Linguistics, Volume 24, Issue 1, pp. 97–123

Published: 01 March 1998

Abstract

This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
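The second-order co-occurrence idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify window size, term weighting, dimensionality, or the clustering algorithm, so the toy corpus, the function names (`word_vectors`, `context_vector`, `cluster`), the cosine measure, and the simple k-means-style loop are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
import math

def word_vectors(sentences, window=2):
    """First-order co-occurrence counts: word -> Counter of nearby words."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

def context_vector(context, target, vecs):
    """Second-order vector: sum of the word vectors of the context words."""
    total = Counter()
    for w in context:
        if w != target:
            total.update(vecs.get(w, {}))
    return total

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(vectors, k, iters=10):
    """K-means-style clustering with cosine similarity (a stand-in, not the paper's method)."""
    # Deterministic farthest-first seeding: start from the first vector,
    # then add the vector least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        nxt = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(nxt)
    labels = []
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[j] = total
    return labels

# Hypothetical toy training corpus (already tokenized, stop words removed).
training = [
    ["river", "shore", "mud"],
    ["water", "shore", "mud"],
    ["loan", "money", "debt"],
    ["interest", "money", "debt"],
]
# Hypothetical contexts of the ambiguous word "bank".
contexts = [
    ["bank", "river"],
    ["bank", "loan"],
    ["bank", "water"],
    ["bank", "interest"],
]

vecs = word_vectors(training)
ctx_vecs = [context_vector(c, "bank", vecs) for c in contexts]
labels = cluster(ctx_vecs, k=2)
print(labels)  # [0, 1, 0, 1]
```

The contexts "bank river" and "bank water" share no content word, yet they land in the same cluster because "river" and "water" both co-occur with "shore" and "mud" in the training sentences. That is the sense in which the similarity is second-order; no sense labels are used at any point.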

Published in

Computational Linguistics, Volume 24, Issue 1
Special issue on word sense disambiguation
March 1998, 179 pages
ISSN: 0891-2017
EISSN: 1530-9312
Publisher: MIT Press, Cambridge, MA, United States