Abstract
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
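The second-order idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the function names (`word_vectors`, `context_vector`, `cluster`, `discriminate`), and the choice of a simple k-means with cosine similarity are all assumptions made for illustration, and no dimensionality reduction is applied. Each word gets a vector of co-occurrence counts from the training corpus, each context of the ambiguous word is represented as the sum of the vectors of its words, and contexts are then grouped by similarity.

```python
from collections import Counter
from math import sqrt

def word_vectors(corpus):
    """First-order counts: word -> Counter of words sharing a sentence with it."""
    vectors = {}
    for sentence in corpus:
        for i, w in enumerate(sentence):
            row = vectors.setdefault(w, Counter())
            for j, v in enumerate(sentence):
                if i != j:
                    row[v] += 1
    return vectors

def context_vector(context, vectors, target):
    """Second-order vector: sum of the word vectors of the context words."""
    total = Counter()
    for w in context:
        if w != target:
            total.update(vectors.get(w, Counter()))  # unseen words add nothing
    total.pop(target, None)  # drop the dimension of the ambiguous word itself
    return total

def cosine(a, b):
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(contexts, k=2, iterations=10):
    """Plain k-means with cosine similarity (an assumed stand-in clusterer)."""
    # Deterministic seeding: the first context, then the context least
    # similar to the seeds chosen so far.
    centroids = [contexts[0]]
    while len(centroids) < k:
        centroids.append(
            min(contexts, key=lambda c: max(cosine(c, s) for s in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(c, centroids[j]))
                  for c in contexts]
        for j in range(k):
            members = [c for c, lab in zip(contexts, labels) if lab == j]
            if members:
                centroids[j] = sum(members, Counter())  # cosine ignores scale
    return labels

def discriminate(corpus, contexts, target, k=2):
    vectors = word_vectors(corpus)
    return cluster([context_vector(c, vectors, target) for c in contexts], k)

# Hypothetical toy data: "suit" as a lawsuit versus "suit" as clothing.
corpus = [
    ["judge", "court", "lawyer", "suit"],
    ["court", "lawyer", "filed", "suit"],
    ["judge", "filed", "court"],
    ["tailor", "jacket", "wore", "suit"],
    ["jacket", "tie", "wore", "suit"],
    ["tailor", "tie", "jacket"],
]
contexts = [
    ["judge", "dismissed", "suit"],
    ["lawyer", "filed", "suit"],
    ["tailor", "pressed", "suit"],
    ["wore", "tie", "suit"],
]
print(discriminate(corpus, contexts, "suit"))  # [0, 0, 1, 1]
```

The first two contexts share no words other than "suit", yet they fall into the same group because "judge", "lawyer", and "filed" all co-occur with "court" in the training sentences. That is the sense in which the similarity is second-order rather than based on direct word overlap.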
Index Terms
- Automatic word sense discrimination