Word sense induction & disambiguation using hierarchical random graphs

Authors:
Ioannis P. Klapaftis, University of York, United Kingdom
Suresh Manandhar, University of York, United Kingdom

EMNLP '10: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, October 2010, Pages 745–755

Published: 09 October 2010

ABSTRACT

Graph-based methods have gained attention in many areas of Natural Language Processing (NLP), including Word Sense Disambiguation (WSD), text summarization and keyword extraction. Most work in these areas formulates the problem in a graph-based setting and applies unsupervised graph clustering to obtain a set of clusters. Recent studies suggest that graphs often exhibit a hierarchical structure that goes beyond simple flat clustering. This paper presents an unsupervised method for inferring the hierarchical grouping of the senses of a polysemous word. The inferred hierarchical structures are applied to the problem of word sense disambiguation, where we show that our method performs significantly better than traditional graph-based methods and agglomerative clustering, yielding improvements over state-of-the-art WSD systems based on sense induction.

Published in
EMNLP '10: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
October 2010, 1332 pages
Program Chairs: Hang Li (Microsoft Research Asia), Lluís Màrquez (Technical University of Catalonia)
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 73 of 234 submissions, 31%