ABSTRACT
Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
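The abstract's idea of learning multiple embeddings per word can be illustrated with a toy sketch: represent each occurrence of a word by a vector derived from its context, cluster those vectors into a fixed number of sense prototypes, and assign new contexts to the nearest prototype. This is only a simplified stand-in, not the paper's actual method (which uses spherical k-means over weighted context windows inside a neural language model); the function names and the plain k-means with farthest-point initialization are assumptions made for illustration.

```python
import numpy as np

def multi_prototype(context_vecs, k, iters=20):
    """Cluster per-occurrence context vectors into k sense prototypes.

    Plain k-means with deterministic farthest-point initialization;
    a simplified stand-in for the spherical k-means used in the paper.
    """
    X = np.asarray(context_vecs, dtype=float)
    # Farthest-point init: start from the first occurrence, then
    # repeatedly add the occurrence farthest from all chosen centers.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each occurrence to its nearest prototype ...
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... and move each prototype to the mean of its occurrences.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def nearest_sense(centers, ctx):
    """Index of the prototype closest to a new context vector."""
    return int(np.linalg.norm(centers - np.asarray(ctx, dtype=float),
                              axis=1).argmin())
```

With two well-separated bundles of context vectors (say, "bank" appearing near river words versus finance words), the two recovered prototypes land in different regions, and `nearest_sense` disambiguates a new context by proximity.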