ABSTRACT
Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
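The abstract's idea of learning multiple embeddings per word can be illustrated with a toy sketch: represent each occurrence of a word by a vector derived from its context, cluster those vectors into a fixed number of sense prototypes, and assign new contexts to the nearest prototype. This is only a simplified stand-in, not the paper's actual method (which uses spherical k-means over weighted context windows inside a neural language model); the function names and the plain k-means with farthest-point initialization are assumptions made for illustration.

```python
import numpy as np

def multi_prototype(context_vecs, k, iters=20):
    """Cluster per-occurrence context vectors into k sense prototypes.

    Plain k-means with deterministic farthest-point initialization;
    a simplified stand-in for the spherical k-means used in the paper.
    """
    X = np.asarray(context_vecs, dtype=float)
    # Farthest-point init: start from the first occurrence, then
    # repeatedly add the occurrence farthest from all chosen centers.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each occurrence to its nearest prototype ...
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # ... and move each prototype to the mean of its occurrences.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def nearest_sense(centers, ctx):
    """Index of the prototype closest to a new context vector."""
    return int(np.linalg.norm(centers - np.asarray(ctx, dtype=float),
                              axis=1).argmin())
```

With two well-separated bundles of context vectors (say, "bank" appearing near river words versus finance words), the two recovered prototypes land in different regions, and `nearest_sense` disambiguates a new context by proximity.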