DOI: 10.5555/2390524.2390645

Improving word representations via global context and multiple word prototypes

Published: 08 July 2012

ABSTRACT

Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
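The multi-prototype part of the approach can be sketched roughly as follows: represent each occurrence of a word by a (possibly weighted) average of its context-word embeddings, cluster those occurrence vectors, and treat each cluster centroid as one sense-specific prototype. Below is a minimal sketch in Python/NumPy under those assumptions; the function names and the spherical k-means details are illustrative, not the authors' implementation.

```python
import numpy as np

def context_vector(embeddings, context_ids, weights=None):
    """Represent one occurrence of a word as the (optionally weighted)
    average of the embeddings of the words in its context window."""
    vecs = embeddings[context_ids]
    w = np.ones(len(context_ids)) if weights is None else np.asarray(weights, float)
    return (vecs * w[:, None]).sum(axis=0) / w.sum()

def learn_prototypes(occurrence_vectors, k, n_iter=20):
    """Cluster a word's occurrence vectors with spherical k-means;
    each centroid then serves as one sense prototype."""
    X = occurrence_vectors / np.linalg.norm(occurrence_vectors, axis=1, keepdims=True)
    # Farthest-point initialisation: deterministic and well spread out.
    centroids = [X[0]]
    for _ in range(1, k):
        sims = X @ np.array(centroids).T
        centroids.append(X[sims.max(axis=1).argmin()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign each occurrence to its nearest centroid by cosine similarity.
        assign = (X @ centroids.T).argmax(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                c = members.mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return centroids, assign
```

Each occurrence can then be relabelled by its cluster, so a polysemous word such as "bank" ends up with one embedding per induced sense rather than a single averaged vector.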


Published in

ACL '12: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
July 2012, 1100 pages

Publisher: Association for Computational Linguistics, United States

Publication History: Published 8 July 2012

Qualifiers: research-article

Acceptance Rates: Overall acceptance rate 85 of 443 submissions (19%)
