DOI: 10.5555/1613692.1613696
research-article

Automatic identification of non-compositional multi-word expressions using latent semantic analysis

Published: 23 July 2006

ABSTRACT

Using latent semantic analysis (LSA), we explore the hypothesis that local linguistic context can serve to identify multi-word expressions (MWEs) that have non-compositional meanings. We propose that the vector similarity between the distribution vectors associated with an MWE as a whole and those associated with its constituent parts can serve as a good measure of the degree to which the MWE is compositional. We present experiments showing that low (cosine) similarity does, in fact, correlate with non-compositionality.
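The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the co-occurrence counts, the term and context lists, the choice of k = 4 latent dimensions, and the averaging of the per-constituent cosines into a single score (the hypothetical `compositionality_score` helper) are all assumptions made for illustration. The sketch treats each MWE as a single token, reduces a term-by-context count matrix with a truncated SVD, and compares the MWE's vector with the vectors of its parts.

```python
import numpy as np

def lsa_vectors(counts: np.ndarray, k: int) -> np.ndarray:
    """Project term rows onto the top-k latent dimensions of a truncated SVD."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def compositionality_score(vectors, index, mwe, parts):
    """Assumed scoring rule: mean cosine between the MWE and each constituent."""
    return sum(cosine(vectors[index[mwe]], vectors[index[p]]) for p in parts) / len(parts)

# Hypothetical toy data. Rows are terms (MWEs counted as single tokens),
# columns are context words: ball, foot, water, pail, die, funeral, paint, drive.
terms = ["kick", "bucket", "kick_the_bucket", "red", "car", "red_car"]
counts = np.array([
    [5, 4, 0, 0, 0, 0, 0, 0],  # kick
    [0, 0, 6, 3, 0, 0, 0, 0],  # bucket
    [0, 0, 0, 0, 6, 5, 0, 0],  # kick_the_bucket (shares no contexts with its parts)
    [0, 0, 0, 0, 0, 0, 5, 1],  # red
    [0, 0, 0, 0, 0, 0, 1, 5],  # car
    [0, 0, 0, 0, 0, 0, 3, 3],  # red_car (shares contexts with both parts)
], dtype=float)

index = {term: i for i, term in enumerate(terms)}
vectors = lsa_vectors(counts, k=4)  # k is an arbitrary choice for this toy matrix

idiom_score = compositionality_score(vectors, index, "kick_the_bucket", ["kick", "bucket"])
literal_score = compositionality_score(vectors, index, "red_car", ["red", "car"])
print(f"kick_the_bucket: {idiom_score:.3f}  red_car: {literal_score:.3f}")
```

On this toy matrix the idiom scores far lower than the literal phrase, which is the pattern the abstract describes. A real experiment would need counts drawn from a corpus and a principled choice of dimensionality and context window, neither of which is specified here.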


Published in

MWE '06: Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
July 2006, 72 pages
ISBN: 1932432841
Publisher: Association for Computational Linguistics, United States


Acceptance Rates

MWE '06 paper acceptance rate: 10 of 23 submissions (43%). Overall acceptance rate: 31 of 69 submissions (45%).
