research-article

Automatic identification of non-compositional multi-word expressions using latent semantic analysis

Authors: Graham Katz (University of Osnabrück), Eugenie Giesbrecht (University of Osnabrück)

MWE '06: Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, July 2006, pages 12–19

Published: 23 July 2006

ABSTRACT

Making use of latent semantic analysis, we explore the hypothesis that local linguistic context can serve to identify multi-word expressions that have non-compositional meanings. We propose that vector-similarity between distribution vectors associated with an MWE as a whole and those associated with its constituent parts can serve as a good measure of the degree to which the MWE is compositional. We present experiments that show that low (cosine) similarity does, in fact, correlate with non-compositionality.
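
The measure described in the abstract can be illustrated with a minimal sketch. It assumes a small term-by-context count matrix in which the MWE is treated as a single row, reduces it with a truncated SVD (the core step of latent semantic analysis), and compares the MWE vector with the vectors of its constituent words by cosine similarity. The counts, the context words, the choice of `k`, and the helper names (`lsa_embed`, `cosine`, `part_similarities`) are all invented for illustration; they are not the corpus, parameters, or code used in the paper.

```python
import numpy as np

def lsa_embed(counts, k):
    """Project the rows of a term-by-context matrix onto the top-k SVD dimensions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity, with 0.0 returned if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def part_similarities(vectors, rows, expression, parts):
    """Cosine between an expression's vector and each of its constituents' vectors."""
    index = {name: i for i, name in enumerate(rows)}
    return {p: cosine(vectors[index[expression]], vectors[index[p]]) for p in parts}

# Toy data (hypothetical counts): rows are terms, columns are context words.
CONTEXTS = ["die", "funeral", "foot", "ball", "water", "pail"]
ROWS = ["kick_the_bucket", "kick", "bucket", "kick_the_ball"]
COUNTS = np.array([
    [8, 6, 1, 0, 0, 1],   # idiomatic use: contexts about dying
    [0, 0, 9, 7, 0, 0],   # "kick" on its own
    [0, 0, 0, 0, 9, 8],   # "bucket" on its own
    [0, 0, 7, 8, 0, 0],   # literal phrase: contexts shared with "kick"
], dtype=float)

vectors = lsa_embed(COUNTS, k=3)
idiom = part_similarities(vectors, ROWS, "kick_the_bucket", ["kick", "bucket"])
literal = part_similarities(vectors, ROWS, "kick_the_ball", ["kick"])

print("kick_the_bucket vs parts:", idiom)
print("kick_the_ball vs kick:  ", literal)
```

In this toy setting the idiomatic expression shares almost no contexts with its parts, so its cosine scores are low, while the literal phrase scores high against its head verb. A real experiment would build the matrix from a large corpus and would have to choose the context window, weighting scheme, and number of dimensions, none of which are specified in the abstract.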


Published in
MWE '06: Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
July 2006, 72 pages
ISBN: 1932432841
Conference Chairs: Begoña Villada Moirón (Federal University of Rio Grande do Sul, Brazil), Diana McCarthy (University of Sussex, UK), Stefan Evert (University of Osnabrück, Germany), Suzanne Stevenson (University of Toronto, Canada)
Publisher: Association for Computational Linguistics, United States
Acceptance Rates
MWE '06 paper acceptance rate: 10 of 23 submissions (43%). Overall acceptance rate: 31 of 69 submissions (45%).