Measuring the semantic similarity of texts

Authors:
Courtney Corley (University of North Texas), Rada Mihalcea (University of North Texas)
EMSEE '05: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, June 2005, Pages 13–18

Published: 30 June 2005

ABSTRACT

This paper presents a knowledge-based method for measuring the semantic similarity of texts. While a large body of previous work has focused on finding the semantic similarity of concepts and words, the application of these word-oriented methods to text similarity has not yet been explored. In this paper, we introduce a method that combines word-to-word similarity metrics into a text-to-text metric, and we show that this method outperforms traditional text similarity metrics based on lexical matching.
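The idea of combining word-to-word scores into a text-to-text score can be illustrated with a minimal sketch. The abstract does not give the combination formula, so everything here is an assumption made for illustration: each word is matched to its most similar word in the other text, the matches are averaged with optional per-word weights, and the two directions are averaged. The lookup table `TOY_SIM` and the functions `word_sim`, `directional`, `text_sim`, and `lexical_overlap` are hypothetical names, and the similarity values are made up rather than taken from any knowledge base.

```python
from typing import Callable, List

# Hypothetical word-to-word similarity scores in [0, 1]. A real system would
# obtain these from a knowledge-based measure; the values here are invented.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def word_sim(a: str, b: str) -> float:
    """Toy word-to-word similarity: 1.0 for identical words, else table lookup."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def directional(src: List[str], tgt: List[str],
                sim: Callable[[str, str], float],
                weight: Callable[[str], float]) -> float:
    """Weighted average, over words in src, of the best match found in tgt."""
    total_weight = sum(weight(w) for w in src)
    if total_weight == 0 or not tgt:
        return 0.0
    matched = sum(max(sim(w, t) for t in tgt) * weight(w) for w in src)
    return matched / total_weight


def text_sim(t1: List[str], t2: List[str],
             sim: Callable[[str, str], float] = word_sim,
             weight: Callable[[str], float] = lambda w: 1.0) -> float:
    """Symmetric text-to-text score: mean of the two directional scores."""
    return 0.5 * (directional(t1, t2, sim, weight)
                  + directional(t2, t1, sim, weight))


def lexical_overlap(t1: List[str], t2: List[str]) -> float:
    """Baseline based on exact lexical matching (Jaccard overlap of word sets)."""
    s1, s2 = set(t1), set(t2)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


if __name__ == "__main__":
    a = "the car is fast".split()
    b = "the automobile is quick".split()
    print("word-based similarity:", text_sim(a, b))
    print("lexical overlap:      ", lexical_overlap(a, b))
```

On this toy pair the word-based score is higher than the exact-match baseline, because "car"/"automobile" and "fast"/"quick" contribute partial credit that lexical matching cannot see. The `weight` parameter is left uniform here; a term-specificity weighting could be passed in its place.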


Published in
EMSEE '05: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, June 2005, 69 pages
Program Chairs: Bill Dolan (Microsoft Research), Ido Dagan (Bar Ilan University)
Publisher: Association for Computational Linguistics, United States