DOI: 10.5555/1631862.1631865
research-article

Measuring the semantic similarity of texts

Published: 30 June 2005

ABSTRACT

This paper presents a knowledge-based method for measuring the semantic similarity of texts. While there is a large body of previous work on finding the semantic similarity of concepts and words, the application of these word-oriented methods to text similarity has not yet been explored. In this paper, we introduce a method that combines word-to-word similarity metrics into a text-to-text metric, and we show that this method outperforms the traditional text similarity metrics based on lexical matching.
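The abstract does not spell out how word-to-word scores are combined into a text-to-text score. The sketch below shows one plausible scheme and should be read as an assumption, not as the paper's exact formula. Each word is matched to its most similar word in the other text. These best-match scores are averaged with idf weights, and the two directions are then averaged. The `word_sim` callable is the pluggable word-to-word metric. `exact_match` stands in for a lexical-matching baseline. `toy_semantic`, its synonym table, and the idf values are all hypothetical, standing in for a knowledge-based (e.g. WordNet) measure.

```python
from typing import Callable, Dict, List

# Word-to-word similarity: takes two tokens, returns a score in [0, 1].
WordSim = Callable[[str, str], float]


def max_sim(word: str, other_words: List[str], word_sim: WordSim) -> float:
    """Best similarity between `word` and any word of the other text (0.0 if empty)."""
    return max((word_sim(word, w) for w in other_words), default=0.0)


def directional_sim(src: List[str], tgt: List[str],
                    idf: Dict[str, float], word_sim: WordSim) -> float:
    """idf-weighted average of each source word's best match in the target text."""
    num, den = 0.0, 0.0
    for w in src:
        weight = idf.get(w, 1.0)  # assumption: words missing from idf get weight 1.0
        num += max_sim(w, tgt, word_sim) * weight
        den += weight
    return num / den if den > 0 else 0.0


def text_sim(t1: List[str], t2: List[str],
             idf: Dict[str, float], word_sim: WordSim) -> float:
    """Symmetric text-to-text score: mean of the two directional scores."""
    return 0.5 * (directional_sim(t1, t2, idf, word_sim)
                  + directional_sim(t2, t1, idf, word_sim))


def exact_match(a: str, b: str) -> float:
    """Lexical-matching baseline: 1.0 only for identical tokens."""
    return 1.0 if a == b else 0.0


# Hypothetical toy "semantic" metric standing in for a knowledge-based measure.
TOY_SYNONYMS = {frozenset({"car", "automobile"}): 0.9,
                frozenset({"buy", "purchase"}): 1.0}


def toy_semantic(a: str, b: str) -> float:
    """Identical tokens score 1.0; listed synonym pairs use the toy table."""
    if a == b:
        return 1.0
    return TOY_SYNONYMS.get(frozenset({a, b}), 0.0)


if __name__ == "__main__":
    t1 = ["buy", "car"]
    t2 = ["purchase", "automobile"]
    idf = {"buy": 2.0, "car": 1.0, "purchase": 2.0, "automobile": 1.0}  # made-up weights
    print(round(text_sim(t1, t2, idf, exact_match), 4))   # prints 0.0
    print(round(text_sim(t1, t2, idf, toy_semantic), 4))  # prints 0.9667
```

The two texts share no tokens, so the lexical baseline scores them 0.0. The word-level metric instead recognizes the paraphrase. This illustrates the motivation the abstract gives for moving beyond lexical matching.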


  • Published in

    EMSEE '05: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment
    June 2005
    69 pages

    Publisher

    Association for Computational Linguistics

    United States
