ABSTRACT
This paper presents a knowledge-based method for measuring the semantic similarity of texts. While a large body of previous work has focused on finding the semantic similarity of concepts and words, the application of these word-oriented methods to text similarity has not yet been explored. In this paper, we introduce a method that combines word-to-word similarity metrics into a text-to-text metric, and we show that this method outperforms traditional text similarity metrics based on lexical matching.
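
The abstract does not spell out how the word-level scores are combined, so the following is only a rough sketch of one plausible scheme, not the paper's formula. It assumes that each word is aligned with its most similar word in the other text, that the best scores are averaged, and that the two directions are then averaged to make the result symmetric. The function names, the toy similarity table, and the Jaccard baseline are all hypothetical; a real system would plug in a knowledge-based word metric in place of `toy_word_similarity`.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]

# Hand-set scores for illustration only (assumption, not data from the paper).
_TOY_SCORES = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def toy_word_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for a knowledge-based word-to-word metric."""
    if a == b:
        return 1.0
    return _TOY_SCORES.get(frozenset((a, b)), 0.0)


def directed_similarity(source: Sequence[str], target: Sequence[str],
                        word_sim: WordSim) -> float:
    """Average, over source words, of the best match found in target."""
    if not source or not target:
        return 0.0
    best = [max(word_sim(w, t) for t in target) for w in source]
    return sum(best) / len(source)


def text_similarity(t1: Sequence[str], t2: Sequence[str],
                    word_sim: WordSim) -> float:
    """Symmetric text-to-text score: mean of the two directed scores."""
    return 0.5 * (directed_similarity(t1, t2, word_sim)
                  + directed_similarity(t2, t1, word_sim))


def lexical_overlap(t1: Sequence[str], t2: Sequence[str]) -> float:
    """Simple lexical-matching baseline: Jaccard overlap of word sets."""
    s1, s2 = set(t1), set(t2)
    if not s1 and not s2:
        return 0.0
    return len(s1 & s2) / len(s1 | s2)


text_a = ["the", "car", "is", "fast"]
text_b = ["the", "automobile", "is", "quick"]

semantic_score = text_similarity(text_a, text_b, toy_word_similarity)
lexical_score = lexical_overlap(text_a, text_b)
print(f"semantic: {semantic_score:.3f}  lexical: {lexical_score:.3f}")
```

In this toy pair the two texts share only two surface words, so the lexical baseline scores them low, while the word-level scores for the hand-set synonym pairs raise the combined score. That is the kind of gap the abstract's comparison with lexical matching is concerned with, although the numbers here are purely illustrative.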