This dissertation investigates and proposes ways to quantify and measure semantic similarity between texts. The general approach relies on linguistic information at several levels: lexical, lexico-semantic, and syntactic. Texts are first mapped onto structured representations that capture this information. These representations are then used as input to methods designed to measure the semantic similarity between texts. Although world knowledge is needed to assess the semantic similarity of texts properly, our approach does not use it, which is a limitation. We restrict ourselves to the question of how successfully one can measure the semantic similarity of texts using linguistic information alone. Lexical information from the original texts is retained by using the words themselves in the corresponding representations. Syntactic information is encoded using dependency trees, which explicitly represent the syntactic relations between words. Word-level semantic information is encoded either indirectly, through word-to-word semantic similarity measures such as WordNet Similarity, or explicitly, through vector representations such as those produced by Latent Semantic Analysis (LSA). Several methods for comparing the representations are studied, ranging from simple lexical overlap to more complex methods that compare semantic representations in vector spaces as well as syntactic structures. Furthermore, a few powerful kernel models are proposed for use with Support Vector Machine (SVM) classifiers for the case in which the semantic similarity problem is modeled as a classification task.
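To make the two ends of that range concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes a naive regex tokenizer and Jaccard overlap as the "simple lexical overlap" baseline, and cosine similarity as the usual way to compare vectors such as those produced by LSA. The function names `tokenize`, `lexical_overlap`, and `cosine` are illustrative only.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    This naive tokenizer is an assumption made for illustration.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_overlap(text_a, text_b):
    """Jaccard overlap between the word sets of two texts, in [0, 1]."""
    words_a, words_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity with a zero vector is defined as 0 here
    return dot / (norm_u * norm_v)
```

For example, `lexical_overlap("The cat sat", "the cat slept")` shares two of four distinct words. In a vector-space setting, `cosine` would be applied to text vectors (for instance, sums of word vectors) rather than to raw word sets.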