With the exponential growth of published documents, text mining has become a vital tool for the automated extraction of information and the discovery of hidden knowledge. We begin this dissertation with an overview of text mining that covers key definitions, pre-processing, feature selection, text representation, and types of text mining. We then describe a fundamental text mining approach that we used to develop a chromosome-21 database. Next, we present three novel text mining techniques: (i) text association mining with cross-sentence inference, (ii) a structure-based document model, and (iii) multi-relational text mining. These techniques emphasize novel hypothesis generation, document representation, and multi-relational discovery, respectively. In text association mining with cross-sentence inference, statistical co-occurrences of terms and syntactic analysis of sentence structure are first used to find associations among key terms in documents. Potential novel hypotheses are then derived from the discovered associations. The structure-based document model, in contrast, introduces two novel representations for text documents that take into account not only term frequencies and patterns of term occurrence, but also the document's structural information. In our experiments, the structure-based document models outperform existing non-structure-based ones. Finally, multi-relational text mining enhances a literature-based discovery method with multi-relational data mining and Inductive Logic Programming. It aims to discover relational knowledge, in the form of frequent relational patterns and relational association rules, from disjoint sets of literature. These relational patterns and rules complement the indirect connections found by existing literature-based discovery methods and can be used for exploratory research.