Text association mining with cross-sentence inference, structure-based document model and multi-relational text mining

January 2009

Author: Supphachai Thaicharoen, University of Colorado at Denver

Advisers: Tom Altman, University of Colorado at Denver; Krzysztof J. Cios, University of Colorado at Denver

Publisher: University of Colorado at Denver, Denver, CO, United States

ISBN: 978-1-109-18848-6

Order Number: AAI3360857

Pages: 131


Abstract

With the exponential growth of published documents, text mining has become a vital tool for the automated extraction of information and the discovery of hidden knowledge. We begin this dissertation with an overview of text mining covering key definitions, pre-processing, feature selection, text representation, and types of text mining. We then describe a fundamental text mining approach that we used to develop a chromosome-21 database. Next, we present three novel text mining techniques: (i) text association mining with cross-sentence inference, (ii) a structure-based document model, and (iii) multi-relational text mining. These techniques emphasize novel hypothesis generation, document representation, and multi-relational discovery, respectively. In text association mining with cross-sentence inference, statistical co-occurrences of terms and syntactic sentence structure analysis are first used to find associations among key terms in documents; potential novel hypotheses are then derived from the discovered associations. The structure-based document model, by contrast, introduces two novel representations for text documents that take into account not only term frequencies and patterns of term occurrence but also the document's structural information. In our experiments, the structure-based document models outperformed existing non-structure-based ones. Finally, multi-relational text mining enhances a literature-based discovery method with multi-relational data mining and Inductive Logic Programming. It aims to discover relational knowledge, in the form of frequent relational patterns and relational association rules, from disjoint sets of literature. These relational patterns and rules complement the indirect connections found by existing literature-based discovery methods and can be used for exploratory research.
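
As a rough illustration of the first technique, the sketch below counts sentence-level co-occurrences of key terms and then proposes term pairs that are linked only through a shared intermediate term. It is a minimal sketch under stated assumptions, not the dissertation's algorithm: the function names, the `min_support` threshold, whitespace tokenization, and the toy sentences are all hypothetical, and the syntactic sentence structure analysis mentioned in the abstract is omitted.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, key_terms):
    """Count, for each unordered pair of key terms, the sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(sentence.lower().split())  # naive whitespace tokenization (assumption)
        present = sorted(t for t in key_terms if t in tokens)
        for a, b in combinations(present, 2):   # pairs are stored in sorted order
            counts[(a, b)] += 1
    return counts


def associations(counts, min_support=1):
    """Keep pairs that co-occur in at least min_support sentences (hypothetical threshold)."""
    return {pair for pair, n in counts.items() if n >= min_support}


def candidate_hypotheses(assoc):
    """Propose (a, c) when a-b and b-c are associated but a-c never co-occurs."""
    neighbours = {}
    for a, b in assoc:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    candidates = set()
    for linked in neighbours.values():
        for a, c in combinations(sorted(linked), 2):
            if (a, c) not in assoc:
                candidates.add((a, c))
    return candidates


# Toy input, invented purely for illustration.
sentences = [
    "stress depletes magnesium",
    "stress can trigger migraine",
]
key_terms = {"magnesium", "migraine", "stress"}

assoc = associations(cooccurrence_counts(sentences, key_terms))
hypotheses = candidate_hypotheses(assoc)
print(hypotheses)
```

On this toy input, "magnesium" and "migraine" never share a sentence, yet both co-occur with "stress", so the pair is proposed as a candidate for further investigation. A realistic pipeline would replace the raw count threshold with a statistical association measure and filter candidates using sentence structure.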

