Mining and monitoring evolving data

January 2000

Author: Venkatesh Ganti
Supervisor: Raghu Ramakrishnan

Publisher: The University of Wisconsin - Madison

ISBN: 978-0-599-88402-1

Order Number: AAI9981907

Pages: 137

Abstract

Data mining algorithms have been the focus of much recent research. Most previous data mining algorithms either assume that the input data is static or are designed for arbitrary insertions and deletions of data records. In practice, the input data to a data mining process resides in a large data warehouse that is kept up to date through periodic or occasional addition of blocks of data. In this dissertation, we study two important issues: (1) exploiting this systematic data evolution to maintain data mining models efficiently, and (2) monitoring changes in data characteristics.

Considering a dynamic environment that evolves through systematic addition or deletion of blocks of data, we introduce a new dimension called the data span dimension, which allows user-defined selections of a time-varying subset of the database. We then describe efficient model maintenance algorithms for such time-varying subsets.
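The abstract does not spell out the maintenance algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: the selected subset is the most recent `window_size` blocks, and the "model" is a plain table of per-item counts. The class name `MostRecentWindow` and its methods are hypothetical and are not taken from the dissertation.

```python
from collections import Counter, deque


class MostRecentWindow:
    """Item counts over the most recent `window_size` blocks (illustrative only)."""

    def __init__(self, window_size):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.blocks = deque()      # (per-block item counts, block size), oldest first
        self.counts = Counter()    # aggregate item counts over the current window
        self.num_records = 0       # number of records in the current window

    def add_block(self, records):
        """Add one block of records (a list of item collections) and evict the oldest if needed."""
        # Count each item at most once per record.
        block_counts = Counter(item for record in records for item in set(record))
        self.blocks.append((block_counts, len(records)))
        self.counts.update(block_counts)
        self.num_records += len(records)

        # Incremental maintenance: subtract the evicted block instead of rescanning.
        if len(self.blocks) > self.window_size:
            old_counts, old_size = self.blocks.popleft()
            self.counts.subtract(old_counts)
            self.num_records -= old_size
            self.counts = +self.counts  # drop items whose count fell to zero

    def support(self, item):
        """Fraction of records in the current window that contain `item`."""
        if self.num_records == 0:
            return 0.0
        return self.counts[item] / self.num_records
```

The point of the sketch is that each new block costs work proportional to the size of that block and the evicted one, not to the size of the whole window.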

A data mining algorithm builds a model that captures interesting characteristics in the underlying data. Therefore, we develop the FOCUS framework for quantifying the difference, called deviation, between two datasets in terms of the models they induce. Our framework covers a wide variety of models including frequent itemsets, decision tree classifiers, and clusters, and captures standard measures of deviation such as the misclassification rate (in Machine Learning) and the chi-squared metric (in Statistics) as special cases. We also show how statistical techniques can be applied to the deviation measure to assess whether the difference between two models is meaningful (i.e., whether the underlying datasets have statistically significant differences in their characteristics). We then apply the FOCUS framework to monitor changes in data characteristics and to interactively explore datasets for unusual behavior.
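As a rough illustration of measuring deviation through induced models, the sketch below compares two transaction datasets via their frequent itemsets. It assumes, for illustration only, that the per-itemset difference is the absolute difference in support and that differences are summed over the union of the itemsets frequent in either dataset; the abstract does not state which difference or aggregate functions the framework uses. The function names are hypothetical, and the brute-force enumeration is only suitable for tiny examples.

```python
from itertools import combinations


def support(dataset, itemset):
    """Fraction of transactions in `dataset` that contain every item in `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def frequent_itemsets(dataset, min_support, max_size=2):
    """Brute-force enumeration of itemsets (up to `max_size` items) with support >= min_support."""
    items = sorted(set().union(*dataset)) if dataset else []
    result = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if support(dataset, candidate) >= min_support:
                result.add(candidate)
    return result


def deviation(d1, d2, min_support, max_size=2):
    """Sum of absolute support differences over itemsets frequent in either dataset."""
    # Evaluate both datasets on a common set of itemsets so the supports are comparable.
    regions = (frequent_itemsets(d1, min_support, max_size)
               | frequent_itemsets(d2, min_support, max_size))
    return sum(abs(support(d1, r) - support(d2, r)) for r in regions)


if __name__ == "__main__":
    d1 = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("c")]
    d2 = [frozenset("ab"), frozenset("c"), frozenset("c"), frozenset("c")]
    print(deviation(d1, d2, min_support=0.5))
```

A dataset compared with itself yields a deviation of zero under this definition. Deciding whether a nonzero value is statistically significant requires a separate test, which this sketch does not attempt.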

Cited By

Fujiwara Y, Sakurai Y and Kitsuregawa M (2009). Fast likelihood search for hidden Markov models, ACM Transactions on Knowledge Discovery from Data, 3:4, (1-37), Online publication date: 1-Nov-2009.

Contributors

Venkatesh Ganti
Google LLC
- Publication Years: 1996 - 2013
- Publication counts: 39
- Citation count: 2,636
- Available for Download: 23
- Downloads (cumulative): 23,919
- Downloads (12 months): 786
- Downloads (6 weeks): 85
- Average Downloads per Article: 1,040
- Average Citation per Article: 68
Raghu Ramakrishnan
Microsoft Corporation
- Publication Years: 1985 - 2024
- Publication counts: 222
- Citation count: 21,906
- Available for Download: 144
- Downloads (cumulative): 330,337
- Downloads (12 months): 21,660
- Downloads (6 weeks): 3,459
- Average Downloads per Article: 2,294
- Average Citation per Article: 99

Recommendations

Mining and monitoring evolving data
Handbook of massive data sets

Data mining algorithms have been the focus of much recent research. The initial spurt of research on data mining algorithms typically considered static datasets. In practice, the input data to a data mining process resides in a large data warehouse ...
Data mining without data: a novel approach to privacy-preserving collaborative distributed data mining
WPES '11: Proceedings of the 10th annual ACM workshop on Privacy in the electronic society

With the proliferation of organizations that independently collect various types of data, with the growing awareness of corporations and public to keep their sensitive data private, and with the ever-increasing need of government and corporate policy ...
Provenance for data mining
TaPP '13: Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance

Data mining aims at extracting useful information from large datasets. Most data mining approaches reduce the input data to produce a smaller output summarizing the mining result. While the purpose of data mining (extracting information) necessitates ...
