Predictive data mining: a practical guide | Guide books

Predictive data mining: a practical guide
January 1998

Authors:
Sholom M. Weiss, Rutgers Univ., New Brunswick, NJ
Nitin Indurkhya, Univ. of Sydney, Sydney, Australia

Publisher:
Morgan Kaufmann Publishers Inc., 340 Pine Street, Sixth Floor, San Francisco, CA, United States

ISBN: 978-1-55860-403-2

Published: 01 January 1998

Pages: 227

Contributors

Sholom Menachem Weiss
IBM Research
- Publication Years: 1974 - 2016
- Publication counts: 71
- Citation count: 1,292
- Available for Download: 9
- Downloads (cumulative): 7,728
- Downloads (12 months): 315
- Downloads (6 weeks): 43
- Average Downloads per Article: 859
- Average Citation per Article: 18

Nitin Indurkhya
UNSW Sydney
- Publication Years: 1991 - 2016
- Publication counts: 29
- Citation count: 244
- Available for Download: 4
- Downloads (cumulative): 1,458
- Downloads (12 months): 13
- Downloads (6 weeks): 1
- Average Downloads per Article: 365
- Average Citation per Article: 8

Recommendations

Statistics based predictive geo-spatial data mining: forest fire hazardous area mapping application
APWeb'03: Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications

In this paper, we propose two statistics based predictive geo-spatial data mining methods and apply them to predict the forest fire hazardous area. The proposed prediction models used in geo-spatial data mining are likelihood ratio and conditional ...

Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner

An improved data mining approach using predictive itemsets

In this paper, we present a mining algorithm to improve the efficiency of finding large itemsets. Based on the concept of prediction proposed in the (n,p) algorithm, our method considers the data dependency in the given transactions to predict promising ...

Reviewer: Svetlana Segarceanu

Data mining is roughly defined as the “search for valuable information in large volumes of data.” This book represents an effort to systematize recent developments in the analysis and management of such data. The authors present the aspects of and approaches to a data mining process, and show how to integrate several techniques, by describing some real-life case studies. The book traces the development of data mining applications, making it a technical guide to performing large-scale analysis of real-life data warehouses. The structure of the work takes into account the main steps to be accomplished in a data mining process: data preparation; data reduction; data modeling and prediction; and case and solution analysis.

The book begins with an attempt to define the concept of data mining and establish the framework for the subsequent discussion. The authors identify the underlying principles of data mining and related concepts, including the storage of massive quantities of data in electronic form (big data); centralized resources for these data (data warehouses); and timeliness (efficient storage and query of time-dependent information). They also discuss the main problems associated with this emerging field, which fall into two general types: prediction (classification, regression, and time series) and knowledge discovery (deviation detection, clustering, and association rules). The spreadsheet model, with two primary dimensions (cases and features), is used throughout the chapter to model the data.

Chapter 2 analyzes classical statistics and prediction and applies them to the evaluation of big data. Because good predictive performance is an important goal, much of the chapter is devoted to error estimation.

Chapter 3 concerns the data preparation phase and describes a standard spreadsheet form for data organization.
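
As a rough illustration of the two ideas summarized so far, the sketch below stores data in spreadsheet form (one row per case, one column per feature) and estimates a predictor's error rate on held-out cases. It is not taken from the book or from the authors' software: the function names `holdout_error` and `fit_majority`, the 30% test fraction, and the toy data are all assumptions chosen for illustration, and a simple train/test split is only one of several possible error estimation schemes.

```python
import random


def holdout_error(cases, labels, fit, test_fraction=0.3, seed=0):
    """Estimate the error rate of a predictor on cases held out from training.

    `cases` is in spreadsheet form: a list of rows, each row a list of
    feature values. `fit` takes training rows and labels and returns a
    function that predicts a label for a single row.
    """
    indices = list(range(len(cases)))
    random.Random(seed).shuffle(indices)  # reproducible random split
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    predict = fit([cases[i] for i in train_idx],
                  [labels[i] for i in train_idx])
    errors = sum(predict(cases[i]) != labels[i] for i in test_idx)
    return errors / n_test


def fit_majority(train_cases, train_labels):
    """Hypothetical baseline: always predict the most frequent training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda case: majority


# Toy spreadsheet: 10 cases, 2 features each (values are made up).
cases = [[i, 2 * i] for i in range(10)]
labels = [0, 1] * 5
print(holdout_error(cases, labels, fit_majority))
```

Any function with the same shape as `fit_majority` could be passed in its place, so the same estimate can be used to compare candidate prediction methods on identical held-out cases.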
The chapter examines several forms of raw data and considers transformations that may help improve results, such as normalization and several techniques for data smoothing. Among the topics covered are missing data, data with strong time dependencies, and free-text data.

Chapter 4 reviews techniques for reducing data dimensions. It mainly addresses three kinds of reduction: optimal feature selection methods to reduce the number of features, clustering techniques to reduce the number of values, and methods to reduce the number of cases. Methods such as Karhunen-Loeve expansion, decision trees, k-means clustering, nearest neighbor, and class entropy are examined. The authors suggest the use of decision trees as an alternative to the more frequently used methods of feature selection.

Chapter 5 summarizes classification and applied prediction methods, which are broken down into three groups: mathematical (linear solutions, neural nets, and multivariate adaptive regression splines), distance (nearest neighbor), and logic (decision trees and decision rules). The authors analyze several facets of these methods, including solution complexity, data preparation and training, and the effects of data dimensions, and discuss their advantages and drawbacks.

Chapter 6 compares the data reduction techniques from Chapter 4 and the prediction methods from Chapter 5 in several spreadsheets, so that readers can evaluate them side by side. The datasets are from medical, telecommunications, media, service, control, and sales data applications.

Chapter 7 sketches some data mining problems and outlines their solutions, which are a combination of art and science. The examples focus on real-life data mining applications: text mining, process control, and outcome analysis. The chapter describes an organizational model for unifying the tasks of the previous chapters, and presents the protocols for preparing data and organizing the mining effort.
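
To make two of the transformations named in the Chapter 3 and Chapter 4 summaries more concrete, here is a minimal sketch of normalizing a single feature column and of reducing its values to a few cluster means with one-dimensional k-means. The review does not say which variants the book uses, so the min-max and z-score formulas, the evenly spaced initialization, the fixed iteration count, and the function names are all assumptions made for illustration.

```python
import statistics


def min_max_normalize(values):
    """Rescale a feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Center a feature column on its mean and scale by its standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def kmeans_1d(values, k, iterations=20):
    """Reduce the values of one feature to k cluster means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Assumed initialization: centers at evenly spaced positions in sorted order.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster received no values.
        centers = [statistics.fmean(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return sorted(centers)


column = [1, 2, 3, 10, 11, 12]  # made-up feature values
print(min_max_normalize(column))
print(kmeans_1d(column, k=2))
```

Replacing each original value with its nearest cluster mean would then leave the feature with at most k distinct values, which is the sense of "reducing the number of values" in this sketch.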
Each chapter is supplemented with bibliographic and historical notes, most of them related to databases, statistics, and machine learning, the fields that spawned data mining. The bibliography contains recent works. The book is richly illustrated, embodying the authors' stress on the role of visualization in offering a better understanding of the book's topics. Designers of data warehouses, or of any application involving massive quantities of data, will find the book helpful. A mathematical or statistical background is not required; college-level mathematics would suffice. Readers are also invited to test the authors' software at http://www.data-miner.com.
