Predictive data mining: a practical guide
January 1998
Publisher:
  • Morgan Kaufmann Publishers Inc.
  • 340 Pine Street, Sixth Floor
  • San Francisco
  • CA
  • United States
ISBN: 978-1-55860-403-2
Published: 01 January 1998
Pages: 227
Abstract

No abstract available.

Cited By

  1. Khan M, Ali A, Alharbi Y and Farias G (2022). Predicting and Preventing Crime, Complexity, 2022, Online publication date: 1-Jan-2022.
  2. Alves L, Vasconcellos F and Nogueira B (2022). SeSG: a search string generator for Secondary Studies with hybrid search strategies using text mining, Empirical Software Engineering, 27:5, Online publication date: 1-Sep-2022.
  3. Palacios-González F and García-Fernández R (2020). A faster algorithm to estimate multiresolution densities, Computational Statistics, 35:3, (1207-1230), Online publication date: 1-Sep-2020.
  4. Alexopoulos C, Lachana Z, Androutsopoulou A, Diamantopoulou V, Charalabidis Y and Loutsaris M. How Machine Learning is Changing e-Government, Proceedings of the 12th International Conference on Theory and Practice of Electronic Governance, (354-363)
  5. Attaran M and Attaran S (2018). The Rise of Embedded Analytics, International Journal of Business Intelligence Research, 9:1, (16-37), Online publication date: 1-Jan-2018.
  6. Attaran M and Attaran S (2018). Opportunities and Challenges of Implementing Predictive Analytics for Competitive Advantage, International Journal of Business Intelligence Research, 9:2, (1-26), Online publication date: 1-Jul-2018.
  7. Grillini A, Ombelet D, Soans R and Cornelissen F. Towards using the spatio-temporal properties of eye movements to classify visual field defects, Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications, (1-5)
  8. Shah C, Hendahewa C and González-Ibáñez R (2016). Rain or shine? Forecasting search process performance in exploratory search tasks, Journal of the Association for Information Science and Technology, 67:7, (1607-1623), Online publication date: 1-Jul-2016.
  9. Ala'raj M and Abbod M (2016). A new hybrid ensemble credit scoring model based on classifiers consensus system approach, Expert Systems with Applications: An International Journal, 64:C, (36-55), Online publication date: 1-Dec-2016.
  10. de Carvalho D, Rocha R, Fernandes V and Neves S. Business Intelligence, Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering, (89-92)
  11. Moreno V, Génova G, Alejandres M and Fraga A. Automatic classification of web images as UML diagrams, Proceedings of the 4th Spanish Conference on Information Retrieval, (1-8)
  12. Tüfekci P (2016). Classification-based prediction models for stock price index movement, Intelligent Data Analysis, 20:2, (357-376), Online publication date: 1-Jan-2016.
  13. Pehrsson L, Frantzén M, Aslam T and Ng A. Aggregated line modeling for simulation and optimization of manufacturing systems, Proceedings of the 2015 Winter Simulation Conference, (3632-3643)
  14. G.C. P, Sun C, K. K, Zhang H, Yang F, Rampalli N, Prasad S, Arcaute E, Krishnan G, Deep R, Raghavendra V and Doan A. Why Big Data Industrial Systems Need Rules and What We Can Do About It, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, (265-276)
  15. Fan W and Bifet A (2013). Mining big data, ACM SIGKDD Explorations Newsletter, 14:2, (1-5), Online publication date: 30-Apr-2013.
  16. Wang T, Guan S and Liu F. Entropic feature discrimination ability for pattern classification based on neural IAL, Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part II, (30-37)
  17. Niu J, Qiu M, Wang X, Li J, Wu G and Chen T (2012). Cost Minimization with HPDFG and Data Mining for Heterogeneous DSP, Journal of Signal Processing Systems, 67:3, (213-228), Online publication date: 1-Jun-2012.
  18. Kosina P and Gama J. Very Fast Decision Rules for multi-class problems, Proceedings of the 27th Annual ACM Symposium on Applied Computing, (795-800)
  19. Srivastava R, Roy S, Yan S and Sim T. Multi-actor emotion recognition in movies using a bimodal approach, Proceedings of the 17th international conference on Advances in multimedia modeling - Volume Part II, (465-475)
  20. Podgorelec V. Expert-assisted classification rules extraction algorithm, Proceedings of the 14th east European conference on Advances in databases and information systems, (450-462)
  21. Ren Z. Study on building data mining application, Proceedings of the 5th International Conference on Wireless communications, networking and mobile computing, (5059-5062)
  22. Zhao H, Sinha A and Ge W (2009). Effects of feature construction on classification performance, Expert Systems with Applications: An International Journal, 36:2, (2633-2644), Online publication date: 1-Mar-2009.
  23. Abdelzaher T, Khan M, Le H, Ahmadi H and Han J. Data mining for diagnostic debugging in sensor networks, Proceedings of the Second international conference on Knowledge Discovery from Sensor Data, (1-24)
  24. Cardoso G and Gomide F (2007). Newspaper demand prediction and replacement model based on fuzzy clustering and rules, Information Sciences: an International Journal, 177:21, (4799-4809), Online publication date: 1-Nov-2007.
  25. Di Gesù V. Data analysis and bioinformatics, Proceedings of the 2nd international conference on Pattern recognition and machine intelligence, (373-388)
  26. Gámez J, Mateo J and Puerta J. Improving revisitation browsers capability by using a dynamic bookmarks personal toolbar, Proceedings of the 8th international conference on Web information systems engineering, (643-652)
  27. Zhang K, Orgun M and Zhang K. A prediction-based visual approach for cluster exploration and cluster validation by HOV3, Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, (336-349)
  28. Evangelopoulos N. Text mining for customer satisfaction monitoring, Proceedings of the 5th WSEAS international conference on Simulation, modelling and optimization, (202-207)
  29. Libralao G, Almeida O and Carvalho A. Classification of ophthalmologic images using an ensemble of classifiers, Proceedings of the 18th international conference on Innovations in Applied Artificial Intelligence, (380-389)
  30. Bonacina S, Masseroli M and Pinciroli F. Foreseeing promising bio-medical findings for effective applications of data mining, Proceedings of the 6th International conference on Biological and Medical Data Analysis, (130-136)
  31. Podgorelec V, Kokol P, Stiglic M, Heričko M and Rozman I (2005). Knowledge discovery with classification rules in a cardiovascular dataset, Computer Methods and Programs in Biomedicine, 80, (S39-S49), Online publication date: 1-Dec-2005.
  32. Mamčenko J and Kulvietiene R. Data mining technique for collaborative server log file analysis, Proceedings of the 9th WSEAS International Conference on Communications, (1-5)
  33. Crone S, Lessmann S and Stahlbock R. Utility based data mining for time series analysis, Proceedings of the 1st international workshop on Utility-based data mining, (59-68)
  34. Filimon S (2004). Multilevel security, XRDS: Crossroads, The ACM Magazine for Students, 10:3, (4-4), Online publication date: 1-Apr-2004.
  35. Sinha A and May J (2004). Evaluating and Tuning Predictive Data Mining Models Using Receiver Operating Characteristic Curves, Journal of Management Information Systems, 21:3, (249-280), Online publication date: 1-Nov-2004.
  36. Uysal İ and Güvenir H (2004). Instance-Based Regression by Partitioning Feature Projections, Applied Intelligence, 21:1, (57-79), Online publication date: 1-Jul-2004.
  37. Scaringella A. On the size of a classification tree, Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition, (65-72)
  38. Terano T and Inada M. Data mining from clinical data using interactive evolutionary computation, Advances in evolutionary computing, (847-861)
  39. Lee H, Park W and Park D. An efficient algorithm for mining quantitative association rules to raise reliance of data in large databases, Design and application of hybrid intelligent systems, (672-681)
  40. Freitas A. A survey of evolutionary algorithms for data mining and knowledge discovery, Advances in evolutionary computing, (819-845)
  41. Povinelli R and Feng X (2003). A New Temporal Pattern Identification Method for Characterization and Prediction of Complex Time Series Events, IEEE Transactions on Knowledge and Data Engineering, 15:2, (339-352), Online publication date: 1-Feb-2003.
  42. Reinartz T. Stages of the discovery process, Handbook of data mining and knowledge discovery, (185-192)
  43. Bacon L. Marketing, Handbook of data mining and knowledge discovery, (715-725)
  44. Little B, Johnston W, Lovell A, Rejesus R and Steed S. Collusion in the U.S. crop insurance program, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, (594-598)
  45. Liu H and Motoda H (2002). On Issues of Instance Selection, Data Mining and Knowledge Discovery, 6:2, (115-130), Online publication date: 1-Apr-2002.
  46. Džeroski S. Data mining in a nutshell, Relational Data Mining, (3-27)
  47. Dounias G, Tselentis G and Moustakis V (2001). Machine learning based feature extraction for quality control in a production line, Integrated Computer-Aided Engineering, 8:4, (325-336), Online publication date: 1-Dec-2001.
  48. Adderley R and Musgrove P. Data mining case study, Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, (215-220)
  49. Allard S (2001). Introduction to the database issue, XRDS: Crossroads, The ACM Magazine for Students, 7:3, (2), Online publication date: 15-Mar-2001.
  50. Kim J, Lee B, Shaw M, Chang H and Nelson M (2001). Application of Decision-Tree Induction Techniques to Personalized Advertisements on Internet Storefronts, International Journal of Electronic Commerce, 5:3, (45-62), Online publication date: 1-Mar-2001.
  51. Toole J. A hybrid approach to the identification and expansion of abbreviations, Content-Based Multimedia Information Access - Volume 1, (725-736)
  52. Toole J. Categorizing unknown words, Proceedings of the sixth conference on Applied natural language processing, (173-179)
  53. Romaniuk S (2000). Using Intelligent Agents to Identify Missing and Exploited Children, IEEE Intelligent Systems, 15:2, (27-30), Online publication date: 1-Mar-2000.
  54. Spangler W, May J and Vargas L (1999). Choosing data-mining methods for multiple classification, Journal of Management Information Systems, 16:1, (37-62), Online publication date: 1-Jun-1999.
  55. Shewhart M and Wasson M. Monitoring a newsfeed for hot topics, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, (402-404)
  56. Bonchi F, Giannotti F, Mainetto G and Pedreschi D. A classification-based methodology for planning audit strategies in fraud detection, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, (175-184)
  57. Weiss S, Apte C, Damerau F, Johnson D, Oles F, Goetz T and Hampp T (1999). Maximizing Text-Mining Performance, IEEE Intelligent Systems, 14:4, (63-69), Online publication date: 1-Jul-1999.
  58. Kontkanen P, Myllymäki P, Silander T and Tirri H. BAYDA, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, (254-258)
  59. Wang H, Düntsch I and Bell D. Data reduction based on hyper relations, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, (349-353)
  60. Indurkhya N and Weiss S (1998). Estimating Performance Gains for Voted Decision Trees, Intelligent Data Analysis, 2:4, (303-310), Online publication date: 1-Jul-1998.
Contributors
  • UNSW Sydney

Recommendations

Svetlana Segarceanu

Data mining is roughly defined as the “search for valuable information in large volumes of data.” This book attempts to systematize recent developments in the analysis and management of such data. The authors describe the stages of, and approaches to, a data mining process and use several real-life case studies to show how different techniques can be combined. By tracing the development of data mining applications, the book serves as a technical guide to the large-scale analysis of real-life data warehouses. Its structure follows the main steps of a data mining process: data preparation; data reduction; data modeling and prediction; and case and solution analysis.
The book begins by defining data mining and establishing a framework for the subsequent discussion. The authors identify the underlying principles of data mining and related concepts, including the storage of massive quantities of data in electronic form (big data), centralized resources for these data (data warehouses), and timeliness (efficient storage and querying of time-dependent information). They also discuss the main problems in this emerging field, which fall into two general types: prediction (classification, regression, and time series) and knowledge discovery (deviation detection, clustering, and association rules). Throughout the chapter, the data are modeled as a spreadsheet with two primary dimensions: cases and features.
Chapter 2 examines classical statistics and prediction and applies them to the evaluation of big data. Because good predictive performance is a central goal, much of the chapter is devoted to error estimation.
Chapter 3 covers the data preparation phase and describes a standard spreadsheet form for organizing data. It examines several forms of raw data and considers transformations that may improve results, such as normalization and several data-smoothing techniques. Topics include missing data, data with strong time dependencies, and free-text data.
Chapter 4 reviews techniques for reducing data dimensions. It mainly addresses optimal feature selection methods for reducing the number of features, clustering techniques for reducing the number of values, and techniques for reducing the number of cases. The methods examined include the Karhunen-Loève expansion, decision trees, k-means clustering, nearest neighbor, and class entropy. The authors suggest decision trees as an alternative to the more commonly used feature selection methods.
Chapter 5 summarizes classification and prediction methods, grouped into three families: mathematical (linear solutions, neural nets, and multivariate adaptive regression splines), distance-based (nearest neighbor), and logic-based (decision trees and decision rules). The authors analyze several aspects of these methods, including solution complexity, data preparation and training, and the effects of data dimensions, and discuss their advantages and drawbacks.
Chapter 6 compares the data reduction techniques from Chapter 4 and the prediction methods from Chapter 5 on several spreadsheets, so that readers can evaluate them side by side. The datasets come from medical, telecommunications, media, service, control, and sales applications.
Chapter 7 sketches several data mining problems and outlines their solutions, which combine art and science. The examples focus on real-life applications: text mining, process control, and outcome analysis. The chapter also describes an organizational model that unifies the tasks of the previous chapters and presents protocols for preparing data and organizing the mining effort.
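The review describes a workflow that starts from a spreadsheet of cases and features, applies a transformation such as normalization, fits a predictor such as nearest neighbor, and then estimates the predictor's error. The Python sketch below strings those three steps together as an illustration. It is a hypothetical example, not the authors' software. The function names (`fit_minmax`, `apply_minmax`, `knn_predict`, `cross_val_error`), the choice of min-max scaling, the use of n-fold cross-validation as the error estimator, the round-robin fold assignment, and the toy data are all assumptions.

```python
import math
from collections import Counter


def fit_minmax(rows):
    """Learn per-feature minimum and maximum values from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_minmax(rows, mins, maxs):
    """Rescale each feature to [0, 1] using previously learned bounds.

    A constant feature (max == min) is mapped to 0.0 to avoid dividing by zero.
    Test values outside the training range simply fall outside [0, 1].
    """
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training cases (Euclidean)."""
    by_distance = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in by_distance[:k])
    # most_common orders equal counts by first appearance, so a tied vote
    # goes to the label of the nearer neighbor.
    return votes.most_common(1)[0][0]


def cross_val_error(X, y, n_folds=4, k=3):
    """Estimate the error rate of normalize-then-k-NN by n-fold cross-validation.

    Case i goes to fold i % n_folds (a hypothetical round-robin split), so each
    case is tested exactly once by a model that never saw it during training,
    including the normalization step.
    """
    errors = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        mins, maxs = fit_minmax([X[i] for i in train_idx])
        train_X = apply_minmax([X[i] for i in train_idx], mins, maxs)
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            (x,) = apply_minmax([X[i]], mins, maxs)
            if knn_predict(train_X, train_y, x, k) != y[i]:
                errors += 1
    return errors / len(X)


# Toy spreadsheet (hypothetical data): rows are cases, columns are features.
X = [[1.0, 100.0], [1.2, 110.0], [0.9, 95.0], [1.1, 105.0],
     [5.0, 500.0], [5.2, 510.0], [4.9, 495.0], [5.1, 505.0]]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(cross_val_error(X, y, n_folds=4, k=3))  # 0.0 for these well-separated classes
```

The normalization bounds are learned inside each training fold rather than on the whole table, so the test cases play no part in building the model; that separation is what makes the estimate an honest one. Without the scaling step, the second feature, whose values are roughly a hundred times larger, would dominate the Euclidean distance.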
Each chapter is supplemented with bibliographic and historical notes, mostly related to databases, statistics, and machine learning, the fields from which data mining emerged. The bibliography covers recent work. The book is richly illustrated, reflecting the authors' emphasis on visualization as an aid to understanding its topics. Designers of data warehouses, or of any application involving massive quantities of data, will find the book helpful. No mathematical or statistical background is required; college-level mathematics suffices. Readers are also invited to try the authors' software at http://www.data-miner.com.
