Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data

Publisher: Springer Publishing Company, Incorporated

ISBN: 978-1-84882-259-7

Published: 07 July 2010

Pages: 397

Abstract

Each passing year bears witness to the development of ever more powerful computers, increasingly fast and cheap storage media, and even higher bandwidth data connections. This makes it easy to believe that we can now, at least in principle, solve any problem we are faced with so long as we only have enough data. Yet this is not the case. Although large databases allow us to retrieve many different single pieces of information and to compute simple aggregations, general patterns and regularities often go undetected. Furthermore, it is exactly these patterns, regularities and trends that are often most valuable. To avoid the danger of "drowning in information, but starving for knowledge," the branch of research known as data analysis has emerged, and a considerable number of methods and software tools have been developed. However, it is not these tools alone but the intelligent application of human intuition in combination with computational power, of sound background knowledge with computer-aided modeling, and of critical reflection with convenient automatic model construction, that results in successful intelligent data analysis projects. Guide to Intelligent Data Analysis provides a hands-on instructional approach to many basic data analysis techniques, and explains how these are used to solve data analysis problems.

Topics and features:
- guides the reader through the process of data analysis, following the interdependent steps of project understanding, data understanding, data preparation, modeling, and deployment and monitoring;
- equips the reader with the necessary information in order to obtain hands-on experience of the topics under discussion;
- provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms;
- includes numerous examples using R and KNIME, together with appendices introducing the open source software;
- integrates illustrations and case-study-style examples to support pedagogical exposition.

This practical and systematic textbook/reference for graduate and advanced undergraduate students is also essential reading for all professionals who face data analysis problems. Moreover, it is a book to be used following one's exploration of it.

Dr. Michael R. Berthold is Nycomed-Professor of Bioinformatics and Information Mining at the University of Konstanz, Germany. Dr. Christian Borgelt is Principal Researcher at the Intelligent Data Analysis and Graphical Models Research Unit of the European Centre for Soft Computing, Spain. Dr. Frank Höppner is Professor of Information Systems at Ostfalia University of Applied Sciences, Germany. Dr. Frank Klawonn is a Professor in the Department of Computer Science and Head of the Data Analysis and Pattern Recognition Laboratory at Ostfalia University of Applied Sciences, Germany. He is also Head of the Bioinformatics and Statistics group at the Helmholtz Centre for Infection Research, Braunschweig, Germany.

Contributors

Michael R. Berthold
University of Konstanz
- Publication Years: 1994 - 2023
- Publication counts: 82
- Citation count: 1,120
- Available for Download: 11
- Downloads (cumulative): 6,398
- Downloads (12 months): 278
- Downloads (6 weeks): 30
- Average Downloads per Article: 582
- Average Citation per Article: 14

Christian Borgelt
University of Salzburg
- Publication Years: 1997 - 2021
- Publication counts: 61
- Citation count: 509
- Available for Download: 8
- Downloads (cumulative): 5,305
- Downloads (12 months): 226
- Downloads (6 weeks): 44
- Average Downloads per Article: 663
- Average Citation per Article: 8

Frank Höppner
- Publication Years: 2010 - 2010
- Publication counts: 1
- Citation count: 13
- Available for Download: 0
- Downloads (cumulative): 0
- Downloads (12 months): 0
- Downloads (6 weeks): 0
- Average Downloads per Article: 0
- Average Citation per Article: 13

Frank Klawonn
Helmholtz Centre for Infection Research (HZI)
- Publication Years: 1991 - 2022
- Publication counts: 80
- Citation count: 327
- Available for Download: 3
- Downloads (cumulative): 543
- Downloads (12 months): 36
- Downloads (6 weeks): 4
- Average Downloads per Article: 181
- Average Citation per Article: 4

Index Terms

Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data
1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory
  2. Probability and statistics

Reviews

Reviewer: Corrado Mencar

The clear and complete exposition of arguments, the attention to formalization, and the balanced number of bibliographic references make this book a bright introduction to intelligent data analysis. It is an excellent choice for graduate or advanced undergraduate courses, as well as for researchers and professionals who want to get acquainted with this field of study.

Intelligent data analysis is the complex process of acquiring useful knowledge from massive amounts of real data (data collected from real-world processes). Such data is possibly incomplete, distributed among several sources, and polluted by noise. Intelligent data analysis is similar to knowledge discovery in data (KDD), but it places more emphasis on the role of the analyst, who intelligently applies available tools to analyze data and design models.

After an introduction to general data analysis concepts, the authors reserve the next chapter for playfully but effectively comparing two approaches to data analysis. In the first situation, they apply a number of tools, almost mechanically. In the second situation, they apply an intelligent approach. The chapter clearly shows the risks of a naive approach to data analysis: extracting no useful knowledge from data, or, even more dangerously, extracting false knowledge.

The structure of the book takes the user through each of the stages required for intelligent data analysis. The authors adopt the cross industry standard process for data mining (CRISP-DM) model as a guideline for the description of the various steps. Two chapters examine project and data understanding, the two key stages of CRISP-DM necessary for making the most critical choices in the subsequent stages (or for deciding to abandon the project). The chapter on data understanding, in particular, shows a number of techniques for assessing the quality of available data, including data visualization, descriptive statistics, outlier detection, and missing value analysis.

Chapter 5 does not describe any stage of the CRISP-DM process. Instead, it is devoted to the basic principles for a correct model design. The chapter covers general topics such as model fitting strategies and criteria, analysis of the possible sources of errors, and model validation. This chapter prepares the reader for the next part of the book, which presents and discusses several models.

Starting with chapter 6, the next part of the book describes the basic techniques for data preparation. The subsequent three chapters focus on the three main objectives of data analysis: finding patterns, finding explanations, and finding predictors. Patterns are regularities hidden in data; one can use exploratory techniques to extract them. The book illustrates cluster and deviation analysis, self-organizing maps, and association rules. Explanatory techniques described in the book include rule-based models, decision trees, regression models, and Bayes classifiers. Finally, the book outlines the most basic predictive models, including the k-nearest neighbor algorithm (k-NN), neural networks (a brief summary), and support vector machines (SVMs). It also briefly describes ensemble methods. The final chapter briefly covers the last two stages of the CRISP-DM process: evaluation and deployment.

Since this is an introductory book, it does not cover advanced topics such as multi-relational data mining, fuzzy models, and structured datasets. Each chapter, however, includes a well-balanced number of references that are useful for investigating advanced topics. The chapters that contain technical content end with a section that illustrates how to apply the described techniques in R (an open-source statistical tool) and the Konstanz Information Miner (KNIME), free software for setting up and running knowledge discovery workflows. These two sections, although quite short, are useful for understanding how to concretely apply the described techniques.

The book ends with three appendices. The first appendix is a well-written summary of statistics, which is useful for recalling basic notions and properties from descriptive and inferential statistics and from probability theory. The second appendix is an introduction to the R language. Though short, it is sufficient for following the examples in the chapters. Similarly, the last appendix briefly describes KNIME.

Overall, the authors hit their target of producing a textbook that aids in understanding the basic processes, methods, and issues for intelligent data analysis. The level of detail is not homogeneous throughout the book (some sections provide only a big picture of the described topics while others offer more detail), and there are a few typographical errors, but the rigorous and impartial exposition, the use of a uniform notation, the consistent use of the same dataset (Iris) to show the examples, and the adequate bibliography make this book a good selection for the target audience.

Online Computing Reviews Service

Recommendations

Intelligent Data Analysis: An Introduction

Intelligent Data Analysis in Medicine and Pharmacology

Intelligent Multidimensional Data Clustering and Analysis
