Data Mining and Analysis: Fundamental Concepts and Algorithms

Publisher:

Cambridge University Press
40 W. 20 St. New York, NY
United States

ISBN: 978-0-521-76633-3

Published: 30 June 2014

Pages: 624

Abstract

The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.

Key features:
- Covers both core methods and cutting-edge research
- Algorithmic approach with open-source implementations
- Minimal prerequisites: all key mathematical concepts are presented, as is the intuition behind the formulas
- Short, self-contained chapters with class-tested examples and exercises allow for flexibility in designing a course and for easy reference
- Supplementary website with lecture slides, videos, project ideas, and more

Contributors

Mohammed J Zaki
- Publication Years: 2014 - 2014
- Publication counts: 1
- Citation count: 88
- Available for Download: 0
- Downloads (cumulative): 0
- Downloads (12 months): 0
- Downloads (6 weeks): 0
- Average Downloads per Article: 0
- Average Citation per Article: 88

Wagner Meira Jr
Federal University of Minas Gerais
- Publication Years: 2014 - 2020
- Publication counts: 3
- Citation count: 89
- Available for Download: 2
- Downloads (cumulative): 326
- Downloads (12 months): 32
- Downloads (6 weeks): 2
- Average Downloads per Article: 163
- Average Citation per Article: 30

Index Terms

Data Mining and Analysis: Fundamental Concepts and Algorithms
1. General and reference
  1. Document types
    1. Reference works
2. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining

Reviews

Reviewer: Van Van Dyke Parunak

This volume is a well-organized presentation of several major themes in current data mining research and practice. Readers should not be misled by its preface, which justifies a new data mining book by observing that many existing texts "are either too high-level or too advanced." Zaki and Meira declare their intention to offer "an introductory text" that provides not only mathematical foundations, but also "the intuition behind the formulas." In fact, the volume assumes a fair level of mathematical maturity on the part of the reader and provides relatively little intuitive justification for the details that it presents. Experienced practitioners will find it a useful reference, but in an introductory data mining class, it will need supplementation by clarifying lectures or other readings.

An introductory chapter frames the discussion by presenting data as a matrix of entities and properties that can be viewed algebraically, geometrically, or probabilistically. This structure (supported by the ubiquitous Iris dataset) is well suited to the approaches that the authors discuss, but does not accommodate some other important areas of data mining. In particular, mining of unstructured text is an area of growing importance, covered by some other books on data mining, but this volume does not discuss it.

Part 1, "Data Analysis Foundations," describes various kinds of data, namely numeric and categorical attributes and graph-structured data. It introduces the idea of a kernel, which features in several of the methods presented. The discussion on high-dimensional data is an excellent mathematical summary of the counterintuitive behavior of points in high dimensions, and is followed by a chapter on formal mechanisms for dimensionality reduction. Part 1 is an important foundation for the specific data mining methods in later sections. For example, the kernel methods introduced in chapter 5 are invoked repeatedly in each of the following sections (though, strangely, the index misses the reference to the "kernel trick" introduced in this chapter and instead directs the reader to chapter 13, which does indeed cite the "trick," but without reminding the reader where to find it).

The book's back cover, preface, and first chapter all misleadingly summarize Part 1 as "exploratory data analysis." Exploratory data analysis, in the sense in which John Tukey popularized the term, refers to a nonformal, intuition-based search for hypotheses, contrasted with formal methods for testing those hypotheses. The importance of starting data mining with an informal engagement with the data cannot be overemphasized, but Part 1 does not provide any guidance for this engagement.

Parts 2, 3, and 4 discuss three specific approaches to data mining. Each part concludes with a chapter on validating or assessing the results extracted by the methods discussed in the part, an accessible organization that will make the book a frequent reference for the practitioner. Part 2 describes how to mine three kinds of frequent patterns: itemsets (described by association rules), sequences, and graph patterns. Here and throughout the book, the emphasis on graph-structured data is a valuable extension beyond what some other books on data mining offer.

Part 3 provides details on four approaches to data clustering: representative-based (such as K-means), hierarchical agglomerative, density-based, and graph-based methods centered on the graph spectrum. This latter category is an important set of techniques that are not sufficiently discussed in many other references, but this volume does not tell the reader what the graph spectrum is or offer an intuitive explanation for how it is valuable for clustering. A brief discussion of the relation between the graph spectrum and the structure of a graph would greatly encourage readers to engage the mathematical details that the authors provide.

Part 4 discusses classification methods, including probabilistic classification, decision trees, linear discriminant analysis, and support vector machines. This volume is a detailed, well-organized reference on three major approaches to data mining, and practitioners will keep it close at hand. Its popularity will be enhanced by the fact that the authors have made a PDF copy of the entire book available for private use on the book website, www.dataminingbook.info.

Reviewer: Dimitrios Katsaros

In their Harvard Business Review article at the end of 2012 [1], Davenport and Patil characterize data scientist as the sexiest job of this century; they argue that among the qualities of a data scientist is expertise in computer science and statistics. I would extend their argument and say that knowledge of data mining tasks for big data is ultimately the principal quality of any data scientist. This book is about educating and training the next generation of data mining people, those who will build new enterprises and move our knowledge one big step ahead.

The book is divided into four parts. Part 1's chapters describe basic notions and background knowledge useful for building the advanced knowledge found in subsequent sections. In particular, these chapters present the concepts of numerical, categorical, graph, and high-dimensional data, along with useful statistical tools such as kernel methods and dimensionality reduction procedures, for example, singular value decomposition (SVD) and principal component analysis (PCA). The second part deals with the issue of mining frequent patterns: patterns that emerge in set-based data, in sequence-based (sets with ordering) data, and in graph data. The third part investigates the topic of clustering, explaining the basic algorithms for representative, hierarchical, density-based, spectral, and graph clustering. Finally, the last part of the book describes methods for classification, namely Bayes and decision tree classifiers, support vector machines (SVM), and linear discriminant analysis.

Despite the fact that there are several good books in the literature on data mining, this new one is really special. It manages to include all of the latest developments in the data mining area, along with those past ideas that have survived the test of time. Rather than overloading the reader with many variants of classic algorithms just to cover all methods in breadth, the book presents only those algorithms that have been the roots of large families of algorithmic ideas. In this way, the book offers the reader a deep comprehension of original thinking.

Overall, this is an excellent textbook for both undergraduate and postgraduate students, but it is also appropriate for scientists and engineers looking for solutions to their big data analysis problems. The solved and unsolved exercises in the book are carefully selected to enhance the reader's understanding and to challenge him or her to further investigate the specific topic the exercise is about. I expect that the quality of the book will cause demanding data scientists to save a place for it in their hearts.

Recommendations

Data Mining: Foundations and Intelligent Paradigms VOLUME 2 Statistical, Bayesian, Time Series and other Theoretical Aspects

Data Mining: The Textbook
