Data Mining and Analysis: Fundamental Concepts and Algorithms
June 2014
Publisher:
  • Cambridge University Press
  • 40 W. 20th St., New York, NY
  • United States
ISBN: 978-0-521-76633-3
Published: 30 June 2014
Pages: 624
Abstract

The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike.

Key features:
  • Covers both core methods and cutting-edge research
  • Algorithmic approach with open-source implementations
  • Minimal prerequisites: all key mathematical concepts are presented, as is the intuition behind the formulas
  • Short, self-contained chapters with class-tested examples and exercises allow for flexibility in designing a course and for easy reference
  • Supplementary website with lecture slides, videos, project ideas, and more
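The abstract lists high-dimensional data analysis among the book's topics. A standard way to see why low-dimensional intuition fails there is to compare the volume of the unit ball, V_d = π^(d/2) / Γ(d/2 + 1), with the volume 2^d of the cube [-1, 1]^d that encloses it. The Python sketch below computes that fraction; it illustrates the general phenomenon and is not code from the book, and the function names are hypothetical.

```python
import math


def ball_volume(d: int, r: float = 1.0) -> float:
    """Volume of a d-dimensional ball of radius r: pi^(d/2) * r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the cube [-1, 1]^d (volume 2^d) filled by the unit ball inside it."""
    return ball_volume(d) / 2.0 ** d


if __name__ == "__main__":
    # The fraction shrinks rapidly as the dimension d grows.
    for d in (1, 2, 3, 5, 10, 20):
        print(f"d = {d:2d}: fraction = {inscribed_ball_fraction(d):.6f}")
```

The fraction is exactly 1 for d = 1, π/4 ≈ 0.785 for d = 2, and π/6 ≈ 0.524 for d = 3, and it falls below 0.003 by d = 10: almost all of the cube's volume lies outside the inscribed ball, toward the corners.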

Contributors
  • Federal University of Minas Gerais

Reviews

Van Dyke Parunak

This volume is a well-organized presentation of several major themes in current data mining research and practice. Readers should not be misled by its preface, which justifies a new data mining book by observing that many existing texts "are either too high-level or too advanced." Zaki and Meira declare their intention to offer "an introductory text" that provides not only mathematical foundations, but also "the intuition behind the formulas." In fact, the volume assumes a fair level of mathematical maturity on the part of the reader and offers relatively little intuitive justification for the details that it presents. Experienced practitioners will find it a useful reference, but in an introductory data mining class it will need to be supplemented by clarifying lectures or other readings.

An introductory chapter frames the discussion by presenting data as a matrix of entities and properties that can be viewed algebraically, geometrically, or probabilistically. This structure (supported by the ubiquitous Iris dataset) is well suited to the approaches that the authors discuss, but it does not accommodate some other important areas of data mining. In particular, mining unstructured text is an area of growing importance, covered by some other books on data mining, but this volume does not discuss it.

Part 1, "Data Analysis Foundations," describes various kinds of data, namely numeric and categorical attributes and graph-structured data. It introduces the idea of a kernel, which features in several of the methods presented. The discussion of high-dimensional data is an excellent mathematical summary of the counterintuitive behavior of points in high dimensions, and is followed by a chapter on formal mechanisms for dimensionality reduction. Part 1 is an important foundation for the specific data mining methods in later sections. For example, the kernel methods introduced in chapter 5 are invoked repeatedly in each of the following sections (though, strangely, the index misses the reference to the "kernel trick" introduced in this chapter and instead directs the reader to chapter 13, which does indeed cite the "trick," but without reminding the reader where to find it). The book's back cover, preface, and first chapter all misleadingly summarize Part 1 as "exploratory data analysis." Exploratory data analysis, in the sense in which John Tukey popularized the term, refers to an informal, intuition-based search for hypotheses, as opposed to formal methods for testing those hypotheses. The importance of starting data mining with an informal engagement with the data cannot be overstated, but Part 1 does not provide any guidance for this engagement.

Parts 2, 3, and 4 discuss three specific approaches to data mining. Each part concludes with a chapter on validating or assessing the results produced by that part's methods, an accessible organization that will make the book a frequent reference for practitioners. Part 2 describes how to mine three kinds of frequent patterns: itemsets (described by association rules), sequences, and graph patterns. Here and throughout the book, the emphasis on graph-structured data is a valuable extension beyond what some other books on data mining offer. Part 3 provides details on four approaches to data clustering: representative-based (such as K-means), hierarchical agglomerative, density-based, and graph-based methods centered on the graph spectrum. This last category is an important set of techniques that many other references do not discuss sufficiently, but this volume does not tell the reader what the graph spectrum is or offer an intuitive explanation of why it is valuable for clustering. A brief discussion of the relation between the graph spectrum and the structure of a graph would greatly encourage readers to engage with the mathematical details that the authors provide. Part 4 discusses classification methods, including probabilistic classification, decision trees, linear discriminant analysis, and support vector machines.

This volume is a detailed, well-organized reference on three major approaches to data mining, and practitioners will keep it close at hand. Its popularity will be enhanced by the fact that the authors have made a PDF copy of the entire book available for private use on the book website, www.dataminingbook.info.
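The review asks for a brief link between the graph spectrum and graph structure. One standard identity, stated here independently of the book, gives that intuition: for an undirected graph with adjacency matrix A and degree matrix D, the Laplacian L = D - A satisfies x^T L x = Σ over edges (i, j) of (x_i - x_j)^2. A vector that is constant on every connected component therefore scores 0, which is why the multiplicity of the eigenvalue 0 equals the number of components; the 0/1 indicator vector of a loosely linked cluster scores only the number of edges leaving it, which is why eigenvectors with small eigenvalues expose cluster structure. The Python sketch below checks the identity on a hypothetical toy graph of two triangles joined by one edge; the function names are illustrative.

```python
# Toy graph (hypothetical): two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of a simple undirected, unweighted graph."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1  # degree contributions on the diagonal
        L[j][j] += 1
        L[i][j] -= 1  # minus the adjacency entries off the diagonal
        L[j][i] -= 1
    return L


def quadratic_form(L, x):
    """x^T L x computed directly from the matrix."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_disagreement(edges, x):
    """Sum over edges of (x_i - x_j)^2, which equals x^T L x."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


if __name__ == "__main__":
    L = laplacian(6, EDGES)
    cluster = [1, 1, 1, 0, 0, 0]  # indicator vector of the first triangle
    print(quadratic_form(L, cluster), edge_disagreement(EDGES, cluster))  # only the bridge edge is cut
    print(quadratic_form(L, [1] * 6))  # a constant vector lies in the null space of L
    # Without the bridge the graph has two components, and the indicator scores 0.
    print(quadratic_form(laplacian(6, EDGES[:-1]), cluster))
```

In the toy graph, the indicator of the first triangle scores 1 (only the bridging edge disagrees), a constant vector scores 0, and once the bridge is removed the indicator also scores 0, matching the two connected components.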

Dimitrios Katsaros

In their Harvard Business Review article at the end of 2012 [1], Davenport and Patil describe data scientist as the sexiest job of this century; they argue that the qualities of a data scientist include expertise in computer science and statistics. I would extend their argument and say that knowledge of data mining tasks for big data is ultimately the principal quality of any data scientist. This book is about educating and training the next generation of data mining practitioners, those who will build new enterprises and move our knowledge a big step forward.

The book is divided into four parts. The chapters in Part 1 describe basic notions and background knowledge needed for the more advanced material in subsequent parts. In particular, these chapters present the concepts of numerical, categorical, graph, and high-dimensional data, along with useful statistical tools such as kernel methods and dimensionality reduction procedures, for example, singular value decomposition (SVD) and principal component analysis (PCA). The second part deals with mining frequent patterns: patterns that emerge in set-based data, in sequence-based data (sets with an ordering), and in graph data. The third part investigates clustering, explaining the basic algorithms for representative-based, hierarchical, density-based, spectral, and graph clustering. Finally, the last part of the book describes classification methods, namely Bayes and decision tree classifiers, support vector machines (SVM), and linear discriminant analysis.

Although there are several good books on data mining, this new one is really special. It manages to include all of the latest developments in the data mining area, along with those past ideas that have survived the test of time. The book does not overload the reader with many variants of classic algorithms just to cover every method in breadth; instead, it presents only those algorithms that have been the roots of large families of algorithmic ideas. In this way, the book gives the reader a deep understanding of the original thinking.

Overall, this is an excellent textbook for both undergraduate and postgraduate students, and it is also appropriate for scientists and engineers looking for solutions to their big data analysis problems. The solved and unsolved exercises in the book are carefully selected to enhance the reader's understanding and to challenge him or her to investigate each exercise's topic further. I expect that the quality of the book will earn it a place in the hearts of demanding data scientists.
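Both reviews summarize Part 2 as mining frequent patterns, starting with itemsets in set-based data. A common baseline for that task is the level-wise Apriori strategy: count the support of size-k candidate itemsets, keep those that meet a minimum support, and build size-(k+1) candidates only from frequent ones, since every subset of a frequent itemset must itself be frequent. The Python sketch below is a minimal, unoptimized illustration of that strategy, not the book's implementation; the six-transaction dataset and the absolute support threshold of 3 are made up for this example.

```python
from itertools import combinations

# Hypothetical toy data: six transactions over the items A-E.
TRANSACTIONS = [
    {"A", "B", "D", "E"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E"},
    {"B", "C", "D"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, minsup):
    """Minimal level-wise Apriori sketch; returns {frozenset: support} for frequent itemsets."""
    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    frequent = {}
    k = 1
    while level:
        # Keep the size-k candidates whose support reaches the absolute threshold.
        current = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= minsup:
                current[candidate] = s
        frequent.update(current)
        # Join pairs of frequent k-itemsets into (k+1)-candidates, pruning any candidate
        # that has an infrequent k-subset (every subset of a frequent itemset is frequent).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        level = list(candidates)
        k += 1
    return frequent


if __name__ == "__main__":
    result = apriori(TRANSACTIONS, 3)
    for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
        print("".join(sorted(itemset)), result[itemset])
```

On this toy data every single item is frequent, while {A, C} and {C, D} (support 2 each) are not, so no candidate containing either pair is ever counted; the largest frequent itemset is {A, B, D, E}, with support 3.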