Data Mining: The Textbook
April 2015
Publisher: Springer Publishing Company, Incorporated
ISBN: 978-3-319-14141-1
Published: 14 April 2015
Pages: 734
Abstract

This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues. It goes beyond the traditional focus on data mining problems to introduce advanced data types such as text, time series, discrete sequences, spatial data, graph data, and social networks. Until now, no single book has addressed all these topics in a comprehensive and integrated way.

The chapters of this book fall into one of three categories:

  • Fundamental chapters: Data mining has four main problems, which correspond to clustering, classification, association pattern mining, and outlier analysis. These chapters comprehensively discuss a wide variety of methods for these problems.
  • Domain chapters: These chapters discuss the specific methods used for different domains of data such as text data, time-series data, sequence data, graph data, and spatial data.
  • Application chapters: These chapters study important applications such as stream mining, Web mining, ranking, recommendations, social networks, and privacy preservation. The domain chapters also have an applied flavor.

Appropriate for both introductory and advanced data mining courses, Data Mining: The Textbook balances mathematical details and intuition. It contains the necessary mathematical details for professors and researchers, but it is presented in a simple and intuitive style to improve accessibility for students and industrial practitioners (including those with a limited mathematical background). Numerous illustrations, examples, and exercises are included, with an emphasis on semantically interpretable examples.

Praise for Data Mining: The Textbook

"As I read through this book, I have already decided to use it in my classes. This is a book written by an outstanding researcher who has made fundamental contributions to data mining, in a way that is both accessible and up to date. The book is complete with theory and practical use cases. It's a must-have for students and professors alike!" -- Qiang Yang, Chair of Computer Science and Engineering at Hong Kong University of Science and Technology

"This is the most amazing and comprehensive text book on data mining. It covers not only the fundamental problems, such as clustering, classification, outliers and frequent patterns, and different data types, including text, time series, sequences, spatial data and graphs, but also various applications, such as recommenders, Web, social network and privacy. It is a great book for graduate students and researchers as well as practitioners." -- Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology at University of Illinois at Chicago

Cited By

  1. Khan S and Shaheen M (2023). From data mining to wisdom mining, Journal of Information Science, 49:4, (952-975), Online publication date: 1-Aug-2023.
  2. Yates D and Islam M (2022). Data Mining on Smartphones: An Introduction and Survey, ACM Computing Surveys, 55:5, (1-38), Online publication date: 30-Jun-2023.
  3. Bustio-Martínez L, Cumplido R, Letras M, Hernández-León R, Feregrino-Uribe C and Hernández-Palancar J (2021). FPGA/GPU-based Acceleration for Frequent Itemsets Mining: A Comprehensive Review, ACM Computing Surveys, 54:9, (1-35), Online publication date: 31-Dec-2022.
  4. He J, Han X, Wang J and Zhang K (2022). Efficient high-utility occupancy itemset mining algorithm on massive data, Expert Systems with Applications: An International Journal, 210:C, Online publication date: 30-Dec-2022.
  5. Stavropoulos V, Michelioudakis E, Akasiadis C and Artikis A. Resource-effective exploration of tumor treatments with multi-scale simulations, Proceedings of the 12th Hellenic Conference on Artificial Intelligence, (1-10)
  6. Bellas C, Kougka G, Naskos A, Gounaris A, Vakali A, Xenakis C and Papadopoulos A. Facilitating DoS Attack Detection using Unsupervised Anomaly Detection, Proceedings of the 34th International Conference on Scientific and Statistical Database Management, (1-4)
  7. Januzaj E, Weber M, Keller M, Auch M and Mandl P. CoSim: An Approach to Calculate Complex Object Similarity, The 23rd International Conference on Information Integration and Web Intelligence, (324-327)
  8. Duricic T, Kowald D, Schedl M and Lex E. My friends also prefer diverse music, Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, (447-454)
  9. Bruto da Costa A and Dasgupta P (2021). Learning Temporal Causal Sequence Relationships from Real-Time Time-Series, Journal of Artificial Intelligence Research, 70, (205-243), Online publication date: 1-May-2021.
  10. Kaur I, Doja M, Ahmad T, Ahmad M, Hussain A, Nadeem A, Abd El-Latif A and Doulamis A (2021). An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques, Computational Intelligence and Neuroscience, 2021, Online publication date: 1-Jan-2021.
  11. Molina-Coronado B, Mori U, Mendiburu A and Miguel-Alonso J (2020). Survey of Network Intrusion Detection Methods From the Perspective of the Knowledge Discovery in Databases Process, IEEE Transactions on Network and Service Management, 17:4, (2451-2479), Online publication date: 1-Dec-2020.
  12. Zhu H and Bayley I. Exploratory Datamorphic Testing of Classification Applications, Proceedings of the IEEE/ACM 1st International Conference on Automation of Software Test, (51-60)
  13. Ryu H, Jung S and Pramanik S (2020). An Effective Clustering Method over CF+ Tree Using Multiple Range Queries, IEEE Transactions on Knowledge and Data Engineering, 32:9, (1694-1706), Online publication date: 1-Sep-2020.
  14. Raj S, Ramesh D, Sreenu M and Sethi K (2020). EAFIM: efficient apriori-based frequent itemset mining algorithm on Spark for big transactional data, Knowledge and Information Systems, 62:9, (3565-3583), Online publication date: 1-Sep-2020.
  15. Kel’manov A, Khamidullin S, Khandeev V and Pyatkin A (2019). Exact algorithms for two integer-valued problems of searching for the largest subset and longest subsequence, Annals of Mathematics and Artificial Intelligence, 88:1-3, (157-168), Online publication date: 1-Mar-2020.
  16. Power J and Waldron J. Calibration and Analysis of Source Code Similarity Measures for Verilog Hardware Description Language Projects, Proceedings of the 51st ACM Technical Symposium on Computer Science Education, (420-426)
  17. Kotsiopoulos C, Doudoumis I, Raftopoulou P and Tryfonopoulos C. DaST, Proceedings of the 8th Computer Science Education Research Conference, (104-109)
  18. Kel’manov A and Ruzankin P (2019). An Accelerated Exact Algorithm for the One-Dimensional M-Variance Problem, Pattern Recognition and Image Analysis, 29:4, (573-576), Online publication date: 1-Oct-2019.
  19. Momotov A and Xie X. Determining Lead-Lag Structure between Sentiment Index and Stock Price Returns, Proceedings of the 3rd International Conference on Vision, Image and Signal Processing, (1-7)
  20. Kel’manov A and Khandeev V. Fast and Exact Algorithms for Some NP-Hard 2-Clustering Problems in the One-Dimensional Case, Analysis of Images, Social Networks and Texts, (377-387)
  21. Khandeev V. Polynomial-Time Approximation Scheme for a Problem of Searching for the Largest Subset with the Constraint on Quadratic Variation, Numerical Computations: Theory and Algorithms, (400-405)
  22. Costadopoulos N, Islam M and Tien D. Data mining and knowledge discovery from physiological sensors, Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments, (468-474)
  23. Liu Y, Safavi T, Dighe A and Koutra D (2018). Graph Summarization Methods and Applications, ACM Computing Surveys, 51:3, (1-34), Online publication date: 31-May-2019.
  24. Malgonde O and Chari K (2019). An ensemble-based model for predicting agile software development effort, Empirical Software Engineering, 24:2, (1017-1055), Online publication date: 1-Apr-2019.
  25. Algar M, Martín de Diego I, Fernández-Isabel A, Monjas M, Ortega F, Moguerza J, Jacynycz H and Esparza O (2019). A Quality of Experience Management Framework for Mobile Users, Wireless Communications & Mobile Computing, 2019, Online publication date: 1-Jan-2019.
  26. Luo G (2018). Progress Indication for Machine Learning Model Building, ACM SIGKDD Explorations Newsletter, 20:2, (1-12), Online publication date: 11-Dec-2018.
  27. Kel'manov A, Khamidullin S, Khandeev V and Pyatkin A (2018). An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence, Pattern Recognition and Image Analysis, 28:4, (703-711), Online publication date: 1-Oct-2018.
  28. Ouni A, Urruty T and Visani M (2018). A robust CBIR framework in between bags of visual words and phrases models for specific image datasets, Multimedia Tools and Applications, 77:20, (26173-26189), Online publication date: 1-Oct-2018.
  29. Bučar J, Žnidaršič M and Povh J (2018). Annotated news corpora and a lexicon for sentiment analysis in Slovene, Language Resources and Evaluation, 52:3, (895-919), Online publication date: 1-Sep-2018.
  30. Mehri V, Ilie D and Tutschku K. Privacy and DRM Requirements for Collaborative Development of AI Applications, Proceedings of the 13th International Conference on Availability, Reliability and Security, (1-8)
  31. Kel'manov A, Khamidullin S, Khandeev V, Pyatkin A, Shamardin Y and Shenmaier V (2018). A Polynomial-Time Approximation Algorithm for One Problem Simulating the Search in a Time Series for the Largest Subsequence of Similar Elements, Pattern Recognition and Image Analysis, 28:3, (363-370), Online publication date: 1-Jul-2018.
  32. Trittenbach H, Bach J and Böhm K. On the Tradeoff between Energy Data Aggregation and Clustering Quality, Proceedings of the Ninth International Conference on Future Energy Systems, (399-401)
  33. Kel’manov A, Khamidullin S, Khandeev V and Pyatkin A. Exact Algorithms for Two Quadratic Euclidean Problems of Searching for the Largest Subset and Longest Subsequence, Learning and Intelligent Optimization, (326-336)
  34. Zhang J and Yu P (2018). Broad Learning:, ACM SIGKDD Explorations Newsletter, 20:1, (24-50), Online publication date: 29-May-2018.
  35. Härtel J, Aksu H and Lämmel R. Classification of APIs by hierarchical clustering, Proceedings of the 26th Conference on Program Comprehension, (233-243)
  36. Ahmed M (2018). Reservoir-based network traffic stream summarization for anomaly detection, Pattern Analysis & Applications, 21:2, (579-599), Online publication date: 1-May-2018.
  37. D'Emilia G, Gaspari A and Galar D (2018). Improvement of Measurement Contribution for Asset Characterization in Complex Engineering Systems by an Iterative Methodology, International Journal of Service Science, Management, Engineering, and Technology, 9:2, (85-103), Online publication date: 1-Apr-2018.
  38. Mrázová I and Zvirinský P. Czech Insolvency Proceedings, Proceedings of the 2018 10th International Conference on Machine Learning and Computing, (150-156)
  39. Ozkan N, Kahya E and Dauwels J (2018). Classification of BCI Users Based on Cognition, Computational Intelligence and Neuroscience, 2018, Online publication date: 1-Jan-2018.
  40. Bajtoš T, Gajdoš A, Kleinová L, Lučivjanská K and Sokol P (2018). Network Intrusion Detection with Threat Agent Profiling, Security and Communication Networks, 2018, (4), Online publication date: 1-Jan-2018.
  41. Kel'manov A and Motkova A (2018). Approximation Scheme for a Quadratic Euclidean Weighted 2-Clustering Problem, Pattern Recognition and Image Analysis, 28:1, (17-23), Online publication date: 1-Jan-2018.
  42. Samonte M, Garcia J, Lucero V and Santos S. Sentiment and opinion analysis on Twitter about local airlines, Proceedings of the 3rd International Conference on Communication and Information Processing, (415-422)
  43. Luo G (2017). Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution, ACM SIGKDD Explorations Newsletter, 19:2, (13-24), Online publication date: 21-Nov-2017.
  44. Jensen S, Pedersen T and Thomsen C (2017). Time Series Management Systems: A Survey, IEEE Transactions on Knowledge and Data Engineering, 29:11, (2581-2600), Online publication date: 1-Nov-2017.
  45. Genovez P, Ebecken N, Freitas C, Bentz C and Freitas R (2017). Intelligent hybrid system for dark spot detection using SAR data, Expert Systems with Applications: An International Journal, 81:C, (384-397), Online publication date: 15-Sep-2017.
  46. Suzuki N and Matsuno H (2017). Radio Wave Environment Analysis at Different Locations Based on Frequent Pattern Mining, Procedia Computer Science, 112:C, (1396-1403), Online publication date: 1-Sep-2017.
  47. Bauder R and Khoshgoftaar T. Estimating Outlier Score Probabilities, 2017 IEEE International Conference on Information Reuse and Integration (IRI), (559-568)
  48. Katos V, Serketzis N, Ilioudis C, Baltatzis D and Pangalos G (2017). A Socio-Technical Perspective on Threat Intelligence Informed Digital Forensic Readiness, International Journal of Systems and Society, 4:2, (57-68), Online publication date: 1-Jul-2017.
  49. Ageev A, Kel'manov A, Pyatkin A, Khamidullin S and Shenmaier V (2017). Approximation polynomial algorithm for the data editing and data cleaning problem, Pattern Recognition and Image Analysis, 27:3, (365-370), Online publication date: 1-Jul-2017.
  50. Patroumpas K and Koutras C. Probabilistic k-Nearest Neighbor Monitoring of Moving Gaussians, Proceedings of the 29th International Conference on Scientific and Statistical Database Management, (1-12)
  51. Gan G and Ng M (2017). k-means clustering with outlier removal, Pattern Recognition Letters, 90:C, (8-14), Online publication date: 15-Apr-2017.
  52. Di Ciccio C, Maggi F, Montali M and Mendling J (2017). Resolving inconsistencies and redundancies in declarative process models, Information Systems, 64:C, (425-446), Online publication date: 1-Mar-2017.
  53. Duong V, Khan K and Lee Y. Top-k frequent induced subgraph mining on a sliding window using sampling, Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication, (1-7)
  54. Du M, State R, Brorsson M and Avanesov T. Behavior profiling for mobile advertising, Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, (302-307)
  55. Salloum S, Huang J and He Y. Empirical analysis of asymptotic ensemble learning for big data, Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, (8-17)
  56. Ouhbi B, Kamoune M, Frikh B, Zemmouri E and Behja H. A hybrid feature selection rule measure and its application to systematic review, Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, (106-114)
  57. Khurana U, Parthasarathy S and Turaga D (2016). Graph-based exploration of non-graph datasets, Proceedings of the VLDB Endowment, 9:13, (1557-1560), Online publication date: 1-Sep-2016.
  58. Luger C, Kallinovsky J and Rieberer R (2016). Identification of representative operating conditions of HVAC systems in passenger rail vehicles based on sampling virtual train trips, Advanced Engineering Informatics, 30:2, (157-167), Online publication date: 1-Apr-2016.
  59. Savchenko A (2016). Fast multi-class recognition of piecewise regular objects based on sequential three-way decisions and granular computing, Knowledge-Based Systems, 91:C, (252-262), Online publication date: 1-Jan-2016.
  60. Silva L, Siqueira M, Pinto F, Barros F, Zimbrão G and Souza J (2016). Applying data mining techniques for spatial distribution analysis of plant species co-occurrences, Expert Systems with Applications: An International Journal, 43:C, (250-260), Online publication date: 1-Jan-2016.
  61. Rapti A, Tsolis D, Sioutas S and Tsakalidis A. A Survey, Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), (1-6)
Contributors
  • IBM Thomas J. Watson Research Center

Reviews

Radu State

Written by one of the most prodigious editors and authors in the data mining community, Data mining: the textbook is a comprehensive introduction to the fundamentals and applications of data mining. The recent drive in industry and academia toward data science and more specifically "big data" makes any well-written book on this topic a welcome addition to the bookshelves of experienced and aspiring data scientists.

The content can be roughly divided into three major parts, addressing the fundamentals, the domains, and the applications. Each of these parts can serve on its own as background reading for an undergraduate or graduate class. The first part is excellent for an undergraduate computer science course on data mining. In these first 10 chapters, the author addresses the four major problems in data mining: clustering, classification, association pattern mining, and outlier detection. In the second part, which can be used in a graduate-level course in data mining, the author goes into the specific approaches for text mining, time series mining, and spatial data. Finally, the third part (chapters 18 to 20) addresses highly specific and timely topics: web mining, social network analytics, and privacy-preserving data mining. The target audience for this part would be mostly graduate students and researchers working on the topics presented.

It is not likely that readers will read the book from cover to cover; instead, they will focus on the chapters required for a given project. The writing style is excellent, and the author provides sufficient mathematical background, in the form of formal proofs and notation, to make the book self-contained and scientifically appealing to more theory-oriented readers. Covering more than 20 chapters and 700 pages, Aggarwal provides a unique textbook and reference to data mining, which I recommend to every reader working on or learning about data mining.