Principles of data mining (August 2001)
Publisher: MIT Press, 55 Hayward St., Cambridge, MA, United States
ISBN: 978-0-262-08290-7
Published: 01 August 2001
Pages: 425
Abstract

The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

Cited By

  1. Christen P, Hand D and Kirielle N (2023). A Review of the F-Measure: Its History, Properties, Criticism, and Alternatives, ACM Computing Surveys, 56:3, (1-24), Online publication date: 31-Mar-2024.
  2. Behnamian J, Ghadimi M and Farajiamiri M (2022). Data mining-based firefly algorithm for green vehicle routing problem with heterogeneous fleet and refueling constraint, Artificial Intelligence Review, 56:7, (6557-6589), Online publication date: 1-Jul-2023.
  3. Swain A and Garza V (2023). Key Factors in Achieving Service Level Agreements (SLA) for Information Technology (IT) Incident Resolution, Information Systems Frontiers, 25:2, (819-834), Online publication date: 1-Apr-2023.
  4. Zhao L, Qi K, Su Z, Pang L, Zhang L and Wu D A Disk Failure Prediction Algorithm Based on Fusion Model Proceedings of the 2023 15th International Conference on Machine Learning and Computing, (544-550)
  5. Bobrowski L Collinear Data Structures and Interaction Models Computational Collective Intelligence, (378-387)
  6. Mohammadi M and Mobarakeh M (2022). An integrated clustering algorithm based on firefly algorithm and self-organized neural network, Progress in Artificial Intelligence, 11:3, (207-217), Online publication date: 1-Sep-2022.
  7. Jungherr A, Posegga O and An J (2022). Populist Supporters on Reddit, Social Science Computer Review, 40:3, (809-830), Online publication date: 1-Jun-2022.
  8. Calp M, Butuner R, Kose U, Alamri A and Camacho D (2022). IoHT-based deep learning controlled robot vehicle for paralyzed patients of smart cities, The Journal of Supercomputing, 78:9, (11373-11408), Online publication date: 1-Jun-2022.
  9. Bahri M and Bifet A Incremental k-Nearest Neighbors Using Reservoir Sampling for Data Streams Discovery Science, (122-137)
  10. Yeh W, Jiang Y, Tan S, Yeh C and Cherifi H (2021). A New Support Vector Machine Based on Convolution Product, Complexity, 2021, Online publication date: 1-Jan-2021.
  11. Morillo-Salas J, Bolón-Canedo V and Alonso-Betanzos A (2020). Dealing with heterogeneity in the context of distributed feature selection for classification, Knowledge and Information Systems, 63:1, (233-276), Online publication date: 1-Jan-2021.
  12. Hafiz P, Miskowiak K, Maxhuni A, Kessing L and Bardram J (2020). Wearable Computing Technology for Assessment of Cognitive Functioning of Bipolar Patients and Healthy Controls, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 4:4, (1-22), Online publication date: 17-Dec-2020.
  13. Ceritli T, Williams C and Geddes J (2020). ptype: probabilistic type inference, Data Mining and Knowledge Discovery, 34:3, (870-904), Online publication date: 1-May-2020.
  14. Bahri M, Pfahringer B, Bifet A and Maniu S Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams Advances in Intelligent Data Analysis XVIII, (40-53)
  15. ElMenshawy D, Helmy W and El-Tazi N A Novel Approach for Collective Anomaly Detection in Internet of Things Proceedings of 2020 6th International Conference on Computing and Data Engineering, (86-90)
  16. Oliveira R, Pereira A and Tavares J (2019). Computational diagnosis of skin lesions from dermoscopic images using combined features, Neural Computing and Applications, 31:10, (6091-6111), Online publication date: 1-Oct-2019.
  17. Rastgoo M, Nakisa B, Rakotonirainy A, Chandran V and Tjondronegoro D (2018). A Critical Review of Proactive Detection of Driver Stress Levels Based on Multimodal Measurements, ACM Computing Surveys, 51:5, (1-35), Online publication date: 30-Sep-2019.
  18. Bobrowski L Local Models of Interaction on Collinear Patterns Computational Collective Intelligence, (259-270)
  19. Coelho P, Goncalves C, Portela F and Santos M Towards to Use Image Mining to Classified Skin Problems - A Melanoma Case Study Progress in Artificial Intelligence, (384-395)
  20. Bolón-Canedo V, Sechidis K, Sánchez-Maroño N, Alonso-Betanzos A and Brown G (2022). Insights into distributed feature ranking, Information Sciences: an International Journal, 496:C, (378-398), Online publication date: 1-Sep-2019.
  21. Deka R, Bhattacharyya D and Kalita J (2019). Active learning to detect DDoS attack using ranked features, Computer Communications, 145:C, (203-222), Online publication date: 1-Sep-2019.
  22. de Oliveira M, Neder R, de Souza P, Maciel C, Campos Silva Freire N, de Almeida Peres J, Vuolo C, dos Anjos A and Mansilla D Indicators of Municipal Public Management: Study of Multiple Performance Measurement Systems Electronic Government and the Information Systems Perspective, (119-132)
  23. Chua B and Zhang Y Predicting open source programming language repository file survivability from forking data Proceedings of the 15th International Symposium on Open Collaboration, (1-8)
  24. Clark J and Provost F (2019). Unsupervised dimensionality reduction versus supervised regularization for classification from sparse data, Data Mining and Knowledge Discovery, 33:4, (871-916), Online publication date: 1-Jul-2019.
  25. Al Fanah M and Ansari M Understanding E-learners' Behaviour Using Data Mining Techniques Proceedings of the 2019 International Conference on Big Data and Education, (59-65)
  26. Shahrouzi S and Perera D (2019). Optimized hardware accelerators for data mining applications on embedded platforms, Microprocessors & Microsystems, 65:C, (79-96), Online publication date: 1-Mar-2019.
  27. Gürbüz F, Eski İ, Denizhan B and Dağlı C (2019). Prediction of damage parameters of a 3PL company via data mining and neural networks, Journal of Intelligent Manufacturing, 30:3, (1437-1449), Online publication date: 1-Mar-2019.
  28. (2019). Effective selling strategies for online auctions on eBay, International Journal of Business Information Systems, 30:2, (125-151), Online publication date: 1-Jan-2019.
  29. Saeidpour P, Otarkhani A and Shokouhyar S (2018). Predicting Customers' Churn Using Data Mining Technique and its Effect on the Development of Marketing Applications in Value-Added Services in Telecom Industry, International Journal of Information Systems in the Service Sector, 10:4, (59-72), Online publication date: 1-Oct-2018.
  30. Katrakazas C, Quddus M and Chen W (2018). A Simulation Study of Predicting Real-Time Conflict-Prone Traffic Conditions, IEEE Transactions on Intelligent Transportation Systems, 19:10, (3196-3207), Online publication date: 1-Oct-2018.
  31. Haghighi R, Rasouli M, Ahmed S, Tan K, Al-Mamun A and Chew C Depth-based Object Detection using Hierarchical Fragment Matching Method 2018 IEEE 14th International Conference on Automation Science and Engineering (CASE), (780-785)
  32. de Santana V and Baranauskas M A Taxonomy for Website Evaluation Tools Grounded on Semiotic Framework Universal Access in Human-Computer Interaction. Methods, Technologies, and Users, (38-49)
  33. Kang B, Lijffijt J, Santos-Rodríguez R and De Bie T (2018). SICA, Data Mining and Knowledge Discovery, 32:4, (949-987), Online publication date: 1-Jul-2018.
  34. Chen N, Drouhard M, Kocielnik R, Suh J and Aragon C (2018). Using Machine Learning to Support Qualitative Coding in Social Science, ACM Transactions on Interactive Intelligent Systems, 8:2, (1-20), Online publication date: 30-Jun-2018.
  35. Rojo D, Raya L, Rubio-Sánchez M and Sanchez A A visual interface for feature subset selection using machine learning methods Proceedings of the XXVIII Spanish Computer Graphics Conference, (119-128)
  36. Kolodziej M, Majkowski A, Rak R, Tarnowski P and Rysz A Implementation of Lagged Phase Space for Spike Detection 2018 IEEE International Symposium on Medical Measurements and Applications (MeMeA), (1-6)
  37. Guo J, Yang D, Siegmund N, Apel S, Sarkar A, Valov P, Czarnecki K, Wasowski A and Yu H (2018). Data-efficient performance learning for configurable systems, Empirical Software Engineering, 23:3, (1826-1867), Online publication date: 1-Jun-2018.
  38. Torres C, Pérez-Lantero P and Gutiérrez G (2018). Linear separability in spatial databases, Knowledge and Information Systems, 54:2, (287-314), Online publication date: 1-Feb-2018.
  39. Gosztolya G and Tóth L (2018). A feature selection-based speaker clustering method for paralinguistic tasks, Pattern Analysis & Applications, 21:1, (193-204), Online publication date: 1-Feb-2018.
  40. Oliveira R, Papa J, Pereira A and Tavares J (2018). Computational methods for pigmented skin lesion classification in images, Neural Computing and Applications, 29:3, (613-636), Online publication date: 1-Feb-2018.
  41. Goswami A and Kumar A (2017). Challenges in the Analysis of Online Social Networks, Wireless Personal Communications: An International Journal, 97:3, (4015-4061), Online publication date: 1-Dec-2017.
  42. Dong C and Loukides G (2017). Approximating Private Set Union/Intersection Cardinality With Logarithmic Complexity, IEEE Transactions on Information Forensics and Security, 12:11, (2792-2806), Online publication date: 1-Nov-2017.
  43. Oliveira R, Pereira A and Tavares J (2017). Skin lesion computational diagnosis of dermoscopic images, Computer Methods and Programs in Biomedicine, 149:C, (43-53), Online publication date: 1-Oct-2017.
  44. Chen J, Zhang S, Wang M and Xu C (2017). A novel change feature-based approach to predict the impact of current proposed engineering change, Advanced Engineering Informatics, 33:C, (132-143), Online publication date: 1-Aug-2017.
  45. He A, Chen Z, Li W, Li X, Li H and Zhao X DAC-SGD Proceedings of the 2nd International Conference on Intelligent Information Processing, (1-5)
  46. Stephens C, Rodríguez-Ramírez R, Mireles V, Hernández-López S, Garcia-Aguirre C, Herrera-Ortiz J and Mantilla-Berniers N Risk Factors Linked to Influenza-like Illness as Identified from the Mexican Participatory Surveillance System Proceedings of the 2017 International Conference on Digital Health, (147-154)
  47. Taylor P, Griffiths N, Bhalerao A, Xu Z, Gelencser A and Popham T (2017). Investigating the Feasibility of Vehicle Telemetry Data as a Means of Predicting Driver Workload, International Journal of Mobile Human Computer Interaction, 9:3, (54-72), Online publication date: 1-Jul-2017.
  48. Zhao Y, Liu Y and Zeng Q (2017). A weight-based item recommendation approach for electronic commerce systems, Electronic Commerce Research, 17:2, (205-226), Online publication date: 1-Jun-2017.
  49. Kunaver M and Požrl T (2017). Diversity in recommender systems - A survey, Knowledge-Based Systems, 123:C, (154-162), Online publication date: 1-May-2017.
  50. Wang S and Wei J (2017). Feature selection based on measurement of ability to classify subproblems, Neurocomputing, 224:C, (155-165), Online publication date: 8-Feb-2017.
  51. Zandian Z and Keyvanpour M (2017). Systematic identification and analysis of different fraud detection approaches based on the strategy ahead, International Journal of Knowledge-based and Intelligent Engineering Systems, 21:2, (123-134), Online publication date: 1-Jan-2017.
  52. Marbouti F, Diefes-Dux H and Madhavan K (2016). Models for early prediction of at-risk students in a course using standards-based grading, Computers & Education, 103:C, (1-15), Online publication date: 1-Dec-2016.
  53. Robson B (2016). Studies in using a universal exchange and inference language for evidence based medicine. Semi-automated learning and reasoning for PICO methodology, systematic review, and environmental epidemiology, Computers in Biology and Medicine, 79:C, (299-323), Online publication date: 1-Dec-2016.
  54. Bakar N, Kasirun Z, Salleh N and Jalab H (2016). Extracting features from online software reviews to aid requirements reuse, Applied Soft Computing, 49:C, (1297-1315), Online publication date: 1-Dec-2016.
  55. Mercado-Varela M, García-Holgado A, García-Peñalvo F and Ramírez-Montoya M Analyzing navigation logs in MOOC Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality, (873-880)
  56. Bolt A, Leoni M and Aalst W (2016). Scientific workflows for process mining, International Journal on Software Tools for Technology Transfer (STTT), 18:6, (607-628), Online publication date: 1-Nov-2016.
  57. de Brito Moraes G and da Silva T Systematic Literature Mapping on Eye Tracking and Data Mining Proceedings of the 15th Brazilian Symposium on Human Factors in Computing Systems, (1-4)
  58. Li P, Xu M, Wu J and Shang L Using canonical correlation analysis for parallelized attribute reduction Proceedings of the 14th Pacific Rim International Conference on Trends in Artificial Intelligence, (433-445)
  59. Kang B, Lijffijt J, Santos-Rodríguez R and De Bie T Subjectively Interesting Component Analysis Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (1615-1624)
  60. Chernogorov F, Chernov S, Brigatti K and Ristaniemi T (2016). Sequence-based detection of sleeping cell failures in mobile networks, Wireless Networks, 22:6, (2029-2048), Online publication date: 1-Aug-2016.
  61. Geler Z, Kurbalija V, Radovanović M and Ivanović M (2016). Comparison of different weighting schemes for the kNN classifier on time-series data, Knowledge and Information Systems, 48:2, (331-378), Online publication date: 1-Aug-2016.
  62. B. Le T, Lo D, Le Goues C and Grunske L A learning-to-rank based fault localization approach using likely invariants Proceedings of the 25th International Symposium on Software Testing and Analysis, (177-188)
  63. Nirmala P, Lekshmi R and Nadarajan R (2016). Vertex cover-based binary tree algorithm to detect all maximum common induced subgraphs in large communication networks, Knowledge and Information Systems, 48:1, (229-252), Online publication date: 1-Jul-2016.
  64. Bendjebar S, Lafifi Y and Seridi H (2016). Modeling and Evaluating Tutors' Function using Data Mining and Fuzzy Logic Techniques, International Journal of Web-Based Learning and Teaching Technologies, 11:2, (39-60), Online publication date: 1-Apr-2016.
  65. Barak A and Gelbard R (2016). Classification by clustering using an extended saliency measure, Expert Systems: The Journal of Knowledge Engineering, 33:1, (46-59), Online publication date: 1-Feb-2016.
  66. Pierazzi F, Casolari S, Colajanni M and Marchetti M (2016). Exploratory security analytics for anomaly detection, Computers and Security, 56:C, (28-49), Online publication date: 1-Feb-2016.
  67. Ballings M, Van den Poel D, Hespeels N and Gryp R (2015). Evaluating multiple classifiers for stock price direction prediction, Expert Systems with Applications: An International Journal, 42:20, (7046-7056), Online publication date: 15-Nov-2015.
  68. Cho Y, Moon S and Jeong S Learning Listener's Preference for Music Recommender System Proceedings of the 2015 International Conference on Big Data Applications and Services, (229-232)
  69. Cho Y and Jeong S A Recommender System in u-Commerce based on a Segmentation Method Proceedings of the 2015 International Conference on Big Data Applications and Services, (148-150)
  70. Bruni R and Bianchi G (2015). Effective Classification Using a Small Training Set Based on Discretization and Statistical Analysis, IEEE Transactions on Knowledge and Data Engineering, 27:9, (2349-2361), Online publication date: 1-Sep-2015.
  71. Tekieh M and Raahemi B Importance of Data Mining in Healthcare Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, (1057-1062)
  72. Goswami S and Sangeeta K Anomalies in Landsat Imagery and Imputation Proceedings of the Third International Symposium on Women in Computing and Informatics, (353-358)
  73. Asta D and Shalizi C Geometric network comparisons Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, (102-110)
  74. Miller L and Leen-Kiat Soh (2015). Cluster-Based Boosting, IEEE Transactions on Knowledge and Data Engineering, 27:6, (1491-1504), Online publication date: 1-Jun-2015.
  75. Santana V and Baranauskas M (2015). WELFIT, International Journal of Human-Computer Studies, 76:C, (40-49), Online publication date: 1-Apr-2015.
  76. H.N. L and Mohanty H A Preprocessing of Service Registry Proceedings of the 11th International Conference on Distributed Computing and Internet Technology - Volume 8956, (220-232)
  77. Wei C (2015). Comparing lazy and eager learning models for water level forecasting in river-reservoir basins of inundation regions, Environmental Modelling & Software, 63:C, (137-155), Online publication date: 1-Jan-2015.
  78. Quick D and Choo K (2014). Impacts of increasing volume of digital forensic data, Digital Investigation: The International Journal of Digital Forensics & Incident Response, 11:4, (273-294), Online publication date: 1-Dec-2014.
  79. Agrawal R, Golshan B and Terzi E Grouping students in educational settings Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, (1017-1026)
  80. Zikos D, Tsiakas K, Qudah F, Athitsos V and Makedon F Evaluation of classification methods for the prediction of hospital length of stay using medicare claims data Proceedings of the 7th International Conference on PErvasive Technologies Related to Assistive Environments, (1-6)
  81. Wolfs V and Willems P (2014). Development of discharge-stage curves affected by hysteresis using time varying models, model trees and neural networks, Environmental Modelling & Software, 55:C, (107-119), Online publication date: 1-May-2014.
  82. López-Yáñez I, Sheremetov L and Yáñez-Márquez C (2014). A novel associative model for time series data mining, Pattern Recognition Letters, 41:C, (23-33), Online publication date: 1-May-2014.
  83. Lucia L, Lo D, Jiang L, Thung F and Budi A (2014). Extended comprehensive study of association measures for fault localization, Journal of Software: Evolution and Process, 26:2, (172-219), Online publication date: 1-Feb-2014.
  84. Yahya Y, Ismail R, Vanna S and Saret K Using data mining techniques for predicting individual tree mortality in tropical rain forest Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication, (1-5)
  85. Foncubierta-Rodríguez A, García Seco de Herrera A and Müller H Medical image retrieval using bag of meaningful visual words Proceedings of the 1st ACM international workshop on Multimedia indexing and information retrieval for healthcare, (75-82)
  86. Chen S and Huang P (2013). The comparisons of the influences of prior knowledge on two game-based learning systems, Computers & Education, 68:C, (177-186), Online publication date: 1-Oct-2013.
  87. Tavakoli S, Mousavi A and Poslad S (2013). Input variable selection in time-critical knowledge integration applications, Advanced Engineering Informatics, 27:4, (519-536), Online publication date: 1-Oct-2013.
  88. Wang Y, Di G, Yu J, Lei J and Coenen F Feature representation for customer attrition risk prediction in retail banking Proceedings of the 13th international conference on Advances in Data Mining: applications and theoretical aspects, (229-238)
  89. Mari F, Melatti I, Tronci E and Finzi A (2013). A multi-hop advertising discovery and delivering protocol for multi administrative domain MANET, Mobile Information Systems, 9:3, (261-280), Online publication date: 1-Jul-2013.
  90. Lutu P and Engelbrecht A (2013). Base Model Combination Algorithm for Resolving Tied Predictions for K-Nearest Neighbor OVA Ensemble Models, INFORMS Journal on Computing, 25:3, (517-526), Online publication date: 1-Jul-2013.
  91. Wang Z and Liu D (2013). Data-based stability analysis of a class of nonlinear discrete-time systems, Information Sciences: an International Journal, 235, (36-44), Online publication date: 1-Jun-2013.
  92. Livi L and Rizzi A (2013). Graph ambiguity, Fuzzy Sets and Systems, 221, (24-47), Online publication date: 1-Jun-2013.
  93. Galatas G, Zikos D and Makedon F Application of data mining techniques to determine patient satisfaction Proceedings of the 6th International Conference on PErvasive Technologies Related to Assistive Environments, (1-4)
  94. Wang B, Wang C, Bu J, Chen C, Zhang W, Cai D and He X Whom to mention Proceedings of the 22nd international conference on World Wide Web, (1331-1340)
  95. Jankowska B and Szymkowiak M Machine Ranking of 2-Uncertain Rules Acquired from Real Data Transactions on Computational Collective Intelligence XI - Volume 8065, (198-222)
  96. De Falco I (2013). Differential Evolution for automatic rule extraction from medical databases, Applied Soft Computing, 13:2, (1265-1283), Online publication date: 1-Feb-2013.
  97. Biancalana C, Gasparetti F, Micarelli A and Sansonetti G (2013). An approach to social recommendation for context-aware mobile services, ACM Transactions on Intelligent Systems and Technology, 4:1, (1-31), Online publication date: 1-Jan-2013.
  98. Kourkoutas D, Karanasiou I, Tsekouras G, Moshos M, Iliakis E and Georgopoulos G (2012). Glaucoma risk assessment using a non-linear multivariable regression method, Computer Methods and Programs in Biomedicine, 108:3, (1149-1159), Online publication date: 1-Dec-2012.
  99. d’Amato C, Bryl V and Serafini L Semantic Knowledge Discovery and Data-Driven Logical Reasoning from Heterogeneous Data Sources Uncertainty Reasoning for the Semantic Web III, (163-183)
  100. Lazarou C, Karaolis M, Matalas A and Panagiotakos D (2012). Dietary patterns analysis using data mining method. An application to data from the CYKIDS study, Computer Methods and Programs in Biomedicine, 108:2, (706-714), Online publication date: 1-Nov-2012.
  101. Zhan J, Zheng G, Jiang M, Lu C, Guo H and Lu A Rule-Based text mining of chinese herbal medicines with patterns in traditional chinese medicine for chronic obstructive pulmonary disease Proceedings of the 2012 international conference on Web Information Systems and Mining, (510-520)
  102. Jorge A, Mendes-Moreira J, de Sousa J, Soares C and Azevedo P Finding interesting contexts for explaining deviations in bus trip duration using distribution rules Proceedings of the 11th international conference on Advances in Intelligent Data Analysis, (139-149)
  103. de Herrera A, Markonis D and Müller H Bag-of-Colors for biomedical document image classification Proceedings of the Third MICCAI international conference on Medical Content-Based Retrieval for Clinical Decision Support, (110-121)
  104. Odachowski K and Grekow J Using bookmaker odds to predict the final result of football matches Proceedings of the 16th international conference on Knowledge Engineering, Machine Learning and Lattice Computing with Applications, (196-205)
  105. Liukkonen M, Havia E and Hiltunen Y (2012). Computational intelligence in mass soldering of electronics - A survey, Expert Systems with Applications: An International Journal, 39:10, (9928-9937), Online publication date: 1-Aug-2012.
  106. Martínez-de-Pisón F, Sanz A, Martínez-de-Pisón E, Jiménez E and Conti D (2012). Mining association rules from time series to explain failures in a hot-dip galvanizing steel line, Computers and Industrial Engineering, 63:1, (22-36), Online publication date: 1-Aug-2012.
  107. Yoo I, Alafaireet P, Marinov M, Pena-Hernandez K, Gopidi R, Chang J and Hua L (2012). Data Mining in Healthcare and Biomedicine, Journal of Medical Systems, 36:4, (2431-2448), Online publication date: 1-Aug-2012.
  108. Huang Z, Zhao H and Zhu D (2012). Two New Prediction-Driven Approaches to Discrete Choice Prediction, ACM Transactions on Management Information Systems, 3:2, (1-32), Online publication date: 1-Jul-2012.
  109. van der Aalst W (2012). Process Mining, ACM Transactions on Management Information Systems, 3:2, (1-17), Online publication date: 1-Jul-2012.
  110. Engel R, van der Aalst W, Zapletal M, Pichler C and Werthner H Mining inter-organizational business process models from EDI messages Proceedings of the 24th international conference on Advanced Information Systems Engineering, (222-237)
  111. Tarvid A Combining adaptive goal-driven agents with mixed multi-unit combinatorial auctions Proceedings of the 13th International Conference on Computer Systems and Technologies, (79-86)
  112. Hollmén J Mixture modeling of gait patterns from sensor data Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments, (1-4)
  113. Juhola M and Siermala v (2012). A scatter method for data and variable importance evaluation, Integrated Computer-Aided Engineering, 19:2, (137-149), Online publication date: 1-Apr-2012.
  114. Demir E, Chahed S, Chaussalet T, Toffa S and Fouladinajed F (2012). A Decision Support Tool for Health Service Re-design, Journal of Medical Systems, 36:2, (621-630), Online publication date: 1-Apr-2012.
  115. Palpanas T (2012). A knowledge mining framework for business analysts, ACM SIGMIS Database: the DATABASE for Advances in Information Systems, 43:1, (46-60), Online publication date: 1-Feb-2012.
  116. Subramani S and Balasubramaniam S Post mining of diversified multiple decision trees for actionable knowledge discovery Proceedings of the 2011 international conference on Advanced Computing, Networking and Security, (179-187)
  117. Huang Y, Seck M and Verbraeck A From data to simulation models Proceedings of the Winter Simulation Conference, (3724-3734)
  118. Alazab M, Venkatraman S, Watters P and Alazab M Zero-day malware detection based on supervised learning algorithms of API call signatures Proceedings of the Ninth Australasian Data Mining Conference - Volume 121, (171-182)
  119. Dalal A (2011). User-perceived quality assessment of streaming media using reduced feature sets, ACM Transactions on Internet Technology, 11:2, (1-32), Online publication date: 1-Dec-2011.
  120. Abdullah Z, Herawan T and Deris M Visualizing the construction of incremental disorder Trie Itemset data structure (DOSTrieIT) for frequent pattern tree (FP-tree) Proceedings of the Second international conference on Visual informatics: sustaining research and innovations - Volume Part I, (183-195)
  121. Brown D, Famili F, Paass G, Smith-Miles K, Thomas L, Weber R, Baeza-Yates R, Bravo C, L'Huillier G and Maldonado S (2011). Future trends in business analytics and optimization, Intelligent Data Analysis, 15:6, (1001-1017), Online publication date: 1-Nov-2011.
  122. Krempl G The algorithm APT to classify in concurrence of latency and drift Proceedings of the 10th international conference on Advances in intelligent data analysis X, (222-233)
  123. Klawonn F, Höppner F and May S An alternative to ROC and AUC analysis of classifiers Proceedings of the 10th international conference on Advances in intelligent data analysis X, (210-221)
  124. Simiński R, Nowak-Brzezińska A, Jach T and Xięski T Towards a practical approach to discover internal dependencies in rule-based knowledge bases Proceedings of the 6th international conference on Rough sets and knowledge technology, (232-237)
  125. Lutu P Empirical comparison of four classifier fusion strategies for positive-versus-negative ensembles Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment, (302-305)
  126. Bogdan S, Kudinov A and Markov N Manufacturing execution systems intellectualization Proceedings of the First international conference on Model and data engineering, (170-177)
  127. Balaniuk R, do Prado H, da Veiga Guadagnin R, Ferneda E and Cobbe P Predicting evasion candidates in higher education institutions Proceedings of the First international conference on Model and data engineering, (143-151)
  128. Vogel P and Mattfeld D Strategic and operational planning of bike-sharing systems by data mining Proceedings of the Second international conference on Computational logistics, (127-141)
  129. Köksal G, Batmaz İ and Testik M (2011). A review of data mining applications for quality improvement in manufacturing industry, Expert Systems with Applications: An International Journal, 38:10, (13448-13467), Online publication date: 15-Sep-2011.
  130. Santos R, Macdonald C and Ounis I Aggregated search result diversification Proceedings of the Third international conference on Advances in information retrieval theory, (250-261)
  131. Tian D, Zeng X and Keane J (2011). Core-generating approximate minimum entropy discretization for rough set feature selection in pattern classification, International Journal of Approximate Reasoning, 52:6, (863-880), Online publication date: 1-Sep-2011.
  132. Kouno A, Montanier J, Takano S, Bredeche N, Schoenauer M, Sebag M and Suzuki E On-Board Evolutionary Algorithm and Off-Line Rule Discovery for Column Formation in Swarm Robotics Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02, (220-227)
  133. Rose A, Awang M, Hassan H, Zakaria A, Herawan T and Deris M Hybrid reduction in soft set decision making Proceedings of the 7th international conference on Advanced Intelligent Computing, (108-115)
  134. Chan K, Ling S, Dillon T and Nguyen H (2011). Diagnosis of hypoglycemic episodes using a neural network based rule discovery system, Expert Systems with Applications: An International Journal, 38:8, (9799-9808), Online publication date: 1-Aug-2011.
  135. Fonseca N, Santos Costa V and Camacho R Conceptual clustering of multi-relational data Proceedings of the 21st international conference on Inductive Logic Programming, (145-159)
  136. Umek L and Zupan B (2011). Subgroup discovery in data sets with multi-dimensional responses, Intelligent Data Analysis, 15:4, (533-549), Online publication date: 1-Jun-2011.
  137. Kostakis O, Papapetrou P and Hollmén J Distance measure for querying sequences of temporal intervals Proceedings of the 4th International Conference on PErvasive Technologies Related to Assistive Environments, (1-8)
  138. Džeroski S Inductive databases and constraint-based data mining Proceedings of the 9th international conference on Formal concept analysis, (1-17)
  139. Kronberger G, Fink S, Kommenda M and Affenzeller M Macro-economic time series modeling and interaction networks Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part II, (101-110)
  140. Sinha A and Zhao H (2011). Tuning expert systems for cost-sensitive decisions, Advances in Artificial Intelligence, 2011, (1-12), Online publication date: 1-Jan-2011.
  141. Dante C, De Pison Francisco J and Alpha P Temporal association rules mining Proceedings of the 9th WSEAS international conference on computational intelligence, man-machine systems and cybernetics, (69-74)
  142. Dante C, De Pison Francisco J and Alpha P Finding temporal associative rules in financial time-series Proceedings of the 9th WSEAS international conference on computational intelligence, man-machine systems and cybernetics, (60-68)
  143. Yu Y and Zhou Z (2010). A framework for modeling positive class expansion with single snapshot, Knowledge and Information Systems, 25:2, (211-227), Online publication date: 1-Nov-2010.
  144. David N and Begu L A website structure optimization model Proceedings of the 10th WSEAS international conference on Applied computer science, (426-429)
  145. Bifet A Adaptive Stream Mining Proceedings of the 2010 conference on Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams, (1-212)
  146. Bhaskar R, Laxman S, Smith A and Thakurta A Discovering frequent patterns in sensitive data Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, (503-512)
  147. Parshutin S Managing product life cycle with multiagent data mining system Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects, (308-322)
  148. Derlatka M and Ihnatouski M Decision tree approach to rules extraction for human gait analysis Proceedings of the 10th international conference on Artificial intelligence and soft computing: Part I, (597-604)
  149. Piegat A and Olchowy M Does an optimal form of an expert fuzzy model exist? Proceedings of the 10th international conference on Artificial intelligence and soft computing: Part I, (175-184)
  150. Cai Q, He H, Man H and Qiu J IterativeSOMSO Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part I, (325-330)
  151. Rajasekharan J, Scharfenberger U, Gonçalves N and Vigário R Image approach towards document mining in neuroscientific publications Proceedings of the 9th international conference on Advances in Intelligent Data Analysis, (147-158)
  152. Engle K and Gangopadhyay A (2010). An Efficient Method for Discretizing Continuous Attributes, International Journal of Data Warehousing and Mining, 6:2, (1-21), Online publication date: 1-Apr-2010.
  153. Chen S, Macredie R, Liu X and Sutcliffe A (2010). Editorial, ACM Transactions on Computer-Human Interaction, 17:1, (1-6), Online publication date: 1-Mar-2010.
  154. Hamdi-Cherif A Machine learning for intelligent bioinformatics Proceedings of the 9th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases, (315-320)
  155. Aronovich L and Spiegler I (2010). Bulk construction of dynamic clustered metric trees, Knowledge and Information Systems, 22:2, (211-244), Online publication date: 1-Feb-2010.
  156. Hotho A, Pedersen R and Wurst M Ubiquitous data Ubiquitous knowledge discovery, (61-74)
  158. Carrizosa E, Martin-Barragan B and Morales D (2010). Binarized Support Vector Machines, INFORMS Journal on Computing, 22:1, (154-167), Online publication date: 1-Jan-2010.
  159. Özbakır L, Baykasoğlu A and Kulluk S (2010). A soft computing-based approach for integrated training and rule extraction from artificial neural networks, Applied Soft Computing, 10:1, (304-317), Online publication date: 1-Jan-2010.
  160. Gul N, Barki I and Akhtar N MFP Proceedings of the 7th International Conference on Frontiers of Information Technology, (1-7)
  161. Kulczycki P Statistical kernel estimators for data analysis and exploration tasks Proceedings of the 14th WSEAS International Conference on Applied mathematics, (257-262)
  162. Ji W, Chan C, Loh J, Choo F and Chen L Solar radiation prediction using statistical approaches Proceedings of the 7th international conference on Information, communications and signal processing, (646-650)
  163. Piton T, Blanchard J, Briand H and Guillet F Domain driven data mining to improve promotional campaign ROI and select marketing channels Proceedings of the 18th ACM conference on Information and knowledge management, (1057-1066)
  164. El-Mouadib F, Zubi Z and Talhi H A modified C-means clustering algorithm Proceedings of the 8th WSEAS international conference on Data networks, communications, computers, (85-94)
  165. Smith G, Tan D and Lee B iSee Proceedings of the International Conference on Advances in Computer Entertainment Technology, (190-197)
  166. Goh Y, Giess M and McMahon C (2009). Facilitating design learning through faceted classification of in-service information, Advanced Engineering Informatics, 23:4, (497-511), Online publication date: 1-Oct-2009.
  167. Pei-En F, Zhi-Yong M, Qing-Ying Q, Men-Hong S and Jian Z Application of KDD in mechanical structure symmetry design Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 7, (327-331)
  168. Fang X Are you becoming a diabetic? a data mining approach Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 5, (18-22)
  169. Zhuhadar L, Nasraoui O, Wyatt R and Romero E (2009). Metadata as seeds for building an ontology driven information retrieval system, International Journal of Hybrid Intelligent Systems, 6:3, (169-186), Online publication date: 1-Aug-2009.
  170. Gago P and Santos M Closed loop knowledge discovery for decision support in intensive care medicine Proceedings of the WSEAS 13th international conference on Computers, (447-452)
  171. Liu H, Torii M, Xu G, Hu Z and Goll J Learning from positive and unlabeled documents for retrieval of bacterial protein-protein interaction literature Proceedings of the 2009 workshop of the BioLink Special Interest Group, international conference on Linking Literature, Information, and Knowledge for Biology, (62-70)
  172. Cai Q, He H and Man H SOMSO Proceedings of the 2009 international joint conference on Neural Networks, (2126-2132)
  173. Kaburlasos V, Moussiades L and Vakali A (2009). Fuzzy lattice reasoning (FLR) type neural computation for weighted graph partitioning, Neurocomputing, 72:10-12, (2121-2133), Online publication date: 1-Jun-2009.
  174. Siirtola P, Laurinen P and Röning J Mining an optimal prototype from a periodic time series Proceedings of the Eleventh conference on Congress on Evolutionary Computation, (2818-2824)
  175. Pylvänen M, Äyrämö S and Kärkkäinen T Visualizing time series state changes with prototype based clustering Proceedings of the 9th international conference on Adaptive and natural computing algorithms, (619-628)
  176. Podolak I and Bartocha K A hierarchical classifier with growing neural gas clustering Proceedings of the 9th international conference on Adaptive and natural computing algorithms, (283-292)
  177. Pylvänen M, Äyrämö S and Kärkkäinen T Visualizing Time Series State Changes with Prototype Based Clustering Proceedings of the 2009 conference on Adaptive and Natural Computing Algorithms - Volume 5495, (619-628)
  178. Podolak I and Bartocha K A Hierarchical Classifier with Growing Neural Gas Clustering Proceedings of the 2009 conference on Adaptive and Natural Computing Algorithms - Volume 5495, (283-292)
  179. Bianco A, Mardente G, Mellia M, Munafò M and Muscariello L (2009). Web user-session inference by means of clustering techniques, IEEE/ACM Transactions on Networking, 17:2, (405-416), Online publication date: 1-Apr-2009.
  180. Aittokoski T, Ayramo S and Miettinen K (2009). Clustering aided approach for decision making in computationally expensive multiobjective optimization, Optimization Methods & Software, 24:2, (157-174), Online publication date: 1-Apr-2009.
  181. Cortés C, Díaz-Báñez J, Pérez-Lantero P, Seara C, Urrutia J and Ventura I (2009). Bichromatic separability with two boxes, Journal of Algorithms, 64:2-3, (79-88), Online publication date: 1-Apr-2009.
  182. Nehme R, Rundensteiner E and Bertino E Self-tuning query mesh for adaptive multi-route query processing Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, (803-814)
  183. Woon W and Wong K (2009). String alignment for automated document versioning, Knowledge and Information Systems, 18:3, (293-309), Online publication date: 1-Mar-2009.
  184. Kriegel H, Kröger P and Zimek A (2009). Clustering high-dimensional data, ACM Transactions on Knowledge Discovery from Data, 3:1, (1-58), Online publication date: 1-Mar-2009.
  185. David N, Patrascu N, Carstea C, Patrascu L, Ratiu I and Damian D Advanced methods for data mining Proceedings of the 8th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases, (407-412)
  186. Chang C, Chu C and Yeh Y (2009). Integrating in-process software defect prediction with association mining to discover defect pattern, Information and Software Technology, 51:2, (375-384), Online publication date: 1-Feb-2009.
  187. Tsekouras G, Kanellos F, Kontargyri V, Karanasiou I, Salis A and Mastorakis N (2008). A new classification pattern recognition methodology for power system typical load profiles, WSEAS Transactions on Circuits and Systems, 7:12, (1090-1104), Online publication date: 1-Dec-2008.
  188. Järvelin A and Järvelin A Comparison of s-gram Proximity Measures in Out-of-Vocabulary Word Translation Proceedings of the 15th International Symposium on String Processing and Information Retrieval, (75-86)
  189. Newsam S and Yang Y Integrating gazetteers and remote sensed imagery Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems, (1-10)
  190. Blanco L, Crescenzi V, Merialdo P and Papotti P Supporting the automatic construction of entity aware search engines Proceedings of the 10th ACM workshop on Web information and data management, (149-156)
  191. Zhuhadar L, Nasraoui O and Wyatt R Metadata domain-knowledge driven search engine in "HyperManyMedia" E-learning resources Proceedings of the 5th international conference on Soft computing as transdisciplinary science and technology, (363-370)
  192. Lind G and Kuusik R (2008). New developments for determinacy analysis, WSEAS Transactions on Information Science and Applications, 5:10, (1448-1459), Online publication date: 1-Oct-2008.
  193. Kraetzer C and Dittmann J Impact of feature selection in classification for hidden channel detection on the example of audio data hiding Proceedings of the 10th ACM workshop on Multimedia and security, (159-166)
  194. Lind G and Kuusik R Determinacy analysis as a diclique extracting task Proceedings of the 2nd conference on European computing conference, (119-125)
  195. Zhao H and Ram S (2008). Entity matching across heterogeneous data sources, Data & Knowledge Engineering, 66:3, (368-381), Online publication date: 1-Sep-2008.
  196. Palpanas T and Sairamesh J Knowledge Mining for the Business Analyst Proceedings of the 19th international conference on Database and Expert Systems Applications, (770-778)
  197. Abdelzaher T, Khan M, Le H, Ahmadi H and Han J Data mining for diagnostic debugging in sensor networks Proceedings of the Second international conference on Knowledge Discovery from Sensor Data, (1-24)
  198. Petrosino A and Staiano A (2008). Fuzzy modeling for data cleaning in sensor networks, International Journal of Hybrid Intelligent Systems, 5:3, (143-151), Online publication date: 1-Aug-2008.
  199. Tsekouras G, Kanellos F, Kontargyri V, Karanasiou E, Salis A and Mastorakis N Power system typical load profiles using a new pattern recognition methodology Proceedings of the 12th WSEAS international conference on Circuits, (25-31)
  200. Atzori M, Bonchi F, Giannotti F and Pedreschi D (2008). Anonymity preserving pattern discovery, The VLDB Journal — The International Journal on Very Large Data Bases, 17:4, (703-727), Online publication date: 1-Jul-2008.
  201. Bibi S and Stamelos I Analogy Based Cost Estimation Configuration with Rules Proceedings of the 2008 conference on Knowledge-Based Software Engineering: Proceedings of the Eighth Joint Conference on Knowledge-Based Software Engineering, (317-326)
  202. Bibi S, Stamelos I and Angelis L (2008). Combining probabilistic models for explanatory productivity estimation, Information and Software Technology, 50:7-8, (656-669), Online publication date: 1-Jun-2008.
  203. Yu Y and Zhou Z A framework for modeling positive class expansion with single snapshot Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining, (429-440)
  204. Kum H, Duncan D and Stewart C Supporting self-evaluation in local government via KDD Proceedings of the 2008 international conference on Digital government research, (225-233)
  205. Chen S and Liu X (2008). An Integrated Approach for Modeling Learning Patterns of Students in Web-Based Instruction, ACM Transactions on Computer-Human Interaction, 15:1, (1-28), Online publication date: 1-May-2008.
  206. Stankovski V, Swain M, Kravtsov V, Niessen T, Wegener D, Kindermann J and Dubitzky W (2008). Grid-enabling data mining applications with DataMiningGrid, Future Generation Computer Systems, 24:4, (259-279), Online publication date: 1-Apr-2008.
  207. Badawy S, Elragal A and Gabr M Multivariate similarity-based conformity measure (MSCM) Proceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications, (314-320)
  208. Chaovalitwongse W and Pardalos P (2008). On the time series support vector machine using dynamic time warping kernel for brain activity classification, Cybernetics and Systems Analysis, 44:1, (125-138), Online publication date: 1-Jan-2008.
  209. Domenech J and Lorenzo J A tool for web usage mining Proceedings of the 8th international conference on Intelligent data engineering and automated learning, (695-704)
  210. Domenech J and Lorenzo J A Tool for Web Usage Mining Intelligent Data Engineering and Automated Learning - IDEAL 2007, (695-704)
  211. Aronovich L and Spiegler I (2007). CM-tree, Data & Knowledge Engineering, 63:3, (919-946), Online publication date: 1-Dec-2007.
  212. Newsam S and Yang Y Geographic image retrieval using interest point descriptors Proceedings of the 3rd international conference on Advances in visual computing - Volume Part II, (275-286)
  213. Newsam S and Yang Y Geographic Image Retrieval Using Interest Point Descriptors Advances in Visual Computing, (275-286)
  214. Newsam S and Yang Y Comparing global and interest point descriptors for similarity retrieval in remote sensed imagery Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems, (1-8)
  215. Pichl L and Narita J Readability factors of Japanese text classification Proceedings of the 5th international conference on Databases in networked information systems, (132-138)
  216. Pichl L and Narita J Readability Factors of Japanese Text Classification Databases in Networked Information Systems, (132-138)
  217. Frota R, Barreto G and Mota J (2007). Anomaly detection in mobile communication networks using the self-organizing map, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 18:5, (493-500), Online publication date: 1-Oct-2007.
  218. Bhatti R and Grandison T Towards improved privacy policy coverage in healthcare using policy refinement Proceedings of the 4th VLDB conference on Secure data management, (158-173)
  219. Kraetzer C, Oermann A, Dittmann J and Lang A Digital audio forensics Proceedings of the 9th workshop on Multimedia & security, (63-74)
  220. Paula A, Ávila B, Scalabrin E and Enembreck F Using Distributed Data Mining and Distributed Artificial Intelligence for Knowledge Integration Proceedings of the 11th international workshop on Cooperative Information Agents XI, (89-103)
  221. Petrosino A and Staiano A A neuro-fuzzy approach for sensor network data cleaning Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part III, (140-147)
  222. Paetz J Subrule analysis and the frequency-confidence diagram Proceedings of the 7th international conference on Intelligent data analysis, (219-228)
  223. Hyvönen S, Gionis A and Mannila H Recurrent predictive models for sequence segmentation Proceedings of the 7th international conference on Intelligent data analysis, (195-206)
  224. Tutore V, Siciliano R and Aria M Conditional classification trees using instrumental variables Proceedings of the 7th international conference on Intelligent data analysis, (163-173)
  225. Tzikas D, Kukar M and Likas A Transductive reliability estimation for kernel based classifiers Proceedings of the 7th international conference on Intelligent data analysis, (37-47)
  226. Beck J and Chang K Identifiability Proceedings of the 11th international conference on User Modeling, (137-146)
  227. Sleeman D, Fluck N, Gyftodimos E, Moss L and Christie G An Intelligent Aide for Interpreting a Patient's Dialysis Data Set Proceedings of the 11th conference on Artificial Intelligence in Medicine, (57-66)
  228. Tang Y, Zhang Y and Huang Z (2007). Development of Two-Stage SVM-RFE Gene Selection Strategy for Microarray Expression Data Analysis, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4:3, (365-381), Online publication date: 1-Jul-2007.
  229. Sugiyama K and Okumura M TITPI Proceedings of the 4th International Workshop on Semantic Evaluations, (318-321)
  230. Stuart K and Majewski M Selected Problems of Knowledge Discovery Using Artificial Neural Networks Proceedings of the 4th international symposium on Neural Networks: Advances in Neural Networks, Part III, (1049-1057)
  231. Tatti N (2007). Distances between Data Sets Based on Summary Statistics, The Journal of Machine Learning Research, 8, (131-154), Online publication date: 1-May-2007.
  232. Better M, Glover F and Laguna M (2007). Advances in analytics, IBM Journal of Research and Development, 51:3, (477-487), Online publication date: 1-May-2007.
  233. Li J and Cui X Application of fuzzy clustering in financial analysis of logistic companies Proceedings of the 11th WSEAS International Conference on Applied Mathematics, (168-173)
  234. Gionis A, Mannila H and Tsaparas P (2007). Clustering aggregation, ACM Transactions on Knowledge Discovery from Data, 1:1, (4-es), Online publication date: 1-Mar-2007.
  235. Pulkkinen P and Koivisto H (2007). Identification of interpretable and accurate fuzzy classifiers and function estimators with hybrid methods, Applied Soft Computing, 7:2, (520-533), Online publication date: 1-Mar-2007.
  236. d'Aquin M, Badra F, Lafrogne S, Lieber J, Napoli A and Szathmary L Case base mining for adaptation knowledge acquisition Proceedings of the 20th international joint conference on Artificial intelligence, (750-755)
  237. Paula A, Avila B, Scalabrin E and Enembreck F Multiagent-Based Model Integration Proceedings of the 2006 IEEE/WIC/ACM international conference on Web Intelligence and Intelligent Agent Technology, (11-14)
  238. Baxter R Finding robust models using a stratified design Proceedings of the 19th Australian joint conference on Artificial Intelligence: advances in Artificial Intelligence, (1064-1068)
  239. Kolter J and Maloof M (2006). Learning to Detect and Classify Malicious Executables in the Wild, The Journal of Machine Learning Research, 7, (2721-2744), Online publication date: 1-Dec-2006.
  240. Ekdahl M and Koski T (2006). Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation, The Journal of Machine Learning Research, 7, (2449-2480), Online publication date: 1-Dec-2006.
  241. Calders T, Lakshmanan L, Ng R and Paredaens J (2006). Expressive power of an algebra for data mining, ACM Transactions on Database Systems, 31:4, (1169-1214), Online publication date: 1-Dec-2006.
  242. Kim B, Johnson P and Baker J Empirical evaluation of a visual interface for exploring message boards Proceedings of the Second international conference on Advances in Visual Computing - Volume Part I, (293-302)
  243. Lieber J, Napoli A, Szathmary L and Toussaint Y First elements on knowledge discovery guided by domain knowledge (KDDK) Proceedings of the 4th international conference on Concept lattices and their applications, (22-41)
  244. Lieber J, Napoli A, Szathmary L and Toussaint Y First Elements on Knowledge Discovery Guided by Domain Knowledge (KDDK) Concept Lattices and Their Applications, (22-41)
  245. Banek M, Jurić D, Pejaković I and Skočir Z Distributed architecture for association rule mining Proceedings of the 4th international conference on Advances in Information Systems, (237-246)
  246. Hurtado C and Levene M Discovering context-topic rules in search engine logs Proceedings of the 13th international conference on String Processing and Information Retrieval, (346-353)
  247. Rasinen A, Hollmén J and Mannila H Analysis of linux evolution using aligned source code segments Proceedings of the 9th international conference on Discovery Science, (209-218)
  248. Jiang T and Tuzhilin A (2006). Segmenting Customers from Population to Individuals, IEEE Transactions on Knowledge and Data Engineering, 18:10, (1297-1311), Online publication date: 1-Oct-2006.
  249. Srihari S, Ball G and Srinivasan H Versatile search of scanned Arabic handwriting Proceedings of the 2006 conference on Arabic and Chinese handwriting recognition, (57-69)
  250. Hirsch M, Tucker A, Swift S, Martin N, Orengo C, Kellam P and Liu X Improved robustness in time series analysis of gene expression data by polynomial model based clustering Proceedings of the Second international conference on Computational Life Sciences, (1-10)
  251. Džeroski S Towards a general framework for data mining Proceedings of the 5th international conference on Knowledge discovery in inductive databases, (259-300)
  252. Tožička J, Jakob M and Pěchouček M Market-Inspired approach to collaborative learning Proceedings of the 10th international conference on Cooperative Information Agents, (213-227)
  253. Zhang Z and Hand D (2006). Detecting groups of anomalously similar objects in large data sets, Intelligent Data Analysis, 10:5, (473-483), Online publication date: 1-Sep-2006.
  254. Zhuge H The open and autonomous interconnection semantics Proceedings of the 8th international conference on Electronic commerce: The new e-commerce: innovations for conquering current barriers, obstacles and limitations to conducting successful business on the internet, (105-115)
  255. Wu W and Hsiao M Mining global constraints for improving bounded sequential equivalence checking Proceedings of the 43rd annual Design Automation Conference, (743-748)
  256. Kaur H, Wasan S, Al-Hegami A and Bhatnagar V A unified approach for discovery of interesting association rules in medical databases Proceedings of the 6th Industrial Conference on Data Mining conference on Advances in Data Mining: applications in Medicine, Web Mining, Marketing, Image and Signal Mining, (53-63)
  257. Shalizi C, Camperi M and Klinkner K Discovering functional communities in dynamical networks Proceedings of the 2006 conference on Statistical network analysis, (140-157)
  258. Deshpande A and Madden S MauveDB Proceedings of the 2006 ACM SIGMOD international conference on Management of data, (73-84)
  259. Saitta S, Raphael B and Smith I Combining two data mining methods for system identification Proceedings of the 13th international conference on Intelligent Computing in Engineering and Architecture, (606-614)
  260. DeBarr D and Eyler-Walker Z (2006). Closing the gap, ACM SIGKDD Explorations Newsletter, 8:1, (11-16), Online publication date: 1-Jun-2006.
  261. Wang H (2006). Nearest Neighbors by Neighborhood Counting, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:6, (942-953), Online publication date: 1-Jun-2006.
  262. Mielikäinen T (2006). Frequency-based views to pattern collections, Discrete Applied Mathematics, 154:7, (1113-1139), Online publication date: 1-May-2006.
  263. Kelly D (2006). A Study of Design Characteristics in Evolving Software Using Stability as a Criterion, IEEE Transactions on Software Engineering, 32:5, (315-329), Online publication date: 1-May-2006.
  264. Xu Z and Song B A machine learning application for human resource data mining problem Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining, (847-856)
  265. Nguyen H Approximate boolean reasoning Transactions on Rough Sets V, (334-506)
  266. Galushka M, Patterson D and Rooney N Temporal data mining for smart homes Designing Smart Homes, (85-108)
  267. Zhang D and Simoff S Informing the curious negotiator Data Mining, (176-191)
  268. Achuthan N, Gopalan R and Rudra A Mining value-based item packages – an integer programming approach Data Mining, (78-89)
  269. Galloway J and Simoff S Network data mining Proceedings of the 3rd Asia-Pacific conference on Conceptual modelling - Volume 53, (21-32)
  270. Chang J and Lee W (2005). Efficient mining method for retrieving sequential patterns over online data streams, Journal of Information Science, 31:5, (420-432), Online publication date: 1-Oct-2005.
  271. Adomavicius G and Tuzhilin A (2005). Personalization technologies, Communications of the ACM, 48:10, (83-90), Online publication date: 1-Oct-2005.
  272. Lajnef M, Ayed M and Kolski C Convergence possible des processus du data mining et de conception-évaluation d'IHM Proceedings of the 17th Conference on l'Interaction Homme-Machine, (243-246)
  273. Peterson L and Coleman M Comparison of gene identification based on artificial neural network pre-processing with k-means cluster and principal component analysis Proceedings of the 6th international conference on Fuzzy Logic and Applications, (267-276)
  274. De Falco I, Della Cioppa A and Tarantino E Evaluation of particle swarm optimization effectiveness in classification Proceedings of the 6th international conference on Fuzzy Logic and Applications, (164-171)
  275. Podolak I, Biel S and Bobrowski M Hierarchical classifier Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (591-598)
  276. Kim W, Lee H, Yoo S and Baik S Neural network based adult image classification Proceedings of the 15th international conference on Artificial Neural Networks: biological Inspirations - Volume Part I, (481-486)
  277. Zhang Z and Hand D Detecting groups of anomalously similar objects in large data sets Proceedings of the 6th international conference on Advances in Intelligent Data Analysis, (509-519)
  278. Ly L, Rinderle S, Dadam P and Reichert M Mining staff assignment rules from event-based data Proceedings of the Third international conference on Business Process Management, (177-190)
  279. Kalos A and Rey T Data mining in the chemical industry Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, (763-769)
  280. Liang X Mathematical analysis of classifying convex clusters based on support functionals Proceedings of the First international conference on Advanced Data Mining and Applications, (761-768)
  281. Liu P, El-Darzi E, Lei L, Vasilakis C, Chountas P and Huang W An analysis of missing data treatment methods and their application to health care dataset Proceedings of the First international conference on Advanced Data Mining and Applications, (583-590)
  282. Aggelis V and Christodoulakis D Customer clustering using RFM analysis Proceedings of the 9th WSEAS International Conference on Computers, (1-5)
  283. Bichindaritz I and Akkineni S Concept mining for indexing medical literature Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition, (682-691)
  284. Shimizu K and Miura T Disjunctive sequential patterns on single data sequence and its anti-monotonicity Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition, (376-383)
  285. Minetou C Grouping Users' Communities in an Interactive Web-Based Learning System Proceedings of the Fifth IEEE International Conference on Advanced Learning Technologies, (474-475)
  286. Flanagan J Unsupervised clustering of context data and learning user requirements for a mobile device Proceedings of the 5th international conference on Modeling and Using Context, (155-168)
  287. Thabet N Understanding the thematic structure of the Qur'an Proceedings of the ACL Student Research Workshop, (7-12)
  288. Stephens C, Waelbroeck H and Talley S Predicting healthcare costs using GAs Proceedings of the 7th annual workshop on Genetic and evolutionary computation, (159-163)
  289. Mladenič D Challenges and creativity in IT research Proceedings of the international symposium on Women and ICT: creating global transformation, (7-es)
  290. Nurmi P, Przybilski M, Lindén G and Floréen P An architecture for distributed agent-based data preprocessing Proceedings of the 2005 international conference on Autonomous Intelligent Systems: agents and Data Mining, (123-133)
  291. Gaber M, Zaslavsky A and Krishnaswamy S (2005). Mining data streams, ACM SIGMOD Record, 34:2, (18-26), Online publication date: 1-Jun-2005.
  292. Knobbe A Multi-Relational Data Mining Proceedings of the 2005 conference on Multi-Relational Data Mining, (1-118)
  293. Kim W, Lee H, Park J and Yoon K Multi class adult image classification using neural networks Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence, (222-226)
  294. Thelwall M (2005). Text characteristics of English language university Web sites, Journal of the American Society for Information Science and Technology, 56:6, (609-619), Online publication date: 1-Apr-2005.
  295. De Falco I, Tarantino E, Cioppa A and Fontanella F A novel grammar-based genetic programming approach to clustering Proceedings of the 2005 ACM symposium on Applied computing, (928-932)
  296. Eliassi-Rad T and Critchlow T A hybrid approach for multiresolution modeling of large-scale scientific data Proceedings of the 2005 ACM symposium on Applied computing, (511-518)
  297. Trifonova A and Ronchetti M Hoarding Content in M-Learning Context Proceedings of the Third IEEE International Conference on Pervasive Computing and Communications Workshops, (327-331)
  298. Aggelis V and Anagnostou P e-banking prediction using data mining methods Proceedings of the 4th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering Data Bases, (1-6)
  299. Gosztolya G and Kocsor A (2005). A hierarchical evaluation methodology in speech recognition, Acta Cybernetica, 17:2, (213-224), Online publication date: 10-Jan-2005.
  300. Afrati F On approximation algorithms for data mining applications Efficient Approximation and Online Algorithms, (1-29)
  301. Das A, Gehrke J and Riedewald M (2005). Semantic Approximation of Data Stream Joins, IEEE Transactions on Knowledge and Data Engineering, 17:1, (44-59), Online publication date: 1-Jan-2005.
  302. Aupetit M and Catz T (2005). High-dimensional labeled data analysis with topology representing graphs, Neurocomputing, 63, (139-169), Online publication date: 1-Jan-2005.
  303. Mertik M and Zalar B Gaining features in medicine using multimethod data-mining powerful techniques Proceedings of the 4th WSEAS International Conference on Applied Informatics and Communications, (1-4)
  304. Garatti S, Savaresi S, Bittanti S and La Brocca L (2004). On the relationships between user profiles and navigation sessions in virtual communities: A data-mining approach, Intelligent Data Analysis, 8:6, (579-600), Online publication date: 1-Dec-2004.
  305. Christmann A and Steinwart I (2004). On Robustness Properties of Convex Risk Minimization Methods for Pattern Recognition, The Journal of Machine Learning Research, 5, (1007-1034), Online publication date: 1-Dec-2004.
  306. Vaidya J and Clifton C (2004). Privacy-Preserving Data Mining, IEEE Security and Privacy, 2:6, (19-27), Online publication date: 1-Nov-2004.
  307. Sinha A and May J (2004). Evaluating and Tuning Predictive Data Mining Models Using Receiver Operating Characteristic Curves, Journal of Management Information Systems, 21:3, (249-280), Online publication date: 1-Nov-2004.
  308. Lavrač N, Motoda H, Fawcett T, Holte R, Langley P and Adriaans P (2004). Introduction, Machine Learning, 57:1-2, (13-34), Online publication date: 1-Oct-2004.
  309. Savaresi S and Boley D (2004). A comparative analysis on the bisecting K-means and the PDDP clustering algorithms, Intelligent Data Analysis, 8:4, (345-362), Online publication date: 1-Sep-2004.
  310. Dong X, Halevy A, Madhavan J, Nemes E and Zhang J Similarity search for web services Proceedings of the Thirtieth international conference on Very large data bases - Volume 30, (372-383)
  311. Kolter J and Maloof M Learning to detect malicious executables in the wild Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, (470-478)
  312. Liu J, Li J, Xu W and Shi Y Data mining approach in scientific research organizations evaluation via clustering Proceedings of the 2004 Chinese academy of sciences conference on Data Mining and Knowledge Management, (128-134)
  313. Zadrozny B Learning and evaluating classifiers under sample selection bias Proceedings of the twenty-first international conference on Machine learning
  314. Wang C and Parthasarathy S Parallel algorithms for mining frequent structural motifs in scientific data Proceedings of the 18th annual international conference on Supercomputing, (31-40)
  315. Connelly R (2004). Introducing data mining, Journal of Computing Sciences in Colleges, 19:5, (87-96), Online publication date: 1-May-2004.
  316. Mladenic D and Grobelnik M Visualizing very large graphs using clustering neighborhoods Proceedings of the 2004 international conference on Local Pattern Detection, (89-97)
  317. Bonchi F, Giannotti F and Pedreschi D A relational query primitive for constraint-based pattern mining Proceedings of the 2004 European conference on Constraint-Based Mining and Inductive Databases, (14-37)
  318. Groth D An evaluation of a rule-based language for classification queries Proceedings of the 15th international conference on Applications of Declarative Programming and Knowledge Management, and 18th international conference on Workshop on Logic Programming, (79-97)
  319. Padmanabhan B and Tuzhilin A (2003). On the Use of Optimization for Data Mining, Management Science, 49:10, (1327-1343), Online publication date: 1-Oct-2003.
  320. Kleinberg J (2003). Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, 7:4, (373-397), Online publication date: 1-Oct-2003.
  321. Elnahrawy E and Nath B Cleaning and querying noisy sensors Proceedings of the 2nd ACM international conference on Wireless sensor networks and applications, (78-87)
  322. Ali K and Ketchpel S Golden Path Analyzer Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, (349-358)
  323. Bolton R and Adams N An iterative hypothesis-testing strategy for pattern discovery Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, (49-58)
  324. Bunke H Graph-based tools for data mining and machine learning Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition, (7-19)
  325. Kunttu I, Lepistö L, Rauhamaa J and Visa A Binary co-occurrence matrix in image database indexing Proceedings of the 13th Scandinavian conference on Image analysis, (1090-1097)
  326. Keppens J and Zeleznikow J A model based reasoning approach for generating plausible crime scenarios from evidence Proceedings of the 9th international conference on Artificial intelligence and law, (51-59)
  327. Lashkia G and Anthony L Learning by discovering conflicts Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence, (492-497)
  328. Jiang L and Hamilton H Methods for mining frequent sequential patterns Proceedings of the 16th Canadian society for computational studies of intelligence conference on Advances in artificial intelligence, (486-491)
  329. Das A, Gehrke J and Riedewald M Approximate join processing over data streams Proceedings of the 2003 ACM SIGMOD international conference on Management of data, (40-51)
  330. Mărginean F Computational science and data mining Proceedings of the 2003 international conference on Computational science: PartIII, (644-651)
  331. Bogg P Pattern based approaches to pre-processing structured text Proceedings of the 2003 international conference on Computational science, (859-867)
  332. Mărginean F Computational aspects of data mining Proceedings of the 2003 international conference on Computational science and its applications: PartI, (614-622)
  333. Kum H, Duncan D, Flair K and Wang W Social welfare program administration and evaluation and policy analysis using knowledge discovery and data mining (KDD) on administrative data Proceedings of the 2003 annual national conference on Digital government research, (1-6)
  334. Last M, Shapira B, Elovici Y, Zaafrany O and Kandel A Content-based methodology for anomaly detection on the web Proceedings of the 1st international Atlantic web intelligence conference on Advances in web intelligence, (113-123)
  335. Chawla S, Arunasalam B and Davis J Mining open source software (OSS) data using association rules network Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining, (461-466)
  336. Berthold M and Hand D References Intelligent data analysis, (475-500)
  337. Counsell S, Liu X, McFall J, Swift S and Tucker A (2002). Evolutionary algorithms for grouping high dimensional Email data, Intelligent Data Analysis, 6:6, (503-516), Online publication date: 1-Dec-2002.
  338. De Raedt L (2002). A perspective on inductive databases, ACM SIGKDD Explorations Newsletter, 4:2, (69-77), Online publication date: 1-Dec-2002.
  339. Li T, Li Q, Zhu S and Ogihara M (2002). A survey on wavelet applications in data mining, ACM SIGKDD Explorations Newsletter, 4:2, (49-68), Online publication date: 1-Dec-2002.
  340. Arotaritei D and Nürnberg P Data mining using links in open hypermedia Proceedings of the 2002 international conference on Metainformatics, (148-154)
  341. Bradley P, Gehrke J, Ramakrishnan R and Srikant R (2002). Scaling mining algorithms to large databases, Communications of the ACM, 45:8, (38-43), Online publication date: 1-Aug-2002.
  342. Fayyad U and Uthurusamy R (2002). Evolving data into mining solutions for insights, Communications of the ACM, 45:8, (28-31), Online publication date: 1-Aug-2002.
  343. Kleinberg J Bursty and hierarchical structure in streams Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, (91-101)
  344. Tan P, Kumar V and Srivastava J Selecting the right interestingness measure for association patterns Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, (32-41)
  345. Honda R, Wang S, Kikuchi T and Konishi O (2002). Mining of Moving Objects from Time-Series Images and its Application to Satellite Weather Imagery, Journal of Intelligent Information Systems, 19:1, (79-93), Online publication date: 1-Jul-2002.
  346. Kumar V, Joshi M, Han E, Tan P and Steinbach M High performance data mining Proceedings of the 5th international conference on High performance computing for computational science, (111-125)
  347. Hand D Statistics Handbook of data mining and knowledge discovery, (637-643)
  348. Cheng J, Hatzis C, Hayashi H, Krogel M, Morishita S, Page D and Sese J (2002). KDD Cup 2001 report, ACM SIGKDD Explorations Newsletter, 3:2, (47-64), Online publication date: 1-Jan-2002.
  349. Hand D (1999). Statistics and data mining, ACM SIGKDD Explorations Newsletter, 1:1, (16-19), Online publication date: 1-Jun-1999.
Contributors
  • Imperial College London
  • University of California, Irvine
  • Helsinki Institute for Information Technology

Recommendations

Edgar Weippl

This book is a comprehensive textbook on basic principles in data mining. Unlike many business-oriented books, the first part focuses on the mathematical foundations of data analysis. Classical approaches to exploring data, including principal component analysis and multidimensional scaling, are clearly and thoroughly explained (chapter 3). Relevant basics in statistics (maximum likelihood, Bayesian hypothesis testing) are covered in chapter 4. Chapter 5 provides an overview of fundamental data mining algorithms (CART, backpropagation, Apriori). The authors emphasize a clear distinction between models and patterns (chapter 6) and show how these structures can be fitted to the data (chapter 7).

As the aforementioned topics indicate, the first part is focused on mathematics, whereas the second part mainly covers topics related to computer science. Chapter 8 presents various search algorithms, ranging from very basic greedy search to branch-and-bound and improvements based on heuristics. I greatly appreciate the clear distinction between descriptive (chapter 9) and predictive (chapters 10 and 11) modeling.

“Data Mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner” (p. 1). As this quotation indicates, data sets are generally large. The data, therefore, has to be stored in databases so that the analysis required can be performed efficiently. Essentials in data organization for online analytical processing (OLAP) and databases are presented in chapter 12. The chapter also covers topics relevant to performance in databases, such as B-trees, hash indices, and multidimensional indices. The penultimate chapter (chapter 13) deals with how to find patterns in large data sets. A brief overview (chapter 14) of retrieval by content (text retrieval, image retrieval, sequence retrieval) concludes the book.

I enjoyed the book, and believe it is an excellent choice for courses on data mining. The target group is clearly identified: students (or other people) interested in theoretical background, and definitely not expecting merely high-level business-oriented buzzwords. The first twelve chapters gradually build up basic knowledge. The authors indicate which chapters can be skipped, depending on the reader’s focus. In my opinion, chapters 13 and 14 do not support the very clear reasoning of the rest of the book; they seem more like add-ons, although they present relevant information. The print is clear and easy to read; colors, however, would sometimes be useful to help navigate the text. The physical book is of high quality; the paper and binding make it a durable textbook.

That said, the book’s support for lecturers could be improved. To the best of my knowledge, there is no Web site that provides slides or other supporting material. A list of sample implementations of the presented algorithms would also be useful so that lecturers would not have to search for good demonstration programs themselves. Nonetheless, the book is one of the best textbooks I have seen on data mining, and truly earns my unqualified support.

Online Computing Reviews Service
