Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
June 2006
Publisher:
  • Cambridge University Press
  • 40 W. 20 St. New York, NY
  • United States
ISBN: 978-0-521-83657-9
Published: 01 June 2006
Abstract

Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.

Cited By

  1. Wenskovitch J, Dowling M and North C (2023). Toward Addressing Ambiguous Interactions and Inferring User Intent with Dimension Reduction and Clustering Combinations in Visual Analytics, ACM Transactions on Interactive Intelligent Systems, 14:1, (1-35), Online publication date: 31-Mar-2024.
  2. Storey V, Lukyanenko R and Castellanos A (2023). Conceptual Modeling: Topics, Themes, and Technology Trends, ACM Computing Surveys, 55:14s, (1-38), Online publication date: 31-Dec-2024.
  3. Lim J An Analysis of Child-Related Metaverse Perceptions through Text Mining Proceedings of the 2023 8th International Conference on Intelligent Information Technology, (112-116)
  4. Piriyakul I, Kunathikornkit S, Piriyakul M and Piriyakul R (2022). Facial Skincare Journey, International Journal of Business Intelligence Research, 13:1, (1-19), Online publication date: 1-Jan-2022.
  5. Schwaiger J, Hammerl T, Florian J and Leist S (2021). UR: SMART–A tool for analyzing social media content, Information Systems and e-Business Management, 19:4, (1275-1320), Online publication date: 1-Dec-2021.
  6. Thakur D, Singh J, Dhiman G, Shabaz M, Gera T and Wang L (2021). Identifying Major Research Areas and Minor Research Themes of Android Malware Analysis and Detection Field Using LSA, Complexity, 2021, Online publication date: 1-Jan-2021.
  7. Khan A and Ghosh S (2021). Student performance analysis and prediction in classroom learning: A review of educational data mining studies, Education and Information Technologies, 26:1, (205-240), Online publication date: 1-Jan-2021.
  8. Mithun S and Luo X Design and Evaluate the Factors for Flipped Classrooms for Data Management Courses 2020 IEEE Frontiers in Education Conference (FIE), (1-8)
  9. Ding Z, Yan C, Liu C, Ji J and Liu Y Short Text Processing for Analyzing User Portraits: A Dynamic Combination Artificial Neural Networks and Machine Learning – ICANN 2020, (733-745)
  10. Chantar H, Mafarja M, Alsawalqah H, Heidari A, Aljarah I and Faris H (2019). Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification, Neural Computing and Applications, 32:16, (12201-12220), Online publication date: 1-Aug-2020.
  11. Park M (2019). Understanding characteristics of semantic associations in health consumer generated knowledge representation in social media, Journal of the Association for Information Science and Technology, 70:11, (1210-1222), Online publication date: 6-Oct-2019.
  12. Alsmadi I and Hoon G (2019). Term weighting scheme for short-text classification: Twitter corpuses, Neural Computing and Applications, 31:8, (3819-3831), Online publication date: 1-Aug-2019.
  13. Shanavas N, Wang H, Lin Z and Hawe G Structure-Based Supervised Term Weighting and Regularization for Text Classification Natural Language Processing and Information Systems, (105-117)
  14. Kim S and Oh J (2018). Information science techniques for investigating research areas, The Journal of Supercomputing, 74:12, (6691-6718), Online publication date: 1-Dec-2018.
  15. Sohrabi B, Vanani I and Abedin E (2018). Human Resources Management and Information Systems Trend Analysis Using Text Clustering, International Journal of Human Capital and Information Technology Professionals, 9:3, (1-24), Online publication date: 1-Jul-2018.
  16. Liao C, Xiao F, Wong J, Chiang I, Tsai Y, Liu C and Huang K Special Issue Proceedings of the 2nd International Conference on Medical and Health Informatics, (88-100)
  17. Júnior J, Cappelli C, Revoredo K and Nunes V Text mining as a transparency enabler to support decision making in a people management process Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age, (1-8)
  18. Jr S, Campos G, Tavares G, Igawa R, Jr M and Guido R (2018). Detection of Human, Legitimate Bot, and Malicious Bot in Online Social Networks Based on Wavelets, ACM Transactions on Multimedia Computing, Communications, and Applications, 14:1s, (1-17), Online publication date: 2-Apr-2018.
  19. Siering M, Deokar A and Janze C (2018). Disentangling consumer recommendations, Decision Support Systems, 107:C, (52-63), Online publication date: 1-Mar-2018.
  20. Carvalho J, Rosa H, Brogueira G and Batista F (2017). MISNIS, Expert Systems with Applications: An International Journal, 89:C, (374-388), Online publication date: 15-Dec-2017.
  21. Solorio-Fernández S, Martínez-Trinidad J and Carrasco-Ochoa J (2017). A new Unsupervised Spectral Feature Selection Method for mixed data, Pattern Recognition, 72:C, (314-326), Online publication date: 1-Dec-2017.
  22. Winter K, Rinderle-Ma S, Grossmann W, Feinerer I and Ma Z Characterizing Regulatory Documents and Guidelines Based on Text Mining On the Move to Meaningful Internet Systems. OTM 2017 Conferences, (3-20)
  23. Ayele W and Juell-Skielse G Social media analytics and internet of things Proceedings of the 1st International Conference on Internet of Things and Machine Learning, (1-11)
  24. Spanos G, Angelis L and Toloudis D Assessment of Vulnerability Severity using Text Mining Proceedings of the 21st Pan-Hellenic Conference on Informatics, (1-6)
  25. Moreno I, Boldrini E, Moreda P and Romá-Ferri M (2017). DrugSemantics, Journal of Biomedical Informatics, 72:C, (8-22), Online publication date: 1-Aug-2017.
  26. Martins R, Kruiger J, Minghim R, Telea A and Kerren A MVN-reduce Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Short Papers, (13-17)
  27. Walter L, Radauer A and Moehrle M (2017). The beauty of brimstone butterfly, Scientometrics, 111:1, (103-115), Online publication date: 1-Apr-2017.
  28. Nokhbeh Zaeem R, Manoharan M, Yang Y and Barber K (2017). Modeling and analysis of identity threat behaviors through text mining of identity theft stories, Computers and Security, 65:C, (50-63), Online publication date: 1-Mar-2017.
  29. Harikumar H, Nguyen T, Rana S, Gupta S, Kaimal R and Venkatesh S Extracting Key Challenges in Achieving Sobriety Through Shared Subspace Learning Advanced Data Mining and Applications, (420-433)
  30. Aldayel H and Azmi A (2016). Arabic tweets sentiment analysis - a hybrid scheme, Journal of Information Science, 42:6, (782-797), Online publication date: 1-Dec-2016.
  31. De Lucia Castillo F, Brito J and Santos C Animated Words Clouds to View and Extract Knowledge from Textual Information Proceedings of the 22nd Brazilian Symposium on Multimedia and the Web, (127-134)
  32. Suominen A and Toivanen H (2016). Map of science with topic modeling, Journal of the Association for Information Science and Technology, 67:10, (2464-2476), Online publication date: 1-Oct-2016.
  33. Sundermann C, Domingues M, Conrado M and Rezende S (2016). Privileged contextual information for context-aware recommender systems, Expert Systems with Applications: An International Journal, 57:C, (139-158), Online publication date: 15-Sep-2016.
  34. Jovanovic P, Romero O and Abelló A A Unified View of Data-Intensive Flows in Business Intelligence Systems Transactions on Large-Scale Data- and Knowledge-Centered Systems XXIX - Volume 10120, (66-107)
  35. Wang J (2016). Extracting significant pattern histories from timestamped texts using MapReduce, The Journal of Supercomputing, 72:8, (3236-3260), Online publication date: 1-Aug-2016.
  36. Perovšek M, Kranjc J, Erjavec T, Cestnik B and Lavrač N (2016). TextFlows, Science of Computer Programming, 121:C, (128-152), Online publication date: 1-Jun-2016.
  37. Sorato D, Goularte F, Nassar S and Fileto R Analysis of Methods and Tools for Relevant Words Recognition in Microblogs Proceedings of the XII Brazilian Symposium on Information Systems on Brazilian Symposium on Information Systems: Information Systems in the Cloud Computing Era - Volume 1, (345-352)
  38. Leng J and Jiang P (2016). A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm, Knowledge-Based Systems, 100:C, (188-199), Online publication date: 15-May-2016.
  39. Wu B and Knoblock C Maximizing Correctness with Minimal User Effort to Learn Data Transformations Proceedings of the 21st International Conference on Intelligent User Interfaces, (375-384)
  40. Chu V, Wong R, Chen F, Ho I and Lee J Market-sentiment boosted predictions on multi-type time-series Proceedings of the Australasian Computer Science Week Multiconference, (1-10)
  41. Tao J, Liao C, Zeng X and Li X (2015). Harvesting Design Knowledge From the Internet: High-Dimensional Performance Tradeoff Modeling for Large-Scale Analog Circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35:1, (23-36), Online publication date: 1-Jan-2016.
  42. Coussement K, Benoit D and Antioco M (2015). A Bayesian approach for incorporating expert opinions into decision support systems, Decision Support Systems, 79:C, (24-32), Online publication date: 1-Nov-2015.
  43. Sun J and Qu Z (2015). Understanding health information technology adoption, Information Systems Frontiers, 17:5, (1177-1190), Online publication date: 1-Oct-2015.
  44. Martie L and van der Hoek A Sameness Proceedings of the 12th Working Conference on Mining Software Repositories, (76-87)
  45. Lai C (2015). Applying knowledge flow mining to group recommendation methods for task-based groups, Journal of the Association for Information Science and Technology, 66:3, (545-563), Online publication date: 1-Mar-2015.
  46. Shigarov A (2015). Table understanding using a rule engine, Expert Systems with Applications: An International Journal, 42:2, (929-937), Online publication date: 1-Feb-2015.
  47. Shravankumar B and Ravi V Text Classification Using Ensemble Features Selection and Data Mining Techniques Swarm, Evolutionary, and Memetic Computing, (176-186)
  48. Rivera S, Minsker B, Work D and Roth D (2014). A text mining framework for advancing sustainability indicators, Environmental Modelling & Software, 62:C, (128-138), Online publication date: 1-Dec-2014.
  49. Smailović J, Grčar M, Lavrač N and Žnidaršič M (2014). Stream-based active learning for sentiment analysis in the financial domain, Information Sciences: an International Journal, 285:C, (181-203), Online publication date: 20-Nov-2014.
  50. Zuccala A, Someren M and Bellen M (2014). A machine-learning approach to coding book reviews as quality indicators, Journal of the Association for Information Science and Technology, 65:11, (2248-2260), Online publication date: 1-Nov-2014.
  51. Fragos K and Skourlas C Toward Improving Classification of Real World Biomedical Articles Proceedings of the 18th Panhellenic Conference on Informatics, (1-3)
  52. Yau C, Porter A, Newman N and Suominen A (2014). Clustering scientific documents with topic modeling, Scientometrics, 100:3, (767-786), Online publication date: 1-Sep-2014.
  53. Sinoara R, Sundermann C, Marcacini R, Domingues M and Rezende S Named entities as privileged information for hierarchical text clustering Proceedings of the 18th International Database Engineering & Applications Symposium, (57-66)
  54. Tunalı V and Bilgin T Text mining and social network analysis on computer science and engineering theses in Turkey Proceedings of the 15th International Conference on Computer Systems and Technologies, (187-193)
  55. Fan W and Gordon M (2014). The power of social media analytics, Communications of the ACM, 57:6, (74-81), Online publication date: 1-Jun-2014.
  56. An A, Dauletbakov B and Levner E Multi-attribute Classification of Text Documents as a Tool for Ranking and Categorization of Educational Innovation Projects Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404, (404-416)
  57. Cavalcanti E and Spohn M On the applicability of mobility metrics for user movement pattern recognition in MANETs Proceedings of the 11th ACM international symposium on Mobility management and wireless access, (123-130)
  58. Perovšek M, Cestnik B, Urbančič T, Colton S and Lavrač N Towards Narrative Ideation via Cross-Context Link Discovery Using Banded Matrices Proceedings of the 12th International Symposium on Advances in Intelligent Data Analysis XII - Volume 8207, (333-344)
  59. Chiu T, Hong C and Chiu Y Exploring Technology Opportunities in an Industry via Clustering Method and Association Analysis Proceedings of the 5th International Conference on Computational Collective Intelligence. Technologies and Applications - Volume 8083, (593-602)
  60. Yu W and Luna R Exploring user feedback of a e-learning system Proceedings of the 15th international conference on Human Interface and the Management of Information: information and interaction for learning, culture, collaboration and business - Volume Part III, (182-191)
  61. Curtotti M, McCreath E and Sridharan S Software tools for the visualization of definition networks in legal contracts Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law, (192-196)
  62. Babin M and Kuznetsov S (2013). Computing premises of a minimal cover of functional dependencies is intractable, Discrete Applied Mathematics, 161:6, (742-749), Online publication date: 1-Apr-2013.
  63. Rodriguez J, Crasso M and Zunino A (2013). An approach for web service discoverability anti-pattern detection for journal of web engineering, Journal of Web Engineering, 12:1-2, (131-158), Online publication date: 1-Feb-2013.
  64. Chen Y, Liu Y and Ho W (2013). A text mining approach to assist the general public in the retrieval of legal documents, Journal of the American Society for Information Science and Technology, 64:2, (280-290), Online publication date: 1-Feb-2013.
  65. Garcia-Alvarado C and Ordonez C Query processing on cubes mapped from ontologies to dimension hierarchies Proceedings of the fifteenth international workshop on Data warehousing and OLAP, (57-64)
  66. Tunali V and Bilgin T PRETO Proceedings of the 13th International Conference on Computer Systems and Technologies, (134-140)
  67. Ferneda E, do Prado H, Batista A and Pinheiro M Extracting definitions from brazilian legal texts Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part III, (631-646)
  68. Wang D, Chen Y, Goldberg S, Grant C and Li K Automatic knowledge base construction using probabilistic extraction, deductive reasoning, and human feedback Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, (106-110)
  69. Aiello L, Barrat A, Schifanella R, Cattuto C, Markines B and Menczer F (2012). Friendship prediction and homophily in social media, ACM Transactions on the Web, 6:2, (1-33), Online publication date: 1-May-2012.
  70. Armano G, Chira C and Hatami N Ensemble of binary learners for reliable text categorization with a reject option Proceedings of the 7th international conference on Hybrid Artificial Intelligent Systems - Volume Part I, (137-146)
  71. Chiu T, Hong C and Chiu Y A proposed IPC-Based clustering and applied to technology strategy formulation Proceedings of the 4th Asian conference on Intelligent Information and Database Systems - Volume Part II, (62-72)
  72. Dinu L and Iuga I The naive bayes classifier in opinion mining Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I, (556-567)
  73. Leong C, Lee Y and Mak W (2012). Mining sentiments in SMS texts for teaching evaluation, Expert Systems with Applications: An International Journal, 39:3, (2584-2589), Online publication date: 1-Feb-2012.
  74. Juršič M, Cestnik B, Urbančič T and Lavrač N Bisociative literature mining by ensemble heuristics Bisociative Knowledge Discovery, (338-358)
  75. Juršič M, Sluban B, Cestnik B, Grčar M and Lavrač N Bridging concept identification for constructing information networks from text documents Bisociative Knowledge Discovery, (66-90)
  76. Massey L (2012). A Cognitive Framework for Core Language Understanding and its Computational Implementation, International Journal of Cognitive Informatics and Natural Intelligence, 6:1, (1-20), Online publication date: 1-Jan-2012.
  77. Issertial L and Tsuji H Information extraction and ontology model for a 'call for paper' manager Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, (539-542)
  78. Dendamrongvit S, Vateekul P and Kubat M (2011). Irrelevant attributes and imbalanced classes in multi-label text-categorization domains, Intelligent Data Analysis, 15:6, (843-859), Online publication date: 1-Nov-2011.
  79. Church J and Motro A Which should we try first? ranking information resources through query classification Proceedings of the 9th international conference on Flexible Query Answering Systems, (364-375)
  80. Grcar M and Lavrac N A methodology for mining document-enriched heterogeneous information networks Proceedings of the 14th international conference on Discovery science, (107-121)
  81. Prifti T, Banerjee S and Cukic B Detecting bug duplicate reports through local references Proceedings of the 7th International Conference on Predictive Models in Software Engineering, (1-9)
  82. Massey L Autonomous and adaptive identification of topics in unstructured text Proceedings of the 15th international conference on Knowledge-based and intelligent information and engineering systems - Volume Part II, (1-10)
  83. Ammari A, Dimitrova V and Despotakis D Identifying relevant youtube comments to derive socially augmented user models Proceedings of the 19th international conference on Advances in User Modeling, (71-85)
  84. Saga R, Takamizawa S, Kitami K, Tsuji H and Matsumoto K Comparison analysis for text data by using FACT-graph Proceedings of the 1st international conference on Human interface and the management of information: interacting with information - Volume Part II, (75-83)
  85. Yun J, Jing L, Yu J and Huang H Unsupervised feature weighting based on local feature relatedness Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I, (38-49)
  86. Chiu T, Hong C and Chiu Y To propose strategic suggestions for companies via IPC classification and association analysis Proceedings of the Third international conference on Intelligent information and database systems - Volume Part I, (218-227)
  87. Largeron C, Moulin C and Géry M Entropy based feature selection for text categorization Proceedings of the 2011 ACM Symposium on Applied Computing, (924-928)
  88. Wiesner M and Pfeifer D Adapting recommender systems to the requirements of personal health record systems Proceedings of the 1st ACM International Health Informatics Symposium, (410-414)
  89. Hogenboom A, Hogenboom F, Kaymak U, Wouters P and De Jong F Mining economic sentiment using argumentation structures Proceedings of the 2010 international conference on Advances in conceptual modeling: applications and challenges, (200-209)
  90. Yun J, Jing L, Yu J and Huang H Semantics-based representation model for multi-layer text classification Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part II, (1-10)
  91. Alruily M, Ayesh A and Zedan H Automatically Constructing Dictionaries for Extracting Meaningful Crime Information from Arabic Text Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence, (1139-1140)
  92. Denton A, Wu J and Dorr D Point-distribution algorithm for mining vector-item patterns Proceedings of the ACM SIGKDD Workshop on Useful Patterns, (36-44)
  93. Kolchinsky A, Abi-Haidar A, Kaur J, Hamed A and Rocha L (2010). Classification of Protein-Protein Interaction Full-Text Documents Using Text and Citation Network Features, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7:3, (400-411), Online publication date: 1-Jul-2010.
  94. Castellanos M, Wang S, Dayal U and Gupta C SIE-OBI Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, (1105-1110)
  95. Matos P, Lombardi L, Pardo T, Ciferri C, Vieira M and Ciferri R An environment for data analysis in biomedical domain Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part I, (306-316)
  96. Kehagias D, Tzovaras D, Mavridou E, Kalogirou K and Becker M Implementing an open reference architecture based on web service mining for the integration of distributed applications and multi-agent systems Proceedings of the 6th international conference on Agents and data mining interaction, (162-177)
  97. Lee J, Kim H and Kim P (2010). Domain analysis with text mining, Journal of Information Science, 36:2, (144-161), Online publication date: 1-Apr-2010.
  98. Castellanos M, Gupta C, Wang S and Dayal U Leveraging web streams for contractual situational awareness in operational BI Proceedings of the 2010 EDBT/ICDT Workshops, (1-8)
  99. Ang W, Tan A, Seeto W, Tan F, Tang W and Kanagasabai R KnowleTracker Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services, (688-693)
  100. Lai C and Liu D (2009). Integrating knowledge flow mining and collaborative filtering to support document recommendation, Journal of Systems and Software, 82:12, (2023-2037), Online publication date: 1-Dec-2009.
  101. Esposito F, Biba M and Ferilli S Intelligent text processing techniques for textual-profile gene characterization Proceedings of the 6th international conference on Computational intelligence methods for bioinformatics and biostatistics, (33-44)
  102. Xu W, Huang L, Fox A, Patterson D and Jordan M Detecting large-scale system problems by mining console logs Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, (117-132)
  103. Lackes R, Bartels J, Berndt E and Frank E A word-frequency based method for detecting plagiarism in documents Proceedings of the 10th IEEE international conference on Information Reuse & Integration, (163-166)
  104. Onkov K Effect of OCR-errors on the transformation of semi-structured text data into relational database Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data, (123-124)
  105. Witte R and Gitzinger T Semantic Assistants --- User-Centric Natural Language Processing Services for Desktop Clients Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web, (360-374)
  106. Witte R and Gitzinger T A General Architecture for Connecting NLP Frameworks and Desktop Clients Using Web Services Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems, (317-322)
  107. Kim H and Lee J (2008). Exploring the emerging intellectual structure of archival studies using text mining, Journal of Information Science, 34:3, (356-369), Online publication date: 1-Jun-2008.
  108. Ku C, Iriberri A and Leroy G Natural language processing and e-Government Proceedings of the 2008 international conference on Digital government research, (162-170)
  109. Witte R and Gitzinger T Connecting wikis and natural language processing systems Proceedings of the 2007 international symposium on Wikis, (165-176)
  110. Pichl L and Narita J Readability Factors of Japanese Text Classification Databases in Networked Information Systems, (132-138)
  111. Batista F and Carvalho J Text based classification of companies in CrunchBase 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-7)
  112. Ferreira C, de Medeiros D and Santana F FCFilter: Feature selection based on clustering and genetic algorithms 2016 IEEE Congress on Evolutionary Computation (CEC), (2106-2113)
Contributors
  • Hebrew University of Jerusalem
