Mining the Web: Discovering Knowledge from HyperText Data
August 2002
Publisher: Science & Technology Books
ISBN: 978-1-55860-754-5
Published: 01 August 2002
Pages: 350
Abstract

No abstract available.

Contributors
  • Indian Institute of Technology Bombay

Recommendations

Dimitrios Katsaros

This book is a good source of information on a young and significant research area: mining the Web's content and structure. It was written by a prominent researcher in that community.

The book has three parts. The first contains introductory material on Web crawlers, as well as fundamental concepts from the field of information retrieval. The second, which makes up the bulk of the book, focuses on machine learning techniques for hypertext; essentially, it presents methods for finding statistical relations between attributes extracted from Web documents. The third contains a collection of applications that draw on the techniques discussed in earlier chapters.

The main strength of the book lies in the second part and in the first chapter of the third part. The three chapters of the second part, which are devoted to clustering and to supervised and semi-supervised learning, give a thorough presentation of the issues and solutions they cover. The explanations and the language used reveal the author's deep understanding of the field. I really enjoyed reading the first chapter of the third part, which addresses algorithms for mining the Web's link structure. The author achieves a unified treatment of the methods presented (HITS, PageRank, and so on), and provides clear and documented arguments for each method's shortcomings and benefits.

On the negative side, the omission of Web usage mining should be mentioned. Although this area has borrowed some techniques from the more traditional areas of data mining (for which excellent books already exist), it has also made significant contributions of its own to the field, developed to deal with its own particularities. In terms of physical presentation, a partly unsuitable characteristic of the book is the paper-like appearance of some chapters in the third part. These contain many graphs and performance measurements, which are not customarily included in textbooks.

Overall, I believe this book is a significant contribution to the literature on data mining, and is a must-read for anyone who is involved, as a researcher or practitioner, in extracting interesting patterns from the Web. I recommend it without hesitation.

Online Computing Reviews Service
