Article

Ontology-focused crawling of Web documents

Authors:
Marc Ehrig, University of Karlsruhe, Karlsruhe, Germany
Alexander Maedche, FZI Research Center for Information Technologies, Karlsruhe, Germany

SAC '03: Proceedings of the 2003 ACM symposium on Applied computing, March 2003, Pages 1174–1178
https://doi.org/10.1145/952532.952761

Published: 09 March 2003

ABSTRACT

The Web, the largest unstructured database in the world, has greatly improved access to documents. However, documents on the Web are largely disorganized. Because the World Wide Web is distributed, it is difficult to use as a tool for information and knowledge management. Users who take on the difficult task of exploring the Web therefore need intelligent support. This paper proposes an approach to document discovery that builds on a comprehensive framework for ontology-focused crawling of Web documents. Our framework includes means for using a complex ontology and associated instance elements. It defines several relevance computation strategies and provides an empirical evaluation, which has shown promising results.
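
The general idea of focused crawling guided by an ontology can be illustrated with a minimal sketch. It is not the framework or any of the relevance strategies from the paper, whose details the abstract does not give. The toy ontology, the in-memory page graph, and the names `ONTOLOGY`, `PAGES`, `relevance`, and `focused_crawl` are all assumptions made for illustration. The sketch scores each page by the share of ontology concepts it mentions and follows links from higher-scoring pages first.

```python
import heapq
import re

# Hypothetical toy ontology: concept -> lexical labels (not from the paper).
ONTOLOGY = {
    "crawler": {"crawler", "spider", "crawling"},
    "ontology": {"ontology", "concept", "taxonomy"},
}

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
PAGES = {
    "a": ("Home page with general news", ["b", "c"]),
    "b": ("A focused crawler uses an ontology of concepts", ["d"]),
    "c": ("Cooking recipes and travel tips", ["e"]),
    "d": ("Ontology taxonomy for a web spider", []),
    "e": ("More recipes", []),
}


def relevance(text):
    """Fraction of ontology concepts with at least one label in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = sum(1 for labels in ONTOLOGY.values() if labels & words)
    return hits / len(ONTOLOGY)


def focused_crawl(seed, budget):
    """Visit up to `budget` pages, following the most promising links first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = PAGES[url]
        score = relevance(text)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the relevance of the citing page.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    for url, score in focused_crawl("a", budget=4):
        print(url, score)
```

With a budget of four pages, the crawl reaches page `d` through the relevant page `b` before it spends any effort on the off-topic page `c`. A real crawler would replace the substring-free word match with linguistic preprocessing and would fetch pages over HTTP; both are left out here to keep the sketch self-contained.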

Published in
SAC '03: Proceedings of the 2003 ACM symposium on Applied computing
March 2003
1268 pages
ISBN: 1581136242
DOI: 10.1145/952532
Conference Chair: Gary B. Lamont, Air Force Institute of Technology
Program Chairs: Hisham Haddad, Kennesaw State University; George A. Papadopoulos, University of Cyprus, Cyprus
Publications Chair: Brajendra Panda, University of Arkansas
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Semantic Web
Web Searching and Crawling

Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%