DOI: 10.1145/952532.952761
Article

Ontology-focused crawling of Web documents

Published: 09 March 2003

ABSTRACT

The Web, the largest unstructured database in the world, has greatly improved access to documents. However, documents on the Web are largely disorganized, and the distributed nature of the World Wide Web makes it difficult to use as a tool for information and knowledge management. Users facing the difficult task of exploring the Web therefore need to be supported by intelligent means. This paper proposes an approach to document discovery built on a comprehensive framework for ontology-focused crawling of Web documents. The framework can use a complex ontology together with its associated instance elements. It defines several relevance computation strategies, and an empirical evaluation has shown promising results.
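
The abstract does not spell out how relevance is computed or how a crawler acts on it, so the following Python sketch illustrates the general idea of an ontology-focused, best-first crawler under explicit assumptions. Everything here is hypothetical and is not taken from the paper: the toy `ONTOLOGY`, the two strategies named `simple` and `taxonomic`, the `related_weight` of 0.5, the rule that an outlink inherits the score of the page linking to it, and the in-memory `WEB` standing in for HTTP fetching.

```python
import heapq
import re
from collections import Counter

# Hypothetical toy ontology: concept -> (lexical labels, direct superconcept or None).
ONTOLOGY = {
    "Publication": (["publication", "paper"], None),
    "Article": (["article"], "Publication"),
    "Book": (["book", "monograph"], "Publication"),
    "Person": (["person"], None),
    "Author": (["author", "writer"], "Person"),
}


def neighbours(concept):
    """Return the direct superconcept and direct subconcepts of `concept`."""
    parent = ONTOLOGY[concept][1]
    related = {c for c, (_, p) in ONTOLOGY.items() if p == concept}
    if parent is not None:
        related.add(parent)
    return related


def concept_counts(text):
    """Map each concept to the number of tokens in `text` equal to one of its labels."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {c: sum(tokens[label] for label in labels)
            for c, (labels, _) in ONTOLOGY.items()}


def relevance(text, focus, strategy="simple", related_weight=0.5):
    """Score `text` against a set of focus concepts (hypothetical strategies).

    simple:    count matches of the focus concepts only.
    taxonomic: also credit matches of their taxonomic neighbours,
               down-weighted by `related_weight`.
    """
    counts = concept_counts(text)
    score = float(sum(counts[c] for c in focus))
    if strategy == "taxonomic":
        related = set().union(*(neighbours(c) for c in focus)) - set(focus)
        score += related_weight * sum(counts[c] for c in related)
    return score


def crawl(seeds, fetch, focus, strategy="simple", max_pages=10):
    """Best-first crawl that always expands the most promising known URL.

    `fetch(url)` must return (text, outlinks); here it stands in for
    HTTP fetching and link extraction.
    """
    # heapq is a min-heap, so priorities are stored as negated scores;
    # ties on score are broken by comparing the URL strings.
    frontier = [(0.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen, visited = set(seeds), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        score = relevance(text, focus, strategy)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # An unseen outlink inherits the score of the page linking to it.
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical in-memory "Web": url -> (page text, outlinks).
WEB = {
    "home": ("Home page: a paper by an author about a book.", ["news", "papers"]),
    "news": ("Weather report: sunny.", ["sports"]),
    "papers": ("Article and monograph reviews by a writer.", ["archive"]),
    "archive": ("Another article.", []),
    "sports": ("Football results.", []),
}

if __name__ == "__main__":
    for url, score in crawl(["home"], lambda u: WEB[u], {"Publication"},
                            strategy="taxonomic", max_pages=4):
        print(url, score)
```

With `max_pages=4` the sketch prints `home 1.5`, `news 0.0`, `papers 1.0` and `archive 0.5`. The page `archive`, reached through the relevant `papers` page, is fetched before `sports`, which is linked only from the irrelevant `news` page; `news` comes second only because it ties with `papers` and sorts first by URL.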

  • Published in

    ACM Conferences
    SAC '03: Proceedings of the 2003 ACM symposium on Applied computing
    March 2003
    1268 pages
    ISBN: 1581136242
    DOI: 10.1145/952532

    Copyright © 2003 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%