Information extraction from unstructured web text

January 2007

Author: Ana-Maria Popescu, University of Washington
Adviser: Oren Etzioni, University of Washington

Publisher: University of Washington, Computer Science Dept. Fr-35 112 Sieg Hall Seattle, WA, United States

Order Number: AAI3252883

Pages: 152

Abstract

In the past few years the World Wide Web has emerged as an important source of data, much of it in the form of unstructured text. This thesis describes an extensible model for information extraction that takes advantage of the unique characteristics of Web text and leverages existing search engine technology to ensure the quality of the extracted information. The key features of our approach are the use of lexico-syntactic patterns, Web-scale statistics, and unsupervised or semi-supervised learning methods. Our information extraction model has been instantiated and extended to solve a diverse set of information extraction tasks: subclass and related-class extraction, relation property learning, the acquisition of salient product features and corresponding user opinions from customer reviews, and finally the mining of commonsense information from the Web for the benefit of integrated AI systems.
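
As a rough illustration of two ingredients the abstract names, lexico-syntactic patterns and Web-scale statistics, the sketch below extracts candidate class instances with a single "such as" pattern and scores a candidate with a simple co-occurrence ratio. It is not the thesis's implementation: the function names, the single pattern, and the idea of passing in hit counts as plain numbers are all assumptions made for illustration, and no real search engine API is called.

```python
import re


def extract_instances(class_plural, text):
    """Collect capitalised phrases that follow '<class_plural> such as'.

    Hypothetical helper: one hand-written pattern, not a full pattern set.
    """
    pattern = re.compile(
        re.escape(class_plural) + r" such as ([^.;]+)", re.IGNORECASE
    )
    candidates = []
    for match in pattern.finditer(text):
        # Keep runs of capitalised words ("New York") as candidate names.
        for name in re.findall(r"[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*", match.group(1)):
            if name not in candidates:
                candidates.append(name)
    return candidates


def cooccurrence_score(hits_together, hits_instance):
    """Ratio of hits for the instance alongside a class phrase to hits for
    the instance alone. The counts are assumed to come from a search engine;
    here the caller supplies them directly."""
    if hits_instance == 0:
        return 0.0
    return hits_together / hits_instance


if __name__ == "__main__":
    sentence = "We visited cities such as Seattle, Boston and New York."
    print(extract_instances("cities", sentence))
    # Placeholder counts, invented purely to show the call.
    print(cooccurrence_score(50, 1000))
```

A pattern like this over-generates on real text (any capitalised word in the rest of the sentence is picked up), which is one reason a statistical check on each candidate is useful before accepting it.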

Index Terms

Information extraction from unstructured web text
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
  2. Machine learning
    1. Learning settings
2. Information systems
  1. Information retrieval
  2. Information storage systems

Recommendations

Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web

Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. Previous work has exploited reference sets to aid such extraction, but ...
Adapting Web information extraction knowledge via mining site-invariant and site-dependent features

We develop a novel framework that aims at automatically adapting previously learned information extraction knowledge from a source Web site to a new unseen target site in the same domain. Two kinds of features related to the text fragments from the Web ...
A Template-Based Tibetan Web Text Information Extraction Method
ICINIS '11: Proceedings of the 2011 4th International Conference on Intelligent Networks and Intelligent Systems

In order to build a large Tibetan corpus, the researcher proposes a simple and effective method of text information extraction over Tibetan Web pages. Most web pages contain too much noise information unrelated to the content of the text, which makes it ...
