Information extraction from unstructured web text
Publisher:
  • University of Washington
  • Computer Science Dept. Fr-35 112 Sieg Hall Seattle, WA
  • United States
Order Number: AAI3252883
Pages: 152
Abstract

In the past few years, the World Wide Web has emerged as an important source of data, much of it in the form of unstructured text. This thesis describes an extensible model for information extraction that takes advantage of the unique characteristics of Web text and leverages existing search engine technology to ensure the quality of the extracted information. The key features of our approach are the use of lexico-syntactic patterns, Web-scale statistics, and unsupervised or semi-supervised learning methods. Our information extraction model has been instantiated and extended to solve a diverse set of information extraction tasks: subclass and related-class extraction; relation property learning; the acquisition of salient product features and corresponding user opinions from customer reviews; and, finally, the mining of commonsense information from the Web for the benefit of integrated AI systems.

Contributors
  • University of Washington
  • Pinterest Inc.
