Open information extraction using Wikipedia

Authors:
Fei Wu, University of Washington, Seattle, WA
Daniel S. Weld, University of Washington, Seattle, WA

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, July 2010, Pages 118–127

Published: 11 July 2010

ABSTRACT

Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform?

This paper presents WOE, an open IE system that improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors: using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features, its precision and recall rise even higher.
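The idea of matching infobox attribute values against article sentences to build training data can be illustrated with a small sketch. This is not WOE's actual matcher, which the abstract does not specify. It assumes a plain case-insensitive substring match, a naive regex sentence splitter, and made-up example data. The names `split_sentences` and `match_infobox` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (illustration only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_infobox(subject, infobox, text):
    """Collect (attribute, subject, value, sentence) tuples for every sentence
    that mentions both the article subject and an infobox attribute value.

    Assumption: a case-insensitive substring test stands in for the
    heuristic matching described in the abstract.
    """
    examples = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        if subject.lower() not in lowered:
            continue  # the sentence must mention the article's subject
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                examples.append((attribute, subject, value, sentence))
    return examples


# Made-up article text and infobox, for illustration only.
article_text = (
    "Stanford University was founded in 1885 by Leland Stanford. "
    "It is located in California. "
    "Stanford University enrolls thousands of students."
)
infobox = {"established": "1885", "founder": "Leland Stanford"}

examples = match_infobox("Stanford University", infobox, article_text)
for attribute, subj, value, sentence in examples:
    print(attribute, "|", subj, "|", value, "|", sentence)
```

Each matched sentence, paired with its subject and attribute value, could then serve as a positive training example for an extractor that uses unlexicalized features such as POS tags or dependency paths.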


Published in
ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
July 2010, 1618 pages
Program Chair: Jan Hajič, Charles University in Prague, Czech Republic
Publisher: Association for Computational Linguistics, United States
Overall Acceptance Rate: 85 of 443 submissions, 19%