DOI: 10.5555/1858681.1858694
research-article

Open information extraction using Wikipedia

Published: 11 July 2010

ABSTRACT

Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform?

This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors -- using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
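
The idea of building training data from heuristic matches between infobox attribute values and sentences can be illustrated with a minimal sketch. The abstract does not specify WOE's matching heuristics, so the version below assumes the simplest possible rule: a sentence becomes a training example when it mentions both the article subject and an attribute value as whole phrases. The names `TrainingExample`, `split_sentences`, `contains_phrase`, and `match_infobox`, as well as the toy article and infobox, are hypothetical and not taken from the paper.

```python
import re
from typing import Dict, List, NamedTuple


class TrainingExample(NamedTuple):
    """One heuristically labeled sentence (hypothetical structure)."""
    sentence: str
    subject: str
    attribute: str
    value: str


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contains_phrase(sentence: str, phrase: str) -> bool:
    """Case-insensitive whole-word match of `phrase` inside `sentence`."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def match_infobox(subject: str,
                  infobox: Dict[str, str],
                  article_text: str) -> List[TrainingExample]:
    """Pair each infobox attribute with the sentences that mention both
    the article subject and the attribute value.

    Assumption: exact phrase matching stands in for whatever heuristics
    the real system uses.
    """
    examples = []
    for sentence in split_sentences(article_text):
        if not contains_phrase(sentence, subject):
            continue
        for attribute, value in infobox.items():
            if contains_phrase(sentence, value):
                examples.append(
                    TrainingExample(sentence, subject, attribute, value))
    return examples


# Toy data, invented for illustration only.
article = ("Acme Corp was founded by Jane Doe. The company grew quickly. "
           "Acme Corp is based in Springfield.")
infobox = {"founder": "Jane Doe", "headquarters": "Springfield"}

for example in match_infobox("Acme Corp", infobox, article):
    print(example.attribute, "<-", example.sentence)
```

In this sketch the second sentence is skipped because it never names the subject. A matcher meant for real use would presumably also need to handle pronouns, name variants, and ambiguous values, none of which are covered here.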


Published in

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
July 2010
1618 pages
Program Chair: Jan Hajič

Publisher

Association for Computational Linguistics
United States


Acceptance Rates

Overall acceptance rate: 85 of 443 submissions, 19%
