DOI: 10.5555/1873781.1873838

FactRank: random walks on a web of facts

Published: 23 August 2010

ABSTRACT

Fact collections are mostly built using semi-supervised relation extraction techniques and wisdom-of-the-crowds methods, which makes them inherently noisy. In this paper, we propose to validate the resulting facts by leveraging global constraints inherent in large fact collections, observing that correct facts will tend to match their arguments with other facts more often than with incorrect ones. We model this intuition as a graph-ranking problem over a fact graph and explore novel random walk algorithms. We present an empirical study, over a large set of facts extracted from a 500-million-document web crawl, validating the model and showing that it improves fact quality over state-of-the-art methods.
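
The abstract does not spell out the FactRank algorithms themselves, so the following is only a rough sketch of the general idea: rank facts by a random walk over a graph in which facts are linked when they share an argument. It uses plain PageRank rather than the paper's own variants. The function names, the damping value, and the toy facts are all assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_fact_graph(facts):
    """Link two facts (undirected) whenever they share an argument.

    Each fact is a (relation, arg1, arg2) triple; this linking rule is an
    illustrative assumption, not the paper's graph definition.
    """
    by_arg = defaultdict(set)
    for i, (_rel, a, b) in enumerate(facts):
        by_arg[a].add(i)
        by_arg[b].add(i)
    neighbors = {i: set() for i in range(len(facts))}
    for ids in by_arg.values():
        for u, v in combinations(sorted(ids), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def pagerank(neighbors, damping=0.85, iterations=50):
    """Standard PageRank by power iteration over the fact graph."""
    n = len(neighbors)
    score = {u: 1.0 / n for u in neighbors}
    for _ in range(iterations):
        # Mass held by isolated facts is spread uniformly so scores sum to 1.
        dangling = sum(score[u] for u in neighbors if not neighbors[u])
        nxt = {u: (1.0 - damping) / n + damping * dangling / n for u in neighbors}
        for u, nbrs in neighbors.items():
            for v in nbrs:
                nxt[v] += damping * score[u] / len(nbrs)
        score = nxt
    return score


# Toy facts, invented for this example; the last one shares no argument
# with any other fact and stands in for a noisy extraction.
facts = [
    ("acted-in", "Brad Pitt", "Troy"),
    ("acted-in", "Brad Pitt", "Seven"),
    ("directed", "David Fincher", "Seven"),
    ("directed", "Wolfgang Petersen", "Troy"),
    ("acted-in", "Banana", "Nowhere"),
]
scores = pagerank(build_fact_graph(facts))
for i in sorted(scores, key=scores.get, reverse=True):
    print(f"{scores[i]:.3f}  {facts[i]}")
```

In this toy graph the isolated fact receives only the teleport share of the walk and ends up ranked last, which mirrors the intuition that facts whose arguments match many other facts are more likely to be correct.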

Published in: COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics, August 2010, 1408 pages
Publisher: Association for Computational Linguistics, United States
