DOI: 10.1145/3184558.3188723
research-article

Relevant Document Discovery for Fact-Checking Articles

Published: 23 April 2018

ABSTRACT

With the support of major search platforms such as Google and Bing, fact-checking articles, which can be identified by their adoption of the schema.org ClaimReview structured markup, have gained widespread recognition for their role in the fight against digital misinformation. A claim-relevant document is an online document that addresses, and potentially expresses a stance towards, some claim. The claim-relevance discovery problem, then, is to find claim-relevant documents. Depending on the verdict from the fact check, claim-relevance discovery can help identify online misinformation. In this paper, we provide an initial approach to the claim-relevance discovery problem by leveraging various information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features in the fact-checking article. Second, we apply a relevance classifier to filter out documents that do not address the claim. Third, we apply a language-feature-based classifier to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and outperforms state-of-the-art baselines. Finally, we present a rich set of case studies that illustrate the many remaining challenges and show that this problem is far from solved.
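The three phases described above can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the system evaluated in the paper: token overlap stands in for the retrieval features and the relevance classifier, a hand-written cue-word list stands in for the language-feature stance classifier, and every function name, threshold, and example document is hypothetical. Only the `claimReviewed` and `reviewRating` field names are taken from the schema.org ClaimReview type.

```python
import re

# Hypothetical cue words; a stand-in for a learned stance classifier.
CONTRADICT_CUES = {"false", "hoax", "debunked", "fake", "untrue", "myth"}


def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_candidates(claim, corpus, min_shared=2):
    """Phase 1 (assumed): keep documents sharing at least min_shared claim tokens."""
    claim_tokens = tokens(claim)
    return [d for d in corpus if len(claim_tokens & tokens(d)) >= min_shared]


def is_relevant(claim, doc, threshold=0.8):
    """Phase 2 (assumed): relevant if most claim tokens appear in the document."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return False
    return len(claim_tokens & tokens(doc)) / len(claim_tokens) >= threshold


def stance(doc):
    """Phase 3 (assumed): label by presence of contradiction cue words."""
    return "contradicts" if tokens(doc) & CONTRADICT_CUES else "supports"


def discover(fact_check, corpus):
    """Run the three phases for one ClaimReview-style record."""
    claim = fact_check["claimReviewed"]
    candidates = retrieve_candidates(claim, corpus)
    relevant = [d for d in candidates if is_relevant(claim, d)]
    return [(d, stance(d)) for d in relevant]


# A made-up record using ClaimReview field names.
fact_check = {
    "@type": "ClaimReview",
    "claimReviewed": "The moon is made of cheese",
    "reviewRating": {"@type": "Rating", "alternateName": "False"},
}
corpus = [
    "Scientists confirm the moon is made of cheese",
    "The claim that the moon is made of cheese is a hoax",
    "Cheese is made of milk",
    "Local football team wins the cup",
]

results = discover(fact_check, corpus)

# Combine stance with the fact-check verdict: documents that support
# a claim rated "False" are candidates for misinformation.
verdict = fact_check["reviewRating"]["alternateName"]
flagged = [d for d, s in results if verdict == "False" and s == "supports"]

for doc, label in results:
    print(f"{label:12s} {doc}")
```

In this toy run the fourth document is dropped at retrieval, the third passes retrieval but fails the relevance filter, and the remaining two receive opposite stance labels. A real system would replace each stand-in with retrieval over a web-scale index and trained classifiers.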

Published in

WWW '18: Companion Proceedings of the The Web Conference 2018
April 2018
2023 pages
ISBN: 9781450356404

Copyright © 2018 ACM
Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%