Relevant Document Discovery for Fact-Checking Articles

Authors:
Xuezhi Wang, Google Research, New York, NY, USA
Cong Yu, Google Research, New York, NY, USA
Simon Baumgartner, Google Research, New York, NY, USA
Flip Korn, Google Research, New York, NY, USA

WWW '18: Companion Proceedings of the The Web Conference 2018, April 2018, Pages 525–533. https://doi.org/10.1145/3184558.3188723

Published: 23 April 2018

ABSTRACT

With the support of major search platforms such as Google and Bing, fact-checking articles, which can be identified by their adoption of the schema.org ClaimReview structured markup, have gained widespread recognition for their role in the fight against digital misinformation. A claim-relevant document is an online document that addresses, and potentially expresses a stance towards, some claim. The claim-relevance discovery problem, then, is to find claim-relevant documents. Depending on the verdict from the fact check, claim-relevance discovery can help identify online misinformation. In this paper, we provide an initial approach to the claim-relevance discovery problem by leveraging various information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features in the fact-checking article. Second, we apply a relevance classifier to filter out documents that do not address the claim. Third, we apply a classifier based on language features to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and beats state-of-the-art baselines. Finally, we highlight a rich set of case studies to demonstrate the many remaining challenges and to show that this problem is far from solved.
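The three phases described in the abstract can be pictured with a toy sketch. This is not the authors' system: the abstract does not specify the retrieval features, the relevance classifier, or the language features, so everything below is an assumption made for illustration. Token overlap stands in for candidate retrieval, a claim-coverage threshold stands in for the relevance classifier, and a small hypothetical cue-word list stands in for the stance classifier. The function names (`extract_claim`, `retrieve_candidates`, `is_relevant`, `stance`, `discover`) and the thresholds are likewise invented. Only the `ClaimReview` type and its `claimReviewed` property come from schema.org.

```python
import json
import re

# Hypothetical cue words standing in for the paper's language features,
# which the abstract does not enumerate.
CONTRADICT_CUES = {"false", "hoax", "debunked", "fake", "myth"}


def tokens(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_claim(jsonld_text):
    """Return claimReviewed if the JSON-LD object is a ClaimReview, else None.

    Assumes a single top-level object whose @type is a plain string.
    """
    data = json.loads(jsonld_text)
    if isinstance(data, dict) and data.get("@type") == "ClaimReview":
        return data.get("claimReviewed")
    return None


def retrieve_candidates(claim, corpus, min_shared=1):
    """Phase 1 (toy): keep documents sharing at least min_shared tokens with the claim."""
    claim_tokens = tokens(claim)
    return [doc for doc in corpus if len(claim_tokens & tokens(doc)) >= min_shared]


def is_relevant(claim, doc, threshold=0.5):
    """Phase 2 (toy): relevant if the document covers enough of the claim's tokens."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return False
    coverage = len(claim_tokens & tokens(doc)) / len(claim_tokens)
    return coverage >= threshold


def stance(doc):
    """Phase 3 (toy): label a document by the presence of a contradiction cue word."""
    return "contradicts" if tokens(doc) & CONTRADICT_CUES else "supports"


def discover(jsonld_text, corpus):
    """Run the three toy phases and return (document, stance) pairs."""
    claim = extract_claim(jsonld_text)
    if claim is None:
        return []
    candidates = retrieve_candidates(claim, corpus)
    relevant = [doc for doc in candidates if is_relevant(claim, doc)]
    return [(doc, stance(doc)) for doc in relevant]


if __name__ == "__main__":
    markup = json.dumps({
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "claimReviewed": "The moon is made of cheese",
    })
    corpus = [
        "Scientists confirm the moon is made of cheese",
        "The claim that the moon is made of cheese is a hoax",
        "Cheese prices rise in Europe",
        "Local election results announced",
    ]
    for doc, label in discover(markup, corpus):
        print(label, "|", doc)
```

In this toy run, the fourth document is dropped at retrieval (no shared tokens), the third passes retrieval but fails the relevance filter (it shares only "cheese" with the claim), and the first two are kept with different stance labels. A realistic system would replace each heuristic with a learned model and a real search index.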

Index Terms

Relevant Document Discovery for Fact-Checking Articles
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining

Published in
WWW '18: Companion Proceedings of the The Web Conference 2018
April 2018
2023 pages
ISBN: 9781450356404
General Chairs:
Pierre-Antoine Champin, Université Claude Bernard Lyon 1, France
Fabien Gandon, Inria, Université Côte d'Azur, CNRS, I3S, France
Lionel Médini, Université Claude Bernard Lyon 1, CNRS, LIRIS, France
Program Chairs:
Mounia Lalmas, Spotify, UK
Panagiotis G. Ipeirotis, New York University, USA
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
claim-relevance discovery
digital misinformation
fact checking
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%