ABSTRACT
With the support of major search platforms such as Google and Bing, fact-checking articles, which can be identified by their adoption of the schema.org ClaimReview structured markup, have gained widespread recognition for their role in the fight against digital misinformation. A claim-relevant document is an online document that addresses, and potentially expresses a stance towards, some claim. The claim-relevance discovery problem, then, is to find claim-relevant documents. Depending on the verdict of the fact check, claim-relevance discovery can help identify online misinformation. In this paper, we provide an initial approach to the claim-relevance discovery problem that leverages information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features of the fact-checking article. Second, we apply a relevance classifier to filter out documents that do not address the claim. Third, we apply a language-feature-based classifier to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and beats state-of-the-art baselines. Finally, we present a rich set of case studies that illustrate the many remaining challenges and show that this problem is far from solved.
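The three-phase structure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the paper's method: the names `Document`, `retrieve_candidates`, `relevance_score`, `stance`, and `discover`, the `CONTRADICT_CUES` word list, and the 0.6 threshold are all hypothetical. Token overlap stands in for candidate retrieval and for the relevance classifier, and a cue-word lookup stands in for the language-feature-based stance classifier.

```python
import re
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical cue words; the abstract does not say which language features are used.
CONTRADICT_CUES = {"false", "hoax", "debunked", "myth", "untrue"}


@dataclass
class Document:
    url: str
    text: str


def tokens(text: str) -> Set[str]:
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_candidates(claim: str, corpus: List[Document]) -> List[Document]:
    """Phase 1 (toy stand-in): keep documents sharing at least one token with the claim."""
    claim_tokens = tokens(claim)
    return [doc for doc in corpus if claim_tokens & tokens(doc.text)]


def relevance_score(claim: str, doc: Document) -> float:
    """Phase 2 (toy stand-in): fraction of claim tokens that occur in the document."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(doc.text)) / len(claim_tokens)


def stance(doc: Document) -> str:
    """Phase 3 (toy stand-in): 'contradicts' if any cue word occurs, else 'supports'."""
    return "contradicts" if tokens(doc.text) & CONTRADICT_CUES else "supports"


def discover(claim: str, corpus: List[Document],
             threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Run the three phases in order and return (url, stance) pairs."""
    results = []
    for doc in retrieve_candidates(claim, corpus):
        # Drop candidates that do not appear to address the claim.
        if relevance_score(claim, doc) >= threshold:
            results.append((doc.url, stance(doc)))
    return results
```

Calling `discover(claim, corpus)` with a claim string and a list of `Document` objects returns the documents that pass the relevance filter, each paired with a stance label. A realistic system would replace each toy function with a trained component; the sketch only shows how the phases chain together.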
Index Terms
- Relevant Document Discovery for Fact-Checking Articles