ABSTRACT
News entities must select and filter the coverage they broadcast through their respective channels, since the set of world events is too large to be treated exhaustively. The subjective nature of this filtering induces biases due to, among other things, resource constraints, editorial guidelines, ideological affinities, or even the fragmented nature of the information at a journalist's disposal. The magnitude and direction of these biases are, however, largely unknown. The absence of ground truth, the sheer size of the event space, and the lack of an exhaustive set of absolute features to measure make it difficult to observe the bias directly, to characterize the nature of the leaning, and to factor it out to ensure neutral coverage of the news. In this work, we introduce a methodology to capture the latent structure of the media's decision process on a large scale. Our contribution is threefold. First, we show that media coverage is predictable using personalization techniques, and we evaluate our approach on a large set of events collected from the GDELT database. Second, we show that a personalized and parametrized approach not only achieves higher accuracy in coverage prediction, but also provides an interpretable representation of the selection bias. Last, we propose a method that selects a set of sources by leveraging this latent representation. The selected sources provide more diverse and egalitarian coverage while retaining the most actively covered events.
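To make the idea of predicting coverage with personalization techniques concrete, the following is a minimal sketch, not the authors' implementation. It assumes coverage is encoded as a binary source-by-event matrix and fits a plain logistic matrix factorization by gradient descent. The function names, the toy matrix, the hyperparameters, and the greedy farthest-point rule used to pick diverse sources are all illustrative assumptions; the abstract does not specify the actual model or selection procedure.

```python
import numpy as np


def fit_coverage_model(coverage, n_factors=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit a logistic matrix factorization to a binary source-by-event matrix.

    Returns (S, E): one latent vector per source and one per event, so that
    sigmoid(S @ E.T) approximates the probability that a source covers an event.
    """
    rng = np.random.default_rng(seed)
    n_sources, n_events = coverage.shape
    S = 0.1 * rng.standard_normal((n_sources, n_factors))
    E = 0.1 * rng.standard_normal((n_events, n_factors))
    for _ in range(epochs):
        P = 1.0 / (1.0 + np.exp(-(S @ E.T)))  # predicted coverage probabilities
        G = P - coverage                      # gradient of the log loss w.r.t. the scores
        S_grad = G @ E + reg * S              # L2-regularized gradients
        E_grad = G.T @ S + reg * E
        S -= lr * S_grad
        E -= lr * E_grad
    return S, E


def predict_coverage(S, E):
    """Probability that each source covers each event."""
    return 1.0 / (1.0 + np.exp(-(S @ E.T)))


def select_diverse_sources(S, k):
    """Greedy farthest-point selection in the latent space (illustrative only).

    Starts from the source with the largest latent norm, then repeatedly adds
    the source whose distance to its nearest already-chosen source is largest.
    Assumes k <= number of sources.
    """
    chosen = [int(np.argmax(np.linalg.norm(S, axis=1)))]
    while len(chosen) < k:
        dists = np.linalg.norm(S[:, None, :] - S[chosen][None, :, :], axis=2)
        min_dist = dists.min(axis=1)
        min_dist[chosen] = -np.inf            # never pick the same source twice
        chosen.append(int(np.argmax(min_dist)))
    return chosen


# Toy data (hypothetical): sources 0-1 cover events 0-2, sources 2-3 cover events 3-5.
coverage = np.array(
    [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
    ],
    dtype=float,
)

S, E = fit_coverage_model(coverage)
P = predict_coverage(S, E)
picked = select_diverse_sources(S, k=2)
```

In this toy setting, sources with the same coverage pattern end up close together in the latent space, which is the sense in which the learned vectors can serve as an interpretable representation of selection bias. Picking sources that are far apart in that space is one simple way to favor diversity; it says nothing about retaining the most actively covered events, which would need an additional criterion.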
Index Terms
- Selection Bias in News Coverage: Learning it, Fighting it