
Rumor Gauge: Predicting the Veracity of Rumors on Twitter

Published: 14 July 2017

Abstract

The spread of malicious or accidental misinformation in social media, especially in time-sensitive situations such as real-world emergencies, can have harmful effects on individuals and society. In this work, we developed models for automated verification of rumors (unverified information) that propagate through Twitter. To predict the veracity of rumors, we identified salient features of rumors by examining three aspects of information spread: the linguistic style used to express rumors, the characteristics of the people involved in propagating information, and network propagation dynamics. For each rumor (a collection of tweets), a time series of these features is extracted, and its veracity is predicted using Hidden Markov Models. The verification algorithm was trained and tested on 209 rumors representing 938,806 tweets collected from real-world events, including the 2013 Boston Marathon bombings, the 2014 Ferguson unrest, and the 2014 Ebola epidemic, as well as many other rumors about real-world events reported on popular websites that document public rumors. The algorithm correctly predicted the veracity of 75% of the rumors faster than any other public source, including journalists and law enforcement officials. The ability to track rumors and predict their outcomes may have practical applications for news consumers, financial markets, journalists, and emergency services, and more generally may help minimize the impact of false information on Twitter.
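The abstract states only that Hidden Markov Models are applied to a time series of rumor features; it does not give the model structure. As a rough illustration of how such a classifier could work, the sketch below assumes one common pattern: one discrete HMM per class ("true" and "false"), with a rumor assigned to the class whose model gives its observation sequence the higher likelihood. The two-state models, their parameters, and the binary feature symbols (0 = a feature below some threshold, 1 = above it) are all hypothetical and are not taken from the paper.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n_states = len(start)
    # Joint probability of each hidden state and the first observation.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

# Hypothetical two-state models; every number here is invented for illustration.
TRUE_MODEL = {
    "start": [0.6, 0.4],
    "trans": [[0.8, 0.2], [0.3, 0.7]],
    "emit": [[0.9, 0.1], [0.2, 0.8]],
}
FALSE_MODEL = {
    "start": [0.5, 0.5],
    "trans": [[0.7, 0.3], [0.2, 0.8]],
    "emit": [[0.3, 0.7], [0.6, 0.4]],
}

def classify(obs):
    """Label a discretized feature sequence by the more likely of the two models."""
    ll_true = forward_log_likelihood(obs, **TRUE_MODEL)
    ll_false = forward_log_likelihood(obs, **FALSE_MODEL)
    return "true" if ll_true >= ll_false else "false"

if __name__ == "__main__":
    # A made-up sequence of thresholded feature values over five time steps.
    print(classify([0, 0, 0, 0, 0]))
```

In practice the per-class models would be fitted to labeled rumors (for example with Baum-Welch) rather than written by hand, and the observations would be multi-dimensional feature vectors rather than a single binary symbol.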



Published in

ACM Transactions on Knowledge Discovery from Data, Volume 11, Issue 4
Special Issue on KDD 2016 and Regular Papers
November 2017
419 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3119906
Editor: Jie Tang

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 14 July 2017
• Accepted: 1 March 2017
• Revised: 1 October 2016
• Received: 1 November 2015


Qualifiers

• research-article
• Research
• Refereed
