Abstract
Bots have been important to the success of peer production. Wikipedia, OpenStreetMap, and Wikidata have all used automation to perform work at a rate and scale exceeding those of human contributors. Understanding how humans and bots behave in these communities is an important research topic, and one that depends on accurately recognizing bots. Yet in many cases bot activity is not explicitly flagged and can be mistaken for human contribution. We develop a machine classifier that detects previously unidentified bots using implicit behavioral cues and other informal editing characteristics. We show that this method performs well both in a formal evaluation (PR-AUC: 0.845, ROC-AUC: 0.985) and in a qualitative analysis of edit sessions by "anonymous" contributors. We also show that, in some cases, unflagged bot activity can significantly distort analyses of human behavior. Our model has the potential to support future research and community patrolling activities.
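The formal evaluation above reports two ranking metrics, PR-AUC and ROC-AUC. The sketch below shows one way to compute both from a classifier's scores. It is not the authors' evaluation code. PR-AUC is estimated here as average precision, the step-wise estimator that scikit-learn's `average_precision_score` also uses. ROC-AUC is computed as the probability that a randomly chosen bot is scored higher than a randomly chosen human, with ties counted as half. The function names and the toy labels and scores are hypothetical and are not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of random positive > score of random negative), ties = 0.5.

    O(P * N) pairwise version; fine for a sketch, too slow for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR-AUC: sum over thresholds of (R_k - R_{k-1}) * P_k."""
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from most to least bot-like.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Treat all examples tied at this score as one threshold step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Hypothetical toy data: 1 = bot, 0 = human; scores are classifier outputs.
    y = [1, 1, 0, 1, 0, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
    print(round(roc_auc(y, s), 3))            # 0.933
    print(round(average_precision(y, s), 3))  # 0.917
```

In the toy run, a single human ranked above one bot lowers both metrics. When the positive class (here, bots) is rare, PR-AUC generally responds more strongly than ROC-AUC to such false positives. This is a common reason the two numbers diverge.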
Bot Detection in Wikidata Using Behavioral and Other Informal Cues