research-article

Rotten Apples or Bad Harvest? What We Are Measuring When We Are Measuring Abuse

Published: 07 August 2018
Abstract

Internet security and technology policy research regularly uses technical indicators of abuse to identify culprits and to tailor mitigation strategies. As a major obstacle, current inferences from abuse data that aim to characterize providers with poor security practices often use a naive normalization of abuse (abuse counts divided by network size) and do not take into account other inherent or structural properties of providers. Even the size estimates are subject to measurement errors relating to attribution, aggregation, and various sources of heterogeneity. More precise indicators are costly to measure at Internet scale. We address these issues for the case of hosting providers with a statistical model of the abuse data generation process, using phishing sites in hosting networks as a case study. We decompose error sources and then estimate key parameters of the model, controlling for heterogeneity in size and business model. We find that 84% of the variation in abuse counts across 45,358 hosting providers can be explained with structural factors alone. Informed by the fitted model, we systematically select and enrich a subset of 105 homogeneous “statistical twins” with additional explanatory variables, unreasonable to collect for all hosting providers. We find that abuse is positively associated with the popularity of websites hosted and with the prevalence of popular content management systems. Moreover, hosting providers who charge higher prices (after controlling for level differences between countries) witness less abuse. These structural factors together explain a further 77% of the remaining variation. This calls into question premature inferences from raw abuse indicators about the security efforts of actors, and suggests the adoption of similar analysis frameworks in all domains where network measurement aims at informing technology policy.
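The contrast between naive normalization and a size-adjusted comparison can be illustrated with a minimal sketch. It is not the authors' statistical model: the provider names and numbers are made up, and an ordinary least-squares fit on log counts stands in for a proper count regression.

```python
import math

# Hypothetical providers: (name, size in hosted domains, phishing count).
# All numbers are made up for illustration.
providers = [
    ("small-host", 100, 4),
    ("mid-host", 10_000, 40),
    ("big-host", 1_000_000, 400),
]


def naive_rate(size, count):
    """Naive normalization: abuse count divided by network size."""
    return count / size


def fit_log_linear(data):
    """Least-squares fit of log(count) = a + b * log(size).

    A simplified stand-in for a count regression; it needs count > 0.
    """
    xs = [math.log(size) for _, size, _ in data]
    ys = [math.log(count) for _, _, count in data]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


intercept, slope = fit_log_linear(providers)

naive = {name: naive_rate(size, count) for name, size, count in providers}
# Residual: observed log count minus the log count expected for that size.
residuals = {
    name: math.log(count) - (intercept + slope * math.log(size))
    for name, size, count in providers
}

for name, _, _ in providers:
    print(f"{name}: naive rate {naive[name]:.4f}, size-adjusted residual {residuals[name]:+.3f}")
```

In this made-up data the counts grow with the square root of size, so the naive rate differs by a factor of 100 between the smallest and the largest provider, while every size-adjusted residual is zero up to rounding. The naive rate therefore singles out the small provider even though, relative to what its size predicts, all three behave identically.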

        • Published in

          ACM Transactions on Internet Technology, Volume 18, Issue 4
          Special Issue on Computational Ethics and Accountability, Special Issue on Economics of Security and Privacy and Regular Papers
          November 2018
          348 pages
          ISSN: 1533-5399
          EISSN: 1557-6051
          DOI: 10.1145/3210373
          • Editor: Munindar P. Singh

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 7 August 2018
          • Accepted: 1 June 2017
          • Revised: 1 April 2017
          • Received: 1 November 2016

          Qualifiers

          • research-article
          • Research
          • Refereed
