Rotten Apples or Bad Harvest? What We Are Measuring When We Are Measuring Abuse

Authors:
Samaneh Tajalizadehkhoob, Delft University of Technology, Jaffalaan, the Netherlands
Rainer Böhme, University of Innsbruck, Innsbruck, Austria
Carlos Gañán, Delft University of Technology, Jaffalaan, the Netherlands
Maciej Korczyński, Delft University of Technology, Jaffalaan, the Netherlands
Michel Van Eeten, Delft University of Technology, Jaffalaan, the Netherlands

ACM Transactions on Internet Technology, Volume 18, Issue 4, Article No. 49, pp. 1–25. https://doi.org/10.1145/3122985

Published: 07 August 2018

Abstract

Internet security and technology policy research regularly uses technical indicators of abuse to identify culprits and to tailor mitigation strategies. As a major obstacle, current inferences from abuse data that aim to characterize providers with poor security practices often use a naive normalization of abuse (abuse counts divided by network size) and do not take into account other inherent or structural properties of providers. Even the size estimates are subject to measurement errors relating to attribution, aggregation, and various sources of heterogeneity. More precise indicators are costly to measure at Internet scale. We address these issues for the case of hosting providers with a statistical model of the abuse data generation process, using phishing sites in hosting networks as a case study. We decompose error sources and then estimate key parameters of the model, controlling for heterogeneity in size and business model. We find that 84% of the variation in abuse counts across 45,358 hosting providers can be explained with structural factors alone. Informed by the fitted model, we systematically select and enrich a subset of 105 homogeneous “statistical twins” with additional explanatory variables, unreasonable to collect for all hosting providers. We find that abuse is positively associated with the popularity of websites hosted and with the prevalence of popular content management systems. Moreover, hosting providers who charge higher prices (after controlling for level differences between countries) witness less abuse. These structural factors together explain a further 77% of the remaining variation. This calls into question premature inferences from raw abuse indicators about the security efforts of actors, and suggests the adoption of similar analysis frameworks in all domains where network measurement aims at informing technology policy.
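
The contrast between naive normalization and a model of the data generation process can be illustrated with a small sketch. This is not the authors' model or data: the providers are synthetic, expected abuse is assumed to depend on size alone (with an assumed exponent of 0.7), and the function names `simulate_providers`, `fit_poisson`, `poisson_deviance`, and `deviance_r2` are hypothetical. The sketch fits a Poisson regression of counts on log size and reports a deviance-based pseudo R², which is one common way to express explained variation for count data; the abstract does not say which measure the paper uses.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_providers(n=200, seed=1):
    """Synthetic providers whose expected abuse depends on size alone (assumption)."""
    rng = random.Random(seed)
    sizes, counts = [], []
    for _ in range(n):
        size = 10 ** rng.uniform(1, 4)  # roughly 10 to 10,000 hosted domains
        sizes.append(size)
        counts.append(poisson_draw(rng, 0.5 * size ** 0.7))
    return sizes, counts


def fit_poisson(x, y, iters=100, tol=1e-10):
    """Fit log E[y] = a + b * x by Newton-Raphson; returns (a, b)."""
    n = len(x)
    # Starting values: ordinary least squares on log(y + 0.5).
    t = [math.log(yi + 0.5) for yi in y]
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t)) / sxx
    a = mt - b * mx
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def poisson_deviance(y, mu):
    """Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))


def deviance_r2(x, y, a, b):
    """Share of the intercept-only (null) deviance removed by the fitted model."""
    mean_y = sum(y) / len(y)
    fitted = [math.exp(a + b * xi) for xi in x]
    return 1 - poisson_deviance(y, fitted) / poisson_deviance(y, [mean_y] * len(y))


if __name__ == "__main__":
    sizes, counts = simulate_providers()
    log_size = [math.log(s) for s in sizes]
    a, b = fit_poisson(log_size, counts)
    naive = [c / s for c, s in zip(counts, sizes)]
    print("slope on log(size):", round(b, 2))
    print("deviance R^2:", round(deviance_r2(log_size, counts, a, b), 2))
    print("size of provider with worst naive rate:", round(sizes[naive.index(max(naive))]))
```

In this synthetic setting every provider follows the same size-to-abuse relationship, yet the naive ratio (count divided by size) is systematically higher for small providers because the assumed relationship is sublinear. A ranking by the naive ratio would therefore single out small providers even though nothing but size distinguishes them.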

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Metrics
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Economics of security and privacy
  2. Systems security

Published in
ACM Transactions on Internet Technology Volume 18, Issue 4
Special Issue on Computational Ethics and Accountability, Special Issue on Economics of Security and Privacy and Regular Papers
November 2018
348 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/3210373
Editor:
Munindar P. Singh
Department of Computer Science, North Carolina State University
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 August 2018
- Accepted: 1 June 2017
- Revised: 1 April 2017
- Received: 1 November 2016
Author Tags
Statistical modeling
abuse concentrations
hosting providers
measurement errors
web security
Qualifiers
- research-article
- Research
- Refereed