ABSTRACT
A tracking flow is a flow between an end user and a Web tracking service. We develop an extensive measurement methodology for quantifying, at scale, the number of tracking flows that cross data protection borders, be they national or international, such as the EU28 border within which the General Data Protection Regulation (GDPR) applies. Our methodology uses a browser extension to fully render advertising and tracking code, various lists and heuristics to extract well-known trackers, passive DNS replication to obtain all the IP ranges of trackers, and state-of-the-art geolocation. We apply our methodology to a dataset from 350 real users of the browser extension over a period of more than four months, and then generalize our results by analyzing billions of web tracking flows from more than 60 million broadband and mobile users of 4 large European ISPs. We show that the majority of tracking flows cross national borders in Europe but, contrary to popular belief, are largely confined within the larger GDPR jurisdiction. Simple DNS redirection and PoP mirroring can increase national confinement while sealing almost all tracking flows within Europe. Finally, we show that cross-border tracking is prevalent even in sensitive and hence protected data categories and groups, including health, sexual orientation, minors, and others.
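The confinement question described above can be illustrated with a minimal sketch. It assumes each tracking flow has already been reduced to a pair of ISO 3166-1 alpha-2 country codes (the user's country and the geolocated country of the tracker's server). The function names, the three labels, and the sample flows are hypothetical and are not taken from the paper.

```python
from collections import Counter

# EU28 member states as ISO 3166-1 alpha-2 codes (membership at the time GDPR took effect).
EU28 = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE", "GB",
})


def classify_flow(user_country: str, tracker_country: str) -> str:
    """Label one flow by the narrowest border that contains both endpoints.

    Hypothetical labels: "national" (same country), "eu28" (different
    countries, both inside EU28), "outside" (everything else).
    """
    if user_country == tracker_country:
        return "national"
    if user_country in EU28 and tracker_country in EU28:
        return "eu28"
    return "outside"


def confinement_shares(flows):
    """Return the fraction of flows under each label for (user, tracker) pairs."""
    counts = Counter(classify_flow(user, tracker) for user, tracker in flows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("national", "eu28", "outside")}


# Made-up sample flows, for illustration only.
sample_flows = [("ES", "ES"), ("ES", "DE"), ("ES", "US"), ("DE", "NL")]
shares = confinement_shares(sample_flows)
```

In this sketch the share of flows confined within the GDPR jurisdiction is the sum of the "national" and "eu28" fractions for users located in EU28 countries; the accuracy of any such figure depends entirely on the quality of the geolocation step that produces the tracker country.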
Index Terms
- Tracing Cross Border Web Tracking