DOI: 10.1145/3278532.3278561
research-article
Distinguished Paper

Tracing Cross Border Web Tracking

Published: 31 October 2018

ABSTRACT

A tracking flow is a flow between an end user and a Web tracking service. We develop an extensive measurement methodology for quantifying, at scale, the number of tracking flows that cross data protection borders, be they national or international, such as the EU28 border within which the General Data Protection Regulation (GDPR) applies. Our methodology uses a browser extension to fully render advertising and tracking code, various lists and heuristics to extract well-known trackers, passive DNS replication to obtain all the IP ranges of trackers, and state-of-the-art geolocation. We apply our methodology to a dataset from 350 real users of the browser extension over a period of more than four months, and then generalize our results by analyzing billions of web tracking flows from more than 60 million broadband and mobile users of 4 large European ISPs. We show that the majority of tracking flows cross national borders in Europe but, contrary to popular belief, are largely confined within the larger GDPR jurisdiction. Simple DNS redirection and PoP mirroring can increase national confinement while sealing almost all tracking flows within Europe. Last, we show that cross-border tracking is prevalent even in sensitive, and hence protected, data categories and groups, including health, sexual orientation, minors, and others.
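The core classification step described in the abstract, deciding whether a tracking flow stays in the user's country, crosses a national border but stays inside the EU28 (GDPR) border, or leaves the EU28, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the prefix table `TRACKER_PREFIXES` and its country labels are hypothetical stand-ins for what passive DNS replication plus geolocation would produce, the helper names `locate`, `classify`, and `confinement_shares` are hypothetical, users are assumed to be located inside the EU28, and the EU28 set reflects the 2018 membership (including the United Kingdom).

```python
import ipaddress
from collections import Counter
from typing import Iterable, Optional

# EU28 member states in 2018 (ISO 3166-1 alpha-2 codes; "GB" is the United Kingdom).
EU28 = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE", "GB",
})

# HYPOTHETICAL tracker prefixes -> server country. In a real pipeline these would come
# from passive DNS replication plus geolocation; here they are RFC 5737 documentation
# prefixes with made-up countries.
TRACKER_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"): "DE",
    ipaddress.ip_network("192.0.2.128/25"): "US",  # more specific: wins over the /24
    ipaddress.ip_network("198.51.100.0/24"): "IE",
    ipaddress.ip_network("203.0.113.0/24"): "ES",
}


def locate(ip: str) -> Optional[str]:
    """Country of the longest matching tracker prefix, or None if ip is not a known tracker."""
    addr = ipaddress.ip_address(ip)
    best_len, best_country = -1, None
    for net, country in TRACKER_PREFIXES.items():
        if addr in net and net.prefixlen > best_len:
            best_len, best_country = net.prefixlen, country
    return best_country


def classify(user_country: str, server_country: str) -> str:
    """Label one tracking flow; assumes the user is located inside the EU28."""
    if user_country == server_country:
        return "national"
    if server_country in EU28:
        return "cross-border, within EU28"
    return "leaves EU28"


def confinement_shares(flows: Iterable[tuple[str, str]]) -> dict[str, float]:
    """Fraction of tracking flows per label; flows are (user_country, server_ip) pairs.
    Flows whose destination is not in the tracker table are ignored."""
    counts = Counter()
    for user_country, server_ip in flows:
        server_country = locate(server_ip)
        if server_country is not None:
            counts[classify(user_country, server_country)] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


if __name__ == "__main__":
    flows = [
        ("DE", "192.0.2.10"),    # DE user -> DE tracker: national
        ("DE", "198.51.100.7"),  # DE user -> IE tracker: cross-border, within EU28
        ("FR", "192.0.2.200"),   # FR user -> /25 match, US tracker: leaves EU28
        ("ES", "203.0.113.5"),   # ES user -> ES tracker: national
        ("IT", "8.8.8.8"),       # not in the tracker table: ignored
    ]
    print(confinement_shares(flows))
    # {'national': 0.5, 'cross-border, within EU28': 0.25, 'leaves EU28': 0.25}
```

Longest-prefix matching is used because tracker ranges can nest, and a more specific range may be served from a different country than its covering prefix. The linear scan keeps the sketch short; a pipeline processing billions of flows would more plausibly use a radix trie for the lookup.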

              Published in: IMC '18: Proceedings of the Internet Measurement Conference 2018
              October 2018
              507 pages
              ISBN: 9781450356190
              DOI: 10.1145/3278532

              Copyright © 2018 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Qualifiers

              • research-article
              • Research
              • Refereed limited

              Acceptance Rates

              Overall Acceptance Rate: 277 of 1,083 submissions, 26%
