DOI: 10.1145/2810103.2813614
research-article

Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence

Published: 12 October 2015

ABSTRACT

We present Sunlight, a system that detects the causes of targeting phenomena on the web -- such as personalized advertisements, recommendations, or content -- at large scale and with solid statistical confidence. Today's web is growing increasingly complex and impenetrable as a myriad of services collect, analyze, use, and exchange users' personal information. No one can tell who has what data, for what purposes they are using it, and how those uses affect the users. The few studies that exist reveal problematic effects -- such as discriminatory pricing and advertising -- but they are either too small-scale to generalize or lack formal assessments of confidence in the results, making them difficult to trust or interpret. Sunlight brings a principled and scalable methodology to personal data measurements by adapting well-established methods from statistics to the specific problem of targeting detection. Our methodology formally separates different operations into four key phases: scalable hypothesis generation, interpretable hypothesis formation, statistical significance testing, and multiple testing correction. Each phase can be instantiated with multiple mechanisms from statistics, each making different assumptions and tradeoffs. Sunlight offers a modular design that allows exploration of this vast design space. We explore a portion of this space, thoroughly evaluating the tradeoffs both analytically and experimentally. Our exploration reveals subtle tensions between scalability and confidence. Sunlight's default functioning strikes a balance to provide the first system that can diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results.

We showcase our system by running two measurement studies of targeting on the web, both the largest of their kind. Our studies -- about ad targeting in Gmail and on the web -- reveal statistically justifiable evidence that contradicts two Google statements regarding the lack of targeting on sensitive and prohibited topics.
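
The last two phases named in the abstract, statistical significance testing and multiple testing correction, can be illustrated with a minimal Python sketch. It is not Sunlight's implementation: the hypothesis names, the counts, the null model (each input is assumed to be present in an account independently with probability 0.5), and the choice of an exact one-sided binomial test followed by a Holm step-down correction are all assumptions made for illustration.

```python
from math import comb


def binomial_tail_pvalue(successes, trials, p_null):
    """Exact one-sided p-value: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical test-set counts, invented for illustration. For each hypothesis
# "ad targets input": (accounts that saw the ad AND contain the input,
#                      accounts that saw the ad).
hypotheses = {
    "ad_a -> input_x": (18, 20),
    "ad_b -> input_y": (12, 20),
    "ad_c -> input_z": (15, 20),
}

P_NULL = 0.5  # assumed probability that an account contains any given input
ALPHA = 0.05  # family-wise error rate targeted by the correction

names = list(hypotheses)
raw = [binomial_tail_pvalue(s, n, P_NULL) for s, n in hypotheses.values()]
corrected = holm_adjust(raw)

# Keep only hypotheses that remain significant after correction.
significant = [name for name, p in zip(names, corrected) if p <= ALPHA]

for name, p_raw, p_adj in zip(names, raw, corrected):
    print(f"{name}: raw p = {p_raw:.6f}, Holm-adjusted p = {p_adj:.6f}")
```

The correction step matters because testing many hypotheses at once inflates the chance that at least one looks significant by accident; adjusting the p-values keeps the overall error rate at the chosen level, at the cost of rejecting weaker hypotheses.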

            • Published in

              CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
              October 2015
              1750 pages
              ISBN: 9781450338325
              DOI: 10.1145/2810103

              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Acceptance Rates

              CCS '15 Paper Acceptance Rate: 128 of 660 submissions, 19%
              Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
