ABSTRACT
We present Sunlight, a system that detects the causes of targeting phenomena on the web -- such as personalized advertisements, recommendations, or content -- at large scale and with solid statistical confidence. Today's web is growing increasingly complex and impenetrable as myriad services collect, analyze, use, and exchange users' personal information. No one can tell who has what data, for what purposes they are using it, and how those uses affect the users. The few studies that exist reveal problematic effects -- such as discriminatory pricing and advertising -- but they are either too small-scale to generalize or lack formal assessments of confidence in the results, making them difficult to trust or interpret. Sunlight brings a principled and scalable methodology to personal data measurements by adapting well-established methods from statistics to the specific problem of targeting detection. Our methodology formally separates different operations into four key phases: scalable hypothesis generation, interpretable hypothesis formation, statistical significance testing, and multiple testing correction. Each phase can be instantiated with multiple mechanisms from statistics, each making different assumptions and tradeoffs. Sunlight offers a modular design that allows exploration of this vast design space. We explore a portion of this space, thoroughly evaluating the tradeoffs both analytically and experimentally. Our exploration reveals subtle tensions between scalability and confidence. Sunlight's default functioning strikes a balance to provide the first system that can diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results.
We showcase our system by running two measurement studies of targeting on the web, both the largest of their kind. Our studies -- about ad targeting in Gmail and on the web -- reveal statistically justifiable evidence that contradicts two Google statements regarding the lack of targeting on sensitive and prohibited topics.
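To make the last phase of the methodology concrete, the following is a minimal sketch of multiple testing correction using two standard procedures from statistics: Holm's step-down procedure (which controls the family-wise error rate) and the Benjamini-Yekutieli procedure (which controls the false discovery rate under dependency). It is an illustration under stated assumptions, not Sunlight's implementation: the function names `holm` and `benjamini_yekutieli`, the threshold `alpha`, and the example p-values are all hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, aligned with p_values, that is True
    where the corresponding hypothesis is rejected.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The threshold loosens as more hypotheses are rejected.
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; the rest are kept
    return reject


def benjamini_yekutieli(p_values, alpha=0.05):
    """Benjamini-Yekutieli step-up procedure.

    Returns a list of booleans, aligned with p_values, that is True
    where the corresponding hypothesis is rejected.
    """
    m = len(p_values)
    # Harmonic-sum factor that makes the procedure valid under dependency.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * c_m).
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / (m * c_m):
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values, one per targeting hypothesis under test.
example_p_values = [0.04, 0.001, 0.5, 0.01, 0.03]
holm_decisions = holm(example_p_values)
by_decisions = benjamini_yekutieli(example_p_values)
```

Both functions take the raw p-values produced by the significance-testing phase and return one accept/reject decision per hypothesis. Which correction is appropriate depends on whether the analysis needs to bound the chance of any false discovery or only the expected fraction of false discoveries.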