Sunlight: Fine-grained Targeting Detection at Scale with Statistical Confidence

Authors:
Mathias Lecuyer, Columbia University, New York, NY, USA
Riley Spahn, Columbia University, New York, NY, USA
Yannis Spiliopoulos, Columbia University, New York, NY, USA
Augustin Chaintreau, Columbia University, New York, NY, USA
Roxana Geambasu, Columbia University, New York, NY, USA
Daniel Hsu, Columbia University, New York, NY, USA

CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, October 2015, Pages 554–566
https://doi.org/10.1145/2810103.2813614

Published: 12 October 2015

ABSTRACT

We present Sunlight, a system that detects the causes of targeting phenomena on the web -- such as personalized advertisements, recommendations, or content -- at large scale and with solid statistical confidence. Today's web is growing increasingly complex and impenetrable as myriad services collect, analyze, use, and exchange users' personal information. No one can tell who has what data, for what purposes they are using it, and how those uses affect the users. The few studies that exist reveal problematic effects -- such as discriminatory pricing and advertising -- but they are either too small-scale to generalize or lack formal assessments of confidence in the results, making them difficult to trust or interpret. Sunlight brings a principled and scalable methodology to personal data measurements by adapting well-established methods from statistics to the specific problem of targeting detection. Our methodology formally separates different operations into four key phases: scalable hypothesis generation, interpretable hypothesis formation, statistical significance testing, and multiple testing correction. Each phase can be instantiated with multiple mechanisms from statistics, each making different assumptions and tradeoffs. Sunlight offers a modular design that allows exploration of this vast design space. We explore a portion of this space, thoroughly evaluating the tradeoffs both analytically and experimentally. Our exploration reveals subtle tensions between scalability and confidence. Sunlight's default functioning strikes a balance to provide the first system that can diagnose targeting at fine granularity, at scale, and with solid statistical justification of its results.

We showcase our system by running two measurement studies of targeting on the web, both the largest of their kind. Our studies -- about ad targeting in Gmail and on the web -- reveal statistically justifiable evidence that contradicts two Google statements regarding the lack of targeting on sensitive and prohibited topics.
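
To make the last of the four phases concrete, the following is a minimal Python sketch of multiple testing correction. It assumes each targeting hypothesis has already been given a p-value by the significance testing phase, and it uses two standard procedures chosen here for illustration, Holm and Benjamini-Yekutieli. The abstract does not say which procedures Sunlight uses or how, so the function names and the sample p-values below are hypothetical and are not taken from the Sunlight implementation.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def benjamini_yekutieli_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values (controls the false discovery
    rate under arbitrary dependence between tests)."""
    m = len(pvalues)
    # Harmonic-number penalty that makes the procedure valid under dependence.
    harmonic = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum.
    for rank in range(m - 1, -1, -1):
        idx = order[rank]
        candidate = min(1.0, pvalues[idx] * m * harmonic / (rank + 1))
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four candidate targeting hypotheses.
raw_pvalues = [0.01, 0.04, 0.03, 0.005]
holm = holm_adjust(raw_pvalues)
by = benjamini_yekutieli_adjust(raw_pvalues)
alpha = 0.05
# A hypothesis is kept only if its adjusted p-value stays below alpha.
kept_holm = [i for i, p in enumerate(holm) if p < alpha]
kept_by = [i for i, p in enumerate(by) if p < alpha]
```

In this sketch all four raw p-values are below 0.05, but only two hypotheses survive either correction. That kind of loss of findings is one way the tension between scale and confidence mentioned in the abstract can show up: the more hypotheses are tested, the stricter each individual test must be.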

Index Terms

1. Security and privacy
  1. Human and societal aspects of security and privacy
2. Social and professional topics
  1. Computing / technology policy
  2. Professional topics
    1. Computing profession
      1. Codes of ethics

Published in
CCS '15: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security
October 2015
1750 pages
ISBN: 9781450338325
DOI: 10.1145/2810103
General Chair: Indrajit Ray (Colorado State University, USA)
Program Chairs: Ninghui Li (Purdue University, USA), Christopher Kruegel (University of California, Santa Barbara, USA)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
measurement
privacy
web transparency
Qualifiers
- research-article
Acceptance Rates
CCS '15 Paper Acceptance Rate: 128 of 660 submissions, 19%
Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%