research-article

Focused Context Balancing for Robust Offline Policy Evaluation

Authors:
Hao Zou

Tsinghua University & Beijing National Research Center for Information Science and Technology (BNRist)., Beijing, China

Tsinghua University & Beijing National Research Center for Information Science and Technology (BNRist)., Beijing, China
View Profile

,
Kun Kuang

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Boqi Chen

Boston University & Tsinghua University, Boston, MA, USA

Boston University & Tsinghua University, Boston, MA, USA
View Profile

,
Peixuan Chen

Tencent, Shenzhen, China

Tencent, Shenzhen, China
View Profile

,
Peng Cui

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningJuly 2019Pages 696–704https://doi.org/10.1145/3292500.3330852

Published:25 July 2019Publication History

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 696–704

ABSTRACT

Precisely evaluating the effect of new policies (e.g. ad-placement models, recommendation functions, ranking functions) is one of the most important problems for improving interactive systems. The conventional policy evaluation methods rely on online A/B tests, but they are usually extremely expensive and may have undesirable impacts. Recently, Inverse Propensity Score (IPS) estimators are proposed as alternatives to evaluate the effect of new policy with offline logged data that was collected from a different policy in the past. They tend to remove the distribution shift induced by past policy. However, they ignore the distribution shift that would be induced by the new policy, which results in imprecise evaluation. Moreover, their performances rely on accurate estimation of propensity score, which can not be guaranteed or validated in practice. In this paper, we propose a non-parametric method, named Focused Context Balancing (FCB) algorithm, to learn sample weights for context balancing, so that the distribution shift induced by the past policy and new policy can be eliminated respectively. To validate the effectiveness of our FCB algorithm, we conduct extensive experiments on both synthetic and real world datasets. The experimental results clearly demonstrate that our FCB algorithm outperforms existing estimators by achieving more precise and robust results for offline policy evaluation.

References

Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. 2017. Effective Evaluation Using Logged Bandit Feedback from Multiple Loggers. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . ACM, 687--696. Google ScholarDigital Library
Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. 2014. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning. 1638--1646. Google ScholarDigital Library
Susan Athey, Guido W Imbens, and Stefan Wager. 2016. Approximate residual balancing: debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2016).Google Scholar
Peter C Austin. 2011. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate behavioral research , Vol. 46, 3 (2011), 399--424.Google Scholar
Heejung Bang and James M Robins. 2005. Doubly robust estimation in missing data and causal inference models. Biometrics , Vol. 61, 4 (2005), 962--973.Google ScholarCross Ref
Léon Bottou, Jonas Peters, Joaquin Qui nonero-Candela, Denis X Charles, D Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. The Journal of Machine Learning Research , Vol. 14, 1 (2013), 3207--3260. Google ScholarDigital Library
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, and Whitney K Newey. 2016. Double machine learning for treatment and causal parameters . Technical Report. cemmap working paper, Centre for Microdata Methods and Practice.Google Scholar
Miroslav Dudik, John Langford, and Lihong Li. 2011. Doubly robust policy evaluation and learning. In International Conference on International Conference on Machine Learning. 1097--1104. Google ScholarDigital Library
Max H Farrell. 2015. Robust inference on average treatment effects with possibly more covariates than observations. Journal of Econometrics , Vol. 189, 1 (2015), 1--23.Google ScholarCross Ref
Jens Hainmueller. 2012. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies. Political Analysis , Vol. 20, 1 (2012), 25--46.Google ScholarCross Ref
John Hammersley. 2013. Monte carlo methods .Springer Science & Business Media.Google Scholar
Daniel G Horvitz and Donovan J Thompson. 1952. A generalization of sampling without replacement from a finite universe. Journal of the American statistical Association , Vol. 47, 260 (1952), 663--685.Google ScholarCross Ref
Ron Kohavi and Roger Longbotham. 2011. Unexpected results in online controlled experiments. ACM SIGKDD Explorations Newsletter , Vol. 12, 2 (2011), 31--35. Google ScholarDigital Library
Augustine Kong. 1992. A note on importance sampling using standardized weights. University of Chicago, Dept. of Statistics, Tech. Rep , Vol. 348 (1992).Google Scholar
Kun Kuang, Peng Cui, Bo Li, Meng Jiang, and Shiqiang Yang. 2017a. Estimating Treatment Effect in the Wild via Differentiated Confounder Balancing. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 265--274. Google ScholarDigital Library
Kun Kuang, Peng Cui, Bo Li, Meng Jiang, Shiqiang Yang, and Fei Wang. 2017b. Treatment effect estimation with data-driven variable decomposition. In Thirty-First AAAI Conference on Artificial Intelligence . Google ScholarDigital Library
Kun Kuang, Meng Jiang, Peng Cui, and Shiqiang Yang. 2016. Steering social media promotions with effective strategies. In 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 985--990.Google ScholarCross Ref
John Langford and Tong Zhang. 2008. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in neural information processing systems. 817--824. Google ScholarDigital Library
Randall Lewis and David Reiley. 2009. Retail advertising works! measuring the effects of advertising on sales via a controlled experiment on yahoo! (2009).Google Scholar
Lihong Li, Shunbao Chen, Jim Kleban, and Ankur Gupta. 2015. Counterfactual estimation and optimization of click metrics in search engines: A case study. In Proceedings of the 24th International Conference on World Wide Web. ACM, 929--934. Google ScholarDigital Library
Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. 2011. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 297--306. Google ScholarDigital Library
Art B Owen. 2013. Monte Carlo theory, methods and examples. Monte Carlo Theory, Methods and Examples. Art Owen (2013).Google Scholar
Michael JD Powell and J Swann. 1966. Weighted uniform samplinga Monte Carlo technique for reducing variance. IMA Journal of Applied Mathematics , Vol. 2, 3 (1966), 228--236.Google ScholarCross Ref
Paul R Rosenbaum and Donald B Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika , Vol. 70, 1 (1983), 41--55.Google ScholarCross Ref
Reuven Y Rubinstein and Dirk P Kroese. 2016. Simulation and the Monte Carlo method . Vol. 10. John Wiley & Sons. Google ScholarDigital Library
Tobias Schnabel, Adith Swaminathan, Ashudeep Singh, Navin Chandak, and Thorsten Joachims. 2016. Recommendations as treatments: Debiasing learning and evaluation. arXiv preprint arXiv:1602.05352 (2016). Google ScholarDigital Library
Alex Strehl, John Langford, Lihong Li, and Sham M Kakade. 2010. Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems. 2217--2225. Google ScholarDigital Library
Adith Swaminathan and Thorsten Joachims. 2015a. Counterfactual risk minimization: Learning from logged bandit feedback. In International Conference on Machine Learning. 814--823. Google ScholarDigital Library
Adith Swaminathan and Thorsten Joachims. 2015b. The self-normalized estimator for counterfactual learning. In Advances in Neural Information Processing Systems. 3231--3239. Google ScholarDigital Library
Daniel Westreich, Justin Lessler, and Michele Jonsson Funk. 2010. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. Journal of clinical epidemiology , Vol. 63, 8 (2010), 826--833.Google ScholarCross Ref
José R Zubizarreta. 2015. Stable weights that balance covariates for estimation with incomplete outcome data. J. Amer. Statist. Assoc. , Vol. 110, 511 (2015), 910--922.Google ScholarCross Ref

Index Terms

Focused Context Balancing for Robust Offline Policy Evaluation
1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Learning from implicit feedback

Recommendations

Designing Fast and Scalable XACML Policy Evaluation Engines

Most prior research on policies has focused on correctness. While correctness is an important issue, the adoption of policy-based computing may be limited if the resulting systems are not implemented efficiently and thus perform poorly. To increase the ...
Read More
XACBench: a XACML policy benchmark
Abstract
XACML standard defines a declarative language to determine access control policies which are critical for deploying security solutions. It is important to evaluate the performance of policies defined by XACML, for applications such as policy ...
Read More
Adaptive Reordering and Clustering-Based Framework for Efficient XACML Policy Evaluation

The adoption of XACML as the standard for specifying access control policies for various applications, especially web services is vastly increasing. This calls for high performance XACML policy evaluation engines. A policy evaluation engine can easily ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN:9781450362016
DOI:10.1145/3292500
General Chairs:
Ankur Teredesai
KenSci
,
Vipin Kumar
University of Minnesota
,
Program Chairs:
Ying Li
EV Analysis Corporation
,
Rómer Rosales
LinkedIn
,
Evimaria Terzi
Boston University
,
George Karypis
University of Minnesota
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 July 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
context balancing
distribution shift
policy evaluation
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '19 Paper Acceptance Rate110of1,200submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 8
  Total Citations
  View Citations
- 478
  Total Downloads
- Downloads (Last 12 months)22
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Focused Context Balancing for Robust Offline Policy Evaluation

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Designing Fast and Scalable XACML Policy Evaluation Engines

XACBench: a XACML policy benchmark

Adaptive Reordering and Clustering-Based Framework for Efficient XACML Policy Evaluation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Focused Context Balancing for Robust Offline Policy Evaluation

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Designing Fast and Scalable XACML Policy Evaluation Engines

XACBench: a XACML policy benchmark

Adaptive Reordering and Clustering-Based Framework for Efficient XACML Policy Evaluation

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media