DOI: 10.1145/3292500.3330852

Focused Context Balancing for Robust Offline Policy Evaluation

Published: 25 July 2019

ABSTRACT

Precisely evaluating the effect of new policies (e.g., ad-placement models, recommendation functions, ranking functions) is one of the most important problems in improving interactive systems. Conventional policy evaluation relies on online A/B tests, which are usually extremely expensive and may have undesirable impacts. Recently, Inverse Propensity Score (IPS) estimators have been proposed as alternatives that evaluate the effect of a new policy using offline logged data collected under a different policy in the past. These estimators correct for the distribution shift induced by the past policy, but they ignore the distribution shift that would be induced by the new policy, which results in imprecise evaluation. Moreover, their performance relies on accurate estimation of the propensity score, which cannot be guaranteed or validated in practice. In this paper, we propose a non-parametric method, the Focused Context Balancing (FCB) algorithm, which learns sample weights for context balancing so that the distribution shifts induced by the past policy and the new policy are both eliminated. To validate the effectiveness of the FCB algorithm, we conduct extensive experiments on both synthetic and real-world datasets. The experimental results clearly demonstrate that our FCB algorithm outperforms existing estimators by delivering more precise and robust offline policy evaluation.
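For intuition only, the minimal sketch below illustrates the two ideas the abstract contrasts: the standard IPS estimator, which reweights logged rewards by the ratio of new-policy to logging-policy action probabilities, and a generic non-parametric balancing scheme that learns sample weights to match context moments instead of modeling propensity scores. It is not the paper's FCB objective; the function names, the toy data, and the choice of balancing target are all illustrative assumptions.

```python
# Minimal sketch (NOT the paper's FCB algorithm): a standard IPS estimate and a
# generic moment-balancing weight learner, using only NumPy. All names and the
# toy data below are illustrative assumptions.
import numpy as np


def ips_estimate(rewards, p_new, p_log):
    """Inverse Propensity Score estimate of a new policy's value.

    rewards : rewards r_i observed for the logged actions
    p_new   : pi_new(a_i | x_i), new policy's probability of the logged action
    p_log   : pi_log(a_i | x_i), logging policy's probability of the same action
    """
    w = p_new / p_log                  # importance weights correcting for the logging policy
    return float(np.mean(w * rewards))


def balancing_weights(X, target_moments, n_iter=2000, lr=1.0):
    """Learn simplex-constrained sample weights so that the weighted first
    moments of the logged contexts X match `target_moments` (a stand-in for
    the context distribution the new policy focuses on). Plain gradient
    descent on 0.5 * ||X^T w - m||^2, with w parameterized by a softmax."""
    n = X.shape[0]
    theta = np.zeros(n)
    for _ in range(n_iter):
        w = np.exp(theta - theta.max())
        w /= w.sum()                               # keep weights nonnegative, summing to one
        grad_w = X @ (X.T @ w - target_moments)    # dL/dw
        theta -= lr * w * (grad_w - np.dot(w, grad_w))  # chain rule through the softmax
    return w


# Toy usage on synthetic logged bandit data.
rng = np.random.default_rng(0)
n, d = 5000, 5
X = rng.normal(size=(n, d))                        # logged contexts
p_log = np.full(n, 0.5)                            # logging policy's action probabilities
p_new = 1.0 / (1.0 + np.exp(-X[:, 0]))             # new policy's probabilities of the same actions
rewards = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 1]))).astype(float)

print("IPS estimate:", ips_estimate(rewards, p_new, p_log))
w = balancing_weights(X, target_moments=np.zeros(d))
print("Moment-balanced reward estimate:", float(np.dot(w, rewards)))
```

The softmax parameterization is just one convenient way to keep the weights on the simplex; the paper's actual optimization problem and its choice of balancing target differ and should be taken from the full text.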


Published in

      KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2019
      3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 25 July 2019


      Qualifiers

      • research-article

      Acceptance Rates

KDD '19 paper acceptance rate: 110 of 1,200 submissions (9%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
