In an online decision problem, an algorithm performs a sequence of trials, each of which involves selecting one element from a fixed set of alternatives (the "strategy set") whose costs vary over time. After T trials, the combined cost of the algorithm's choices is compared with that of the single strategy whose combined cost is minimum. Their difference is called regret, and one seeks algorithms that are efficient in the sense that their regret is sublinear in T and polynomial in the problem size. We study an important class of online decision problems called generalized multi-armed bandit problems. Such problems have found applications in areas as diverse as statistics, computer science, economic theory, and medical decision-making. Most existing algorithms were efficient only in the case of a small (i.e., polynomial-sized) strategy set. We extend the theory by supplying non-trivial algorithms and lower bounds for cases in which the strategy set is much larger (exponential or infinite) and the cost function class is structured, e.g., by constraining the cost functions to be linear or convex. As applications, we consider adaptive routing in networks, adaptive pricing in electronic markets, and collaborative decision-making by untrusting peers in a dynamic environment. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
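The following minimal Python sketch makes the definition of regret concrete by computing it in hindsight for a finite strategy set. It assumes each trial's cost function is given as a dictionary mapping strategies to real costs. The function name `regret` and this dictionary representation are illustrative choices and do not come from the thesis.

```python
from typing import Hashable, Mapping, Sequence


def regret(
    choices: Sequence[Hashable],
    cost_fns: Sequence[Mapping[Hashable, float]],
    strategies: Sequence[Hashable],
) -> float:
    """Regret after T = len(choices) trials (hypothetical helper, illustration only).

    choices[t]  : strategy the algorithm selected at trial t
    cost_fns[t] : cost of every strategy at trial t (assumed dict representation)
    strategies  : the finite strategy set
    """
    if len(choices) != len(cost_fns):
        raise ValueError("need exactly one cost function per trial")
    # Combined cost of the algorithm's own choices over all T trials.
    alg_cost = sum(c[x] for c, x in zip(cost_fns, choices))
    # Combined cost of the best single strategy in hindsight.
    best_fixed = min(sum(c[s] for c in cost_fns) for s in strategies)
    return alg_cost - best_fixed


# Three trials over the strategy set {"a", "b"}.
costs = [{"a": 1, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}]
print(regret(["a", "b", "a"], costs, ["a", "b"]))  # → 2
```

In this example the algorithm pays 3 in total, while always choosing "b" would have cost 1, so the regret is 2. Regret can also be negative if the algorithm beats every fixed strategy. The `min` enumerates the whole strategy set, so this helper is only practical for small sets. It evaluates regret after the fact and is not itself an online algorithm.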