ABSTRACT
We consider the problem of multi-task reinforcement learning, where the agent must solve a sequence of Markov Decision Processes (MDPs) drawn randomly from a fixed but unknown distribution. We model the distribution over MDPs with a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for model-based Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments from previous ones, while the use of a nonparametric model allows us to adapt quickly to environments we have not encountered before. In addition, the use of infinite mixtures allows the model to learn the number of underlying MDP components automatically. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.
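The infinite mixture over MDPs rests on a Dirichlet process prior, whose clustering behavior is commonly described by the Chinese restaurant process: each new task joins an existing MDP component in proportion to its size, or opens a fresh component in proportion to a concentration parameter. A minimal sketch of this sampling step (the function name, seeding, and parameterization are illustrative assumptions, not the paper's implementation):

```python
import random

def crp_assignments(num_tasks, alpha, seed=0):
    """Sample component assignments for a sequence of tasks under a
    Chinese restaurant process with concentration parameter alpha.

    A new task joins an existing component with probability proportional
    to that component's size, or starts a new component with probability
    proportional to alpha -- so the number of MDP components is inferred
    rather than fixed in advance.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of tasks assigned to component k
    assignments = []
    for n in range(num_tasks):
        weights = counts + [alpha]   # existing components, plus a new one
        total = n + alpha
        r = rng.random() * total
        k, acc = 0, 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(counts):
            counts.append(1)         # open a new MDP component
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_assignments(10, alpha=1.0))
```

Larger `alpha` yields more components for the same number of tasks; in the full model, each component would carry its own transition and reward parameters, and assignments would be resampled by Gibbs sampling (as in Neal, 2000) rather than drawn once from the prior.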
Multi-task reinforcement learning: a hierarchical Bayesian approach