Article

DOI: 10.1145/1273496.1273624

Multi-task reinforcement learning: a hierarchical Bayesian approach

Published: 20 June 2007

ABSTRACT

We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for model-based Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments based on previous environments, while the use of a nonparametric model allows us to quickly adapt to environments we have not encountered before. In addition, the use of infinite mixtures allows the model to automatically learn the number of underlying MDP components. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.
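
The abstract describes the model only at a high level. As a rough illustration of the kind of machinery it names, the Python sketch below implements an infinite (Chinese-restaurant-process) mixture over tabular MDP transition models with conjugate Dirichlet-multinomial updates: each new task is either assigned to an existing MDP class or spawns a fresh one, and a class's posterior-mean transition model serves as the informed prior for the next task. Everything here is an assumption for illustration, not the authors' code: the names (MDPClass, assign_task, informed_prior), the toy problem sizes, and the greedy one-shot assignment standing in for the paper's full MCMC inference.

    import numpy as np
    from scipy.special import gammaln

    # Toy tabular setting, assumed for illustration: S states, A actions.
    S, A = 5, 2
    ALPHA = 1.0   # CRP concentration: willingness to posit a new MDP class
    PSEUDO = 0.5  # symmetric Dirichlet prior over next-state distributions

    class MDPClass:
        """One mixture component: pooled transition pseudo-counts from
        the tasks assigned to it."""
        def __init__(self):
            self.counts = np.full((S, A, S), PSEUDO)
            self.n_tasks = 0

        def log_marginal(self, task_counts):
            # Dirichlet-multinomial log marginal likelihood of a task's
            # transition counts (up to a constant shared by all classes).
            c, k = self.counts, task_counts
            lp = 0.0
            for s in range(S):
                for a in range(A):
                    lp += (gammaln(c[s, a].sum())
                           - gammaln(c[s, a].sum() + k[s, a].sum())
                           + np.sum(gammaln(c[s, a] + k[s, a]) - gammaln(c[s, a])))
            return lp

        def absorb(self, task_counts):
            self.counts = self.counts + task_counts
            self.n_tasks += 1

    def assign_task(classes, task_counts, rng):
        """CRP step: weight each existing class by n_tasks * likelihood and
        a fresh class by ALPHA * likelihood, then sample an assignment."""
        fresh = MDPClass()
        options = classes + [fresh]
        log_w = np.array([cl.log_marginal(task_counts) for cl in options])
        log_w += np.log([cl.n_tasks if cl.n_tasks > 0 else ALPHA
                         for cl in options])
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        chosen = options[rng.choice(len(options), p=w)]
        if chosen is fresh:
            classes.append(fresh)
        chosen.absorb(task_counts)
        return chosen

    def informed_prior(mdp_class):
        """Posterior-mean transition model of a class: the informed prior
        a model-based Bayesian RL learner would start the next task from."""
        c = mdp_class.counts
        return c / c.sum(axis=-1, keepdims=True)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        classes = []
        # Stream of tasks drawn from two latent transition models.
        truths = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(2)]
        for t in range(20):
            truth = truths[t % 2]
            counts = np.zeros((S, A, S))
            for s in range(S):
                for a in range(A):
                    counts[s, a] = rng.multinomial(10, truth[s, a])
            assign_task(classes, counts, rng)
        print("inferred number of MDP classes:", len(classes))

A fuller treatment along the paper's lines would also place a prior over rewards and resample task-to-class assignments with Gibbs sweeps instead of committing to a single draw.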


• Published in

  ICML '07: Proceedings of the 24th International Conference on Machine Learning
  June 2007, 1233 pages
  ISBN: 9781595937933
  DOI: 10.1145/1273496
  Copyright © 2007 ACM


• Publisher

  Association for Computing Machinery, New York, NY, United States



• Acceptance Rates

  Overall acceptance rate: 140 of 548 submissions, 26%
