DOI: 10.5555/3237383.3237451

Lenient Multi-Agent Deep Reinforcement Learning

Authors: Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani

Published: 9 July 2018

ABSTRACT

Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently by sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel [9]. In this work we apply leniency [22] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates sampled from the ERM. This introduces optimism into the value-function update, and has been shown to facilitate cooperation in tabular fully cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [20], as well as a modified version we call scheduled-HDQN, which uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [6]. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic-reward CMOTP than standard and scheduled-HDQN agents.
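To make the leniency mechanism concrete, the sketch below shows a minimal tabular lenient Q-learning update. It is an illustration under stated assumptions, not the paper's LDQN implementation: LDQN applies the same idea to transitions sampled from the ERM via a DQN, whereas this sketch updates a Q-table directly, and the parameter names (ALPHA, K, TEMP_DECAY) and the leniency function 1 - exp(-K * T) follow the lenient-learning literature [23, 35] rather than this paper's exact configuration.

    import math
    import random
    from collections import defaultdict

    # Hypothetical parameters for illustration (not from the paper).
    ALPHA = 0.1         # learning rate
    GAMMA = 0.95        # discount factor
    K = 2.0             # leniency moderation factor
    TEMP_DECAY = 0.995  # per-visit temperature decay
    MAX_TEMP = 1.0      # initial temperature for unvisited pairs

    Q = defaultdict(float)             # Q-values, keyed by (state, action)
    T = defaultdict(lambda: MAX_TEMP)  # temperatures, keyed by (state, action)

    def lenient_update(state, action, reward, next_state, actions, done):
        """Q-learning update that forgives negative updates with a
        probability tied to a decaying per-(state, action) temperature."""
        target = reward if done else reward + GAMMA * max(
            Q[(next_state, a)] for a in actions)
        delta = target - Q[(state, action)]

        # Leniency l = 1 - exp(-K * T): a high temperature yields high
        # leniency, so early negative experiences are usually ignored.
        leniency = 1.0 - math.exp(-K * T[(state, action)])
        if delta > 0 or random.random() > leniency:
            Q[(state, action)] += ALPHA * delta

        # Decay the temperature on every visit, so the agent gradually
        # becomes less forgiving and the update approaches ordinary
        # Q-learning.
        T[(state, action)] *= TEMP_DECAY

The key effect is the asymmetry: positive temporal-difference errors are always applied, while negative ones are applied with a probability that grows as the temperature cools. This is what introduces the initial optimism into the value-function update described above.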

References

  1. Tucker Balch and Ronald C. Arkin. 1994. Communication in reactive multiagent robotic systems. Autonomous Robots 1, 1 (1994), 27–52.
  2. Nikos Barbalios and Panagiotis Tzionas. 2014. A robust approach for multi-agent natural resource allocation based on stochastic optimization algorithms. Applied Soft Computing 18 (2014), 12–24.
  3. Daan Bloembergen, Daniel Hennes, Michael Kaisers, and Karl Tuyls. 2015. Evolutionary dynamics of multi-agent learning: A survey. Journal of Artificial Intelligence Research 53 (2015), 659–697.
  4. Daan Bloembergen, Michael Kaisers, and Karl Tuyls. 2011. Empirical and theoretical support for lenient learning. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3. International Foundation for Autonomous Agents and Multiagent Systems, 1105–1106.
  5. Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 38, 2 (2008).
  6. Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2010. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications - 1. Springer, 183–221.
  7. Moses S. Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing. ACM, 380–388.
  8. Tim de Bruin, Jens Kober, Karl Tuyls, and Robert Babuska. 2015. The importance of experience replay database composition in deep reinforcement learning. In Deep Reinforcement Learning Workshop, NIPS.
  9. Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Philip Torr, Pushmeet Kohli, Shimon Whiteson, et al. 2017. Stabilising experience replay for deep multi-agent reinforcement learning. arXiv preprint arXiv:1702.08887 (2017).
  10. Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine. 2016. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. arXiv preprint arXiv:1610.00633 (2016).
  11. Jayesh K. Gupta, Maxim Egorov, and Mykel Kochenderfer. 2017. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS 2017).
  12. Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. 2017. A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183 (2017).
  13. Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4 (1996), 237–285.
  14. Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
  15. Guillaume Lample and Devendra Singh Chaplot. 2017. Playing FPS games with deep reinforcement learning. In AAAI. 2140–2146.
  16. Long-Ji Lin. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8, 3–4 (1992), 293–321.
  17. Laëtitia Matignon, Guillaume J. Laurent, and Nadine Le Fort-Piat. 2007. Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 64–69.
  18. Laëtitia Matignon, Guillaume J. Laurent, and Nadine Le Fort-Piat. 2012. Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. The Knowledge Engineering Review 27, 1 (2012), 1–31.
  19. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
  20. Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, and John Vian. 2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning. 2681–2690.
  21. Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani. 2017. Lenient multi-agent deep reinforcement learning. arXiv preprint arXiv:1707.04402 (2017).
  22. Liviu Panait, Keith Sullivan, and Sean Luke. 2006. Lenient learners in cooperative multiagent systems. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems. ACM, 801–803.
  23. Liviu Panait, Karl Tuyls, and Sean Luke. 2008. Theoretical advantages of lenient learners: An evolutionary game theoretic perspective. Journal of Machine Learning Research 9 (2008), 423–457.
  24. Mitchell A. Potter and Kenneth A. De Jong. 1994. A cooperative coevolutionary approach to function optimization. In International Conference on Parallel Problem Solving from Nature. Springer, 249–257.
  25. Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015).
  26. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. 2017. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017).
  27. Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press, Cambridge.
  28. Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. 2017. Multiagent cooperation and competition with deep reinforcement learning. PLoS One 12, 4 (2017), e0172395.
  29. Ming Tan. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.
  30. Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2017. #Exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems. 2750–2759.
  31. Karl Tuyls and Gerhard Weiss. 2012. Multiagent learning: Basics, challenges, and prospects. AI Magazine 33, 3 (2012), 41–52.
  32. Hado van Hasselt. 2010. Double Q-learning. In Advances in Neural Information Processing Systems. 2613–2621.
  33. Hado van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double Q-learning. In AAAI. 2094–2100.
  34. Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3–4 (1992), 279–292.
  35. Ermo Wei and Sean Luke. 2016. Lenient learning in independent-learner stochastic cooperative games. Journal of Machine Learning Research 17, 84 (2016), 1–42. http://jmlr.org/papers/v17/15-417.html
  36. R. Paul Wiegand. 2003. An analysis of cooperative coevolutionary algorithms. Ph.D. Dissertation. George Mason University, Virginia.
  37. Yinliang Xu, Wei Zhang, Wenxin Liu, and Frank Ferrese. 2012. Multiagent-based reinforcement learning for optimal reactive power dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 6 (2012), 1742–1751.

Published in

AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, July 2018, 2312 pages.

Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.

Publication history: published 9 July 2018.

Qualifiers: research-article.

Acceptance rates: AAMAS '18 paper acceptance rate: 149 of 607 submissions (25%). Overall acceptance rate: 1,155 of 5,036 submissions (23%).
