DOI: 10.5555/3237383.3237451

Lenient Multi-Agent Deep Reinforcement Learning

Authors: Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani

Published: 9 July 2018

ABSTRACT

Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently by sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel [9]. In this work we apply leniency [22] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates sampled from the ERM. This introduces optimism into the value-function update, and has been shown to facilitate cooperation in tabular fully cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [20], as well as a modified version we call scheduled-HDQN, which uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [6]. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic-reward CMOTP than standard and scheduled-HDQN agents.
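To make the leniency mechanism concrete, the sketch below shows a minimal tabular lenient Q-learning update. It is an illustration under stated assumptions, not the paper's LDQN implementation: LDQN applies the same idea to transitions sampled from the ERM via a DQN, whereas this sketch updates a Q-table directly, and the parameter names (ALPHA, K, TEMP_DECAY) and the leniency function 1 - exp(-K * T) follow the lenient-learning literature [23, 35] rather than this paper's exact configuration.

    import math
    import random
    from collections import defaultdict

    # Hypothetical parameters for illustration (not from the paper).
    ALPHA = 0.1         # learning rate
    GAMMA = 0.95        # discount factor
    K = 2.0             # leniency moderation factor
    TEMP_DECAY = 0.995  # per-visit temperature decay
    MAX_TEMP = 1.0      # initial temperature for unvisited pairs

    Q = defaultdict(float)             # Q-values, keyed by (state, action)
    T = defaultdict(lambda: MAX_TEMP)  # temperatures, keyed by (state, action)

    def lenient_update(state, action, reward, next_state, actions, done):
        """Q-learning update that forgives negative updates with a
        probability tied to a decaying per-(state, action) temperature."""
        target = reward if done else reward + GAMMA * max(
            Q[(next_state, a)] for a in actions)
        delta = target - Q[(state, action)]

        # Leniency l = 1 - exp(-K * T): a high temperature yields high
        # leniency, so early negative experiences are usually ignored.
        leniency = 1.0 - math.exp(-K * T[(state, action)])
        if delta > 0 or random.random() > leniency:
            Q[(state, action)] += ALPHA * delta

        # Decay the temperature on every visit, so the agent gradually
        # becomes less forgiving and the update approaches ordinary
        # Q-learning.
        T[(state, action)] *= TEMP_DECAY

The key effect is the asymmetry: positive temporal-difference errors are always applied, while negative ones are applied with a probability that grows as the temperature cools. This is what introduces the initial optimism into the value-function update described above.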

References

  1. Tucker Balch and Ronald C. Arkin. 1994. Communication in reactive multiagent robotic systems. Autonomous Robots 1, 1 (1994), 27–52.
  2. Nikos Barbalios and Panagiotis Tzionas. 2014. A robust approach for multi-agent natural resource allocation based on stochastic optimization algorithms. Applied Soft Computing 18 (2014), 12–24.
  3. Daan Bloembergen, Daniel Hennes, Michael Kaisers, and Karl Tuyls. 2015. Evolutionary dynamics of multi-agent learning: A survey. Journal of Artificial Intelligence Research 53 (2015), 659–697.
  4. Daan Bloembergen, Michael Kaisers, and Karl Tuyls. 2011. Empirical and theoretical support for lenient learning. In The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3. International Foundation for Autonomous Agents and Multiagent Systems, 1105–1106.
  5. Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 38, 2 (2008).
  6. Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2010. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications - 1. Springer, 183–221.
  7. Moses S. Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing. ACM, 380–388.
  8. Tim de Bruin, Jens Kober, Karl Tuyls, and Robert Babuska. 2015. The importance of experience replay database composition in deep reinforcement learning. In Deep Reinforcement Learning Workshop, NIPS.
  9. Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Philip Torr, Pushmeet Kohli, Shimon Whiteson, et al. 2017. Stabilising experience replay for deep multi-agent reinforcement learning. arXiv preprint arXiv:1702.08887 (2017).
  10. Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine. 2016. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. arXiv preprint arXiv:1610.00633 (2016).
  11. Jayesh K. Gupta, Maxim Egorov, and Mykel Kochenderfer. 2017. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS 2017).
  12. Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. 2017. A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183 (2017).
  13. Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4 (1996), 237–285.
  14. Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
  15. Guillaume Lample and Devendra Singh Chaplot. 2017. Playing FPS games with deep reinforcement learning. In AAAI. 2140–2146.
  16. Long-Ji Lin. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8, 3–4 (1992), 293–321.
  17. Laëtitia Matignon, Guillaume J. Laurent, and Nadine Le Fort-Piat. 2007. Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 64–69.
  18. Laëtitia Matignon, Guillaume J. Laurent, and Nadine Le Fort-Piat. 2012. Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. The Knowledge Engineering Review 27, 1 (2012), 1–31.
  19. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529–533.
  20. Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P. How, and John Vian. 2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning. 2681–2690.
  21. Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani. 2017. Lenient multi-agent deep reinforcement learning. arXiv preprint arXiv:1707.04402 (2017).
  22. Liviu Panait, Keith Sullivan, and Sean Luke. 2006. Lenient learners in cooperative multiagent systems. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems. ACM, 801–803.
  23. Liviu Panait, Karl Tuyls, and Sean Luke. 2008. Theoretical advantages of lenient learners: An evolutionary game theoretic perspective. Journal of Machine Learning Research 9 (2008), 423–457.
  24. Mitchell A. Potter and Kenneth A. De Jong. 1994. A cooperative coevolutionary approach to function optimization. In International Conference on Parallel Problem Solving from Nature. Springer, 249–257.
  25. Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015).
  26. Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, and Thore Graepel. 2017. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017).
  27. Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press, Cambridge.
  28. Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. 2017. Multiagent cooperation and competition with deep reinforcement learning. PLoS One 12, 4 (2017), e0172395.
  29. Ming Tan. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning. 330–337.
  30. Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2017. #Exploration: A study of count-based exploration for deep reinforcement learning. In Advances in Neural Information Processing Systems. 2750–2759.
  31. Karl Tuyls and Gerhard Weiss. 2012. Multiagent learning: Basics, challenges, and prospects. AI Magazine 33, 3 (2012), 41–52.
  32. Hado van Hasselt. 2010. Double Q-learning. In Advances in Neural Information Processing Systems. 2613–2621.
  33. Hado van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double Q-learning. In AAAI. 2094–2100.
  34. Christopher J. C. H. Watkins and Peter Dayan. 1992. Q-learning. Machine Learning 8, 3–4 (1992), 279–292.
  35. Ermo Wei and Sean Luke. 2016. Lenient learning in independent-learner stochastic cooperative games. Journal of Machine Learning Research 17, 84 (2016), 1–42. http://jmlr.org/papers/v17/15-417.html
  36. R. Paul Wiegand. 2003. An analysis of cooperative coevolutionary algorithms. Ph.D. Dissertation. George Mason University, Virginia.
  37. Yinliang Xu, Wei Zhang, Wenxin Liu, and Frank Ferrese. 2012. Multiagent-based reinforcement learning for optimal reactive power dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 6 (2012), 1742–1751.

Published in

AAMAS '18: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, July 2018, 2312 pages.

Publisher: International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.

Publication history: published 9 July 2018.

Qualifiers: research-article.

Acceptance rates: AAMAS '18 paper acceptance rate: 149 of 607 submissions (25%). Overall acceptance rate: 1,155 of 5,036 submissions (23%).
