ABSTRACT
Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel (Foerster et al. 2017). In this work we apply leniency (Panait et al. 2006) to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm (Omidshafiei et al. 2017), as well as a modified version we call scheduled-HDQN, which uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) (Buşoniu et al. 2010). We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
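To make the leniency mechanism concrete, the following is a minimal tabular sketch, not the LDQN implementation evaluated in this work. It assumes the leniency form 1 - exp(-K * T(s, a)) used in the lenient learning literature and a simple multiplicative temperature decay. The class name `LenientTable` and all hyperparameter values (`alpha`, `gamma`, `k`, `decay`, `max_temp`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


class LenientTable:
    """Tabular sketch of a lenient value update (illustrative values only)."""

    def __init__(self, alpha=0.1, gamma=0.95, k=2.0, decay=0.9,
                 max_temp=1.0, rng=None):
        self.alpha = alpha      # learning rate (assumed)
        self.gamma = gamma      # discount factor (assumed)
        self.k = k              # leniency moderation factor (assumed)
        self.decay = decay      # per-visit temperature decay (assumed)
        self.q = defaultdict(float)
        # Every state-action pair starts at the maximum temperature.
        self.temp = defaultdict(lambda: max_temp)
        self.rng = rng or random.Random(0)

    def leniency(self, state, action):
        # A high temperature gives leniency close to 1.
        return 1.0 - math.exp(-self.k * self.temp[(state, action)])

    def update(self, state, action, reward, next_value):
        key = (state, action)
        delta = reward + self.gamma * next_value - self.q[key]
        # Positive updates are always applied. Negative updates are
        # applied only when a uniform draw exceeds the leniency.
        applied = delta > 0 or self.rng.random() > self.leniency(state, action)
        if applied:
            self.q[key] += self.alpha * delta
        # The temperature decays each time the pair is visited.
        self.temp[key] *= self.decay
        return applied
```

Under these assumptions, a frequently visited state-action pair cools down, its leniency approaches zero, and negative updates are applied almost every time, while rarely visited pairs remain optimistic.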
- Tucker Balch and Ronald C Arkin. 1994. Communication in reactive multiagent robotic systems. Autonomous Robots, Vol. 1, 1 (1994), 27-52.
- Nikos Barbalios and Panagiotis Tzionas. 2014. A robust approach for multi-agent natural resource allocation based on stochastic optimization algorithms. Applied Soft Computing, Vol. 18 (2014), 12-24.
- Daan Bloembergen, Daniel Hennes, Michael Kaisers, and Karl Tuyls. 2015. Evolutionary dynamics of multi-agent learning: A survey. Journal of Artificial Intelligence Research, Vol. 53 (2015), 659-697.
- Daan Bloembergen, Michael Kaisers, and Karl Tuyls. 2011. Empirical and theoretical support for lenient learning. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 3. International Foundation for Autonomous Agents and Multiagent Systems, 1105-1106.
- Lucian Buşoniu, Robert Babuška, and Bart De Schutter. 2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 38, 2 (2008).
- Lucian Buşoniu, Robert Babuška, and Bart De Schutter. 2010. Multi-agent reinforcement learning: An overview. In Innovations in multi-agent systems and applications-1. Springer, 183-221.
- Moses S Charikar. 2002. Similarity estimation techniques from rounding algorithms. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. ACM, 380-388.
- Tim de Bruin, Jens Kober, Karl Tuyls, and Robert Babuška. 2015. The importance of experience replay database composition in deep reinforcement learning. In Deep Reinforcement Learning Workshop, NIPS.
- Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Philip Torr, Pushmeet Kohli, Shimon Whiteson, et al. 2017. Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning. arXiv preprint arXiv:1702.08887 (2017).
- Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine. 2016. Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates. arXiv preprint arXiv:1610.00633 (2016).
- Jayesh K Gupta, Maxim Egorov, and Mykel Kochenderfer. 2017. Cooperative Multi-Agent Control Using Deep Reinforcement Learning. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2017).
- Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, and Enrique Munoz de Cote. 2017. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity. arXiv preprint arXiv:1707.09183 (2017).
- Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. 1996. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, Vol. 4 (1996), 237-285.
- Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
- Guillaume Lample and Devendra Singh Chaplot. 2017. Playing FPS Games with Deep Reinforcement Learning. AAAI (2017), 2140-2146.
- Long-Ji Lin. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, Vol. 8, 3/4 (1992), 69-97.
- Laëtitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. 2007. Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on. IEEE, 64-69.
- Laëtitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. 2012. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, Vol. 27, 1 (2012), 1-31.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature, Vol. 518, 7540 (2015), 529-533.
- Shayegan Omidshafiei, Jason Pazis, Christopher Amato, Jonathan P How, and John Vian. 2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In International Conference on Machine Learning. 2681-2690.
- Gregory Palmer, Karl Tuyls, Daan Bloembergen, and Rahul Savani. 2017. Lenient Multi-Agent Deep Reinforcement Learning. arXiv preprint arXiv:1707.04402 (2017).
- Liviu Panait, Keith Sullivan, and Sean Luke. 2006. Lenient learners in cooperative multiagent systems. In Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems. ACM, 801-803.
- Liviu Panait, Karl Tuyls, and Sean Luke. 2008. Theoretical advantages of lenient learners: An evolutionary game theoretic perspective. Journal of Machine Learning Research, Vol. 9, Mar (2008), 423-457.
- Mitchell A Potter and Kenneth A De Jong. 1994. A cooperative coevolutionary approach to function optimization. In International Conference on Parallel Problem Solving from Nature. Springer, 249-257.
- Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015).
- Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, and Thore Graepel. 2017. Value-Decomposition Networks For Cooperative Multi-Agent Learning. arXiv preprint arXiv:1706.05296 (2017).
- Richard S Sutton and Andrew G Barto. 1998. Reinforcement learning: An introduction. Vol. 1. MIT Press, Cambridge.
- Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. 2017. Multiagent cooperation and competition with deep reinforcement learning. PLoS One, Vol. 12, 4 (2017), e0172395.
- Ming Tan. 1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning. 330-337.
- Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. 2017. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning. In Advances in Neural Information Processing Systems. 2750-2759.
- Karl Tuyls and Gerhard Weiss. 2012. Multiagent Learning: Basics, Challenges, and Prospects. AI Magazine, Vol. 33, 3 (2012), 41-52.
- Hado Van Hasselt. 2010. Double Q-learning. In Advances in Neural Information Processing Systems. 2613-2621.
- Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep Reinforcement Learning with Double Q-Learning. AAAI (2016), 2094-2100.
- Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine Learning, Vol. 8, 3-4 (1992), 279-292.
- Ermo Wei and Sean Luke. 2016. Lenient Learning in Independent-Learner Stochastic Cooperative Games. Journal of Machine Learning Research, Vol. 17, 84 (2016), 1-42. http://jmlr.org/papers/v17/15-417.html
- R Paul Wiegand. 2003. An analysis of cooperative coevolutionary algorithms. Ph.D. Dissertation. George Mason University, Virginia.
- Yinliang Xu, Wei Zhang, Wenxin Liu, and Frank Ferrese. 2012. Multiagent-based reinforcement learning for optimal reactive power dispatch. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol. 42, 6 (2012), 1742-1751.
Index Terms
- Lenient Multi-Agent Deep Reinforcement Learning