Abstract
Reinforcement Learning (RL) is an important machine-learning tool for creating intelligent agents that learn solely through experience. One subarea of RL that has received great attention is the definition of macro-actions: temporal abstractions composed of sequences of primitive actions. This subarea, loosely called skill acquisition, has been developed over several years and has improved results in a variety of RL problems. Among the many skill acquisition approaches, graph-based methods have received considerable attention. This survey presents an overview of graph-based skill acquisition methods for RL, covering a variety of these approaches and discussing how they have evolved over the years. Finally, we discuss the current challenges and open issues in the area of graph-based skill acquisition for RL.
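To make the notion of a macro-action concrete, the following is a minimal sketch, not taken from any surveyed method: a macro-action is represented as a fixed sequence of primitive actions executed in a toy grid world. All names here (`GridWorld`, `MacroAction`, `go_to_corner`) are illustrative assumptions.

```python
class GridWorld:
    """Deterministic 4x4 grid; primitive actions move the agent, clipped at walls."""
    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=4):
        self.size = size
        self.pos = (0, 0)

    def step(self, action):
        dx, dy = self.MOVES[action]
        x, y = self.pos
        # Clip the move so the agent stays inside the grid.
        self.pos = (min(max(x + dx, 0), self.size - 1),
                    min(max(y + dy, 0), self.size - 1))
        return self.pos


class MacroAction:
    """A temporal abstraction: runs its primitive actions in order until done."""

    def __init__(self, name, primitives):
        self.name = name
        self.primitives = primitives

    def execute(self, env):
        state = env.pos
        for a in self.primitives:
            state = env.step(a)
        return state  # state reached when the macro-action terminates


env = GridWorld()
go_to_corner = MacroAction("go_to_corner", ["right", "right", "down", "down"])
print(go_to_corner.execute(env))  # (2, 2)
```

In the skill acquisition literature this idea is usually formalized as an option (an initiation set, an internal policy, and a termination condition); the fixed-sequence version above is the simplest special case.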
Index Terms
- Graph-Based Skill Acquisition For Reinforcement Learning