Graph-Based Skill Acquisition for Reinforcement Learning

Published: 13 February 2019
Skip Abstract Section

Abstract

In machine learning, Reinforcement Learning (RL) is an important tool for creating intelligent agents that learn solely through experience. One particular subarea within the RL domain that has received great attention is how to define macro-actions, which are temporal abstractions composed of a sequence of primitive actions. This subarea, loosely called skill acquisition, has been under development for several years and has improved results across a variety of RL problems. Among the many skill acquisition approaches, graph-based methods have received considerable attention. This survey presents an overview of graph-based skill acquisition methods for RL. We cover a range of these approaches and discuss how they have evolved over the years. Finally, we discuss the current challenges and open issues in the area of graph-based skill acquisition for RL.
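
To make the notion of a macro-action concrete, the sketch below models one as an "option" in the style of the options framework of Sutton, Precup, and Singh (1999): an initiation set stating where the skill may be invoked, an internal policy over primitive actions, and a termination condition. This is an illustrative sketch only; the toy grid world and all concrete names (State, go_to_doorway, the hand-coded doorway skill) are assumptions for exposition, not code from the survey.

    # Minimal illustrative sketch (assumptions, not the survey's method): a
    # macro-action modeled as an option with an initiation set, an internal
    # policy over primitive actions, and a termination condition.
    from dataclasses import dataclass
    from typing import Callable, Tuple

    State = Tuple[int, int]   # hypothetical grid position (x, y)
    Action = str              # primitive actions: "up", "down", "left", "right"

    @dataclass
    class Option:
        initiation: Callable[[State], bool]    # I: states where the option may start
        policy: Callable[[State], Action]      # pi: primitive action chosen in each state
        termination: Callable[[State], float]  # beta: probability of stopping in a state

    def run_option(option: Option, state: State,
                   step: Callable[[State, Action], State],
                   max_steps: int = 100) -> State:
        """Follow the option's internal policy until it terminates (beta treated as 0/1)."""
        assert option.initiation(state), "option invoked outside its initiation set"
        for _ in range(max_steps):
            if option.termination(state) >= 1.0:
                break
            state = step(state, option.policy(state))
        return state

    # Hypothetical usage: a "reach the doorway at (5, 3)" skill in a toy grid world.
    def step(s: State, a: Action) -> State:
        x, y = s
        return {"up": (x, y + 1), "down": (x, y - 1),
                "left": (x - 1, y), "right": (x + 1, y)}[a]

    doorway = (5, 3)
    go_to_doorway = Option(
        initiation=lambda s: True,                                # may start anywhere
        policy=lambda s: "right" if s[0] < doorway[0] else "up",  # crude hand-coded policy
        termination=lambda s: 1.0 if s == doorway else 0.0,       # stop at the doorway
    )

    print(run_option(go_to_doorway, (0, 0), step))  # -> (5, 3)

Graph-based skill acquisition methods, the subject of this survey, aim to discover such options automatically rather than hand-code them, typically by building a graph from the agent's experience and identifying candidate subgoal states (for example, bottlenecks found via graph cuts or centrality measures).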




          • Published in

            ACM Computing Surveys, Volume 52, Issue 1 (January 2020), 758 pages
            ISSN: 0360-0300
            EISSN: 1557-7341
            DOI: 10.1145/3309872
            Editor: Sartaj Sahni

            Copyright © 2019 ACM

            © 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 13 February 2019
            • Accepted: 1 October 2018
            • Revised: 1 August 2018
            • Received: 1 January 2018


            Qualifiers

            • survey
            • Research
            • Refereed
