Graph-Based Skill Acquisition for Reinforcement Learning

Published: 13 February 2019
Skip Abstract Section

Abstract

In machine learning, Reinforcement Learning (RL) is an important tool for creating intelligent agents that learn solely through experience. One particular subarea within the RL domain that has received great attention is how to define macro-actions, which are temporal abstractions composed of a sequence of primitive actions. This subarea, loosely called skill acquisition, has been under development for several years and has improved results across a variety of RL problems. Among the many skill acquisition approaches, graph-based methods have received considerable attention. This survey presents an overview of graph-based skill acquisition methods for RL. We cover a range of these approaches and discuss how they have evolved over the years. Finally, we discuss the current challenges and open issues in the area of graph-based skill acquisition for RL.
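
To make the notion of a macro-action concrete, the sketch below models one as an "option" in the style of the options framework of Sutton, Precup, and Singh (1999): an initiation set stating where the skill may be invoked, an internal policy over primitive actions, and a termination condition. This is an illustrative sketch only; the toy grid world and all concrete names (State, go_to_doorway, the hand-coded doorway skill) are assumptions for exposition, not code from the survey.

    # Minimal illustrative sketch (assumptions, not the survey's method): a
    # macro-action modeled as an option with an initiation set, an internal
    # policy over primitive actions, and a termination condition.
    from dataclasses import dataclass
    from typing import Callable, Tuple

    State = Tuple[int, int]   # hypothetical grid position (x, y)
    Action = str              # primitive actions: "up", "down", "left", "right"

    @dataclass
    class Option:
        initiation: Callable[[State], bool]    # I: states where the option may start
        policy: Callable[[State], Action]      # pi: primitive action chosen in each state
        termination: Callable[[State], float]  # beta: probability of stopping in a state

    def run_option(option: Option, state: State,
                   step: Callable[[State, Action], State],
                   max_steps: int = 100) -> State:
        """Follow the option's internal policy until it terminates (beta treated as 0/1)."""
        assert option.initiation(state), "option invoked outside its initiation set"
        for _ in range(max_steps):
            if option.termination(state) >= 1.0:
                break
            state = step(state, option.policy(state))
        return state

    # Hypothetical usage: a "reach the doorway at (5, 3)" skill in a toy grid world.
    def step(s: State, a: Action) -> State:
        x, y = s
        return {"up": (x, y + 1), "down": (x, y - 1),
                "left": (x - 1, y), "right": (x + 1, y)}[a]

    doorway = (5, 3)
    go_to_doorway = Option(
        initiation=lambda s: True,                                # may start anywhere
        policy=lambda s: "right" if s[0] < doorway[0] else "up",  # crude hand-coded policy
        termination=lambda s: 1.0 if s == doorway else 0.0,       # stop at the doorway
    )

    print(run_option(go_to_doorway, (0, 0), step))  # -> (5, 3)

Graph-based skill acquisition methods, the subject of this survey, aim to discover such options automatically rather than hand-code them, typically by building a graph from the agent's experience and identifying candidate subgoal states (for example, bottlenecks found via graph cuts or centrality measures).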




          • Published in

            ACM Computing Surveys, Volume 52, Issue 1 (January 2020), 758 pages
            ISSN: 0360-0300
            EISSN: 1557-7341
            DOI: 10.1145/3309872
            Editor: Sartaj Sahni

            Copyright © 2019 ACM

            © 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 13 February 2019
            • Accepted: 1 October 2018
            • Revised: 1 August 2018
            • Received: 1 January 2018


            Qualifiers

            • survey
            • Research
            • Refereed
