ABSTRACT
In this paper, we address a relatively unexplored aspect of designing agents that learn from human training: how an agent's non-task behavior can elicit human feedback of higher quality and quantity. As the foundation for our investigation, we use the TAMER framework, in which agents are trained by human-generated reward signals, i.e., judgements of the quality of the agent's actions. We propose two new training interfaces intended to increase the trainer's active involvement in the training process and thereby improve the agent's task performance: one displays the agent's uncertainty, the other its performance. Results from a 51-subject user study show that these interfaces can induce trainers to train longer and give more feedback. The agent's performance, however, improves only with the addition of performance-oriented information, not with the sharing of uncertainty levels. Subsequent analysis suggests that the organizational maxim about human behavior, "you get what you measure" - i.e., sharing metrics with people causes them to focus on maximizing or minimizing those metrics while de-emphasizing other objectives - also applies to the training of agents, providing a powerful guiding principle for human-agent interface design in general.
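For readers unfamiliar with TAMER, the sketch below illustrates the general idea of learning directly from human reward: the agent fits a model of the trainer's feedback signal and acts greedily with respect to that model. This is a minimal, illustrative sketch only; the class name, the linear function approximator, and the learning rate are assumptions for exposition and do not reproduce the paper's implementation or task-specific features.

```python
import numpy as np

class TamerSketchAgent:
    """Minimal TAMER-style learner (illustrative, not the paper's implementation)."""

    def __init__(self, n_features, n_actions, lr=0.01):
        # One linear weight vector per action over state features
        # (a simplifying assumption about the function approximator).
        self.weights = np.zeros((n_actions, n_features))
        self.lr = lr

    def predict_reward(self, state, action):
        # Predicted human reward H(s, a) under the learned model.
        return self.weights[action] @ state

    def select_action(self, state):
        # Act greedily with respect to the learned human-reward model.
        values = [self.predict_reward(state, a) for a in range(len(self.weights))]
        return int(np.argmax(values))

    def update(self, state, action, human_reward):
        # Supervised update: move H(s, a) toward the trainer's feedback
        # (e.g., +1 for approval, -1 for disapproval of the last action).
        error = human_reward - self.predict_reward(state, action)
        self.weights[action] += self.lr * error * state
```

The training interfaces studied in the paper change only what the agent displays to the trainer (its uncertainty or its performance), not this underlying learning rule.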