Temporal difference learning and TD-Gammon

Author:
Gerald Tesauro

IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY

IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY
View Profile

Authors Info & Claims

Communications of the ACM Volume 38 Issue 3March 1995pp 58–68https://doi.org/10.1145/203330.203343

Published:01 March 1995Publication History

Communications of the ACM

Abstract

Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated in that it is easy to simulate the board, the rules of legal play, and the rules regarding when the game is over and determining the outcome.

References

1 Berliner, H. Computer Backgammon Sci. Amer. 243, 1, (1980), 64-72.Google Scholar
2 Epstein, S, Towards an ideal trainer, Mach. Learning 15, 3. (1994), 251- 277. Google ScholarDigital Library
3 Fahlman, S. E, and Lebiere. C. The cascade-correlation learning architectture. In D. S. Touretky, Ed. Advances in Neura;l Information Processiong System 2, Morgan Kaufmann, San Mateo. Calif., (1990), 524-532. Google ScholarDigital Library
4 Fawcett, T.E. and Utgoff P.E. Automatic feature Generation for problem solving Systems. In D. Sleeman and P. Edwards Eds., Machine Learning: Proceeding's of the Ninth International Workshop, Morgan Kaufmann, Mateo. Calif., 1992, 144-153, Google ScholarDigital Library
5 Hornik. K., Stinchcombe. M. and White., H. Multilayer feedback networks are universal Approximators Neural Networks 2, (1989), 359-366. Google ScholarDigital Library
6 Isabelle, J.-F. Auto-apprentissage, a paide de research de neurones, de fonctions heuristic utilities dans les. jeux strageties Master's thesis. Univ of Montreal, 1993Google Scholar
7 Magreal,P. Backgammon, Times Books, Newyork, 19736.Google Scholar
8 Robertie. B, Carbonm Versus silicon: Matching wits with TD-Gammon. Inside Backmonnom 2, 2, (1992), 14-22.Google Scholar
9 Rumelhart, D. E., Hinton. G.E. and Williams, R. J.Learning internal representation by error propogation. In D. Rumelhart and J. McClelland Eds., Parellel Distributed Processing, Vol. 1. MIT Press. Cambridge, Mass., 1986. Google ScholarDigital Library
10 Samuel, A. Some studies in machine learning using the game of checkers Ibm J. of Research and Deveopment 3. (1959), 210-229Google Scholar
11 Schraudolph, N.N. DAyan P. and Sjnoeski, Tj> Temporal difference learning of positoin evaluation in the game of Go. In J. D, Cowan, el al. Eds., Advances in Neural Information Processing Systems 6, 817-824.Morgan Kaufmann, San Mateo, Calif 1994Google Scholar
12 Shannon, C.E Programming aComputer for Playing Chess. Philosophical Mag,41, (1950), 265-275.Google Scholar
13 Sutton, R. S. earning to predict by the methoiods of temporal differences. mach. Learning 3, (1998), 9-44. Google ScholarDigital Library
14 Tesauro, G. Neurogammon wins Computer Olympiad. Neura Computation-I, (1989),321-323.Google ScholarDigital Library
15 Tesaurou G. Practical issues in Temporal difference learning. Mach. Learning 8, (1992),257-277. Google ScholarDigital Library
16 Zadeh, N, and Kobiska, G. On optima doubing in backgammon, Manage, sci. 23 (1977), 853-858.Google Scholar

Index Terms

Recommendations

GP-Gammon: Genetically Programming Backgammon Players

We apply genetic programming to the evolution of strategies for playing the game of backgammon. We explore two different strategies of learning: using a fixed external opponent as teacher, and letting the individuals play against each other. We conclude ...
Read More
GP-Gammon: using genetic programming to evolve backgammon players
EuroGP'05: Proceedings of the 8th European conference on Genetic Programming

We apply genetic programming to the evolution of strategies for playing the game of backgammon. Pitted in a 1000-game tournament against a standard benchmark player—Pubeval—our best evolved program wins 58% of the games, the highest verifiable result to ...
Read More
TD-Gammon, a self-teaching backgammon program, achieves master-level play

TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(») reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (...
Read More

Reviews

Reviewer: Jaak Tepandi

Complex board games are a natural testing ground for machine learning and artificial intelligence. They are based on experience; they are attractive; and they do not have the safety requirements that sometimes block the use of heuristic methods. Despite recent advances, computer chess seems not to be a success of machine learning as such, because of its reliance on brute force search rather than “intelligent” approaches. This paper presents an interesting example of an opposite situation, the game-learning program TD-Gammon. TD-Gammon is a neural network that trains itself to play backgammon by playing against itself and learning from the outcome. The program has surpassed all previous computer programs that play backgammon. It is based on two main sets of methods: reinforcement learning, with feedback signals indicating how good or bad the system input was; and temporal difference (TD) learning methods, based on the difference between successive predictions. The main ideas of TD-Gammon are presented, the results of training are discussed, and examples of play are given. Tesauro then discusses other possible applications of TD learning in games, robot motor control, and financial trading strategies. The paper is useful for those interested in machine learning, neural networks, or backgammon. For the enthusiastic general public, the interesting message here is that there really are methods and applications where machine learning may result in a definite edge over humans in a complex area requiring judgment and intuition.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Communications of the ACM Volume 38, Issue 3
March 1995
96 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/203330
Editor:
Jacques Cohen
Brandeis Univ., Waltham, MA
Issue’s Table of Contents
Copyright © 1995 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 1995
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1,025
  Total Citations
  View Citations
- 5,734
  Total Downloads
- Downloads (Last 12 months)675
- Downloads (Last 6 weeks)123
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Temporal difference learning and TD-Gammon

Communications of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

GP-Gammon: Genetically Programming Backgammon Players

GP-Gammon: using genetic programming to evolve backgammon players

TD-Gammon, a self-teaching backgammon program, achieves master-level play

Reviews

Access critical reviews of Computing literature here