skip to main content
article
Free Access

Temporal difference learning and TD-Gammon

Authors Info & Claims
Published:01 March 1995Publication History
Skip Abstract Section

Abstract

Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated in that it is easy to simulate the board, the rules of legal play, and the rules regarding when the game is over and determining the outcome.

References

  1. 1 Berliner, H. Computer Backgammon Sci. Amer. 243, 1, (1980), 64-72.Google ScholarGoogle Scholar
  2. 2 Epstein, S, Towards an ideal trainer, Mach. Learning 15, 3. (1994), 251- 277. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3 Fahlman, S. E, and Lebiere. C. The cascade-correlation learning architectture. In D. S. Touretky, Ed. Advances in Neura;l Information Processiong System 2, Morgan Kaufmann, San Mateo. Calif., (1990), 524-532. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 Fawcett, T.E. and Utgoff P.E. Automatic feature Generation for problem solving Systems. In D. Sleeman and P. Edwards Eds., Machine Learning: Proceeding's of the Ninth International Workshop, Morgan Kaufmann, Mateo. Calif., 1992, 144-153, Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5 Hornik. K., Stinchcombe. M. and White., H. Multilayer feedback networks are universal Approximators Neural Networks 2, (1989), 359-366. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6 Isabelle, J.-F. Auto-apprentissage, a paide de research de neurones, de fonctions heuristic utilities dans les. jeux strageties Master's thesis. Univ of Montreal, 1993Google ScholarGoogle Scholar
  7. 7 Magreal,P. Backgammon, Times Books, Newyork, 19736.Google ScholarGoogle Scholar
  8. 8 Robertie. B, Carbonm Versus silicon: Matching wits with TD-Gammon. Inside Backmonnom 2, 2, (1992), 14-22.Google ScholarGoogle Scholar
  9. 9 Rumelhart, D. E., Hinton. G.E. and Williams, R. J.Learning internal representation by error propogation. In D. Rumelhart and J. McClelland Eds., Parellel Distributed Processing, Vol. 1. MIT Press. Cambridge, Mass., 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10 Samuel, A. Some studies in machine learning using the game of checkers Ibm J. of Research and Deveopment 3. (1959), 210-229Google ScholarGoogle Scholar
  11. 11 Schraudolph, N.N. DAyan P. and Sjnoeski, Tj> Temporal difference learning of positoin evaluation in the game of Go. In J. D, Cowan, el al. Eds., Advances in Neural Information Processing Systems 6, 817-824.Morgan Kaufmann, San Mateo, Calif 1994Google ScholarGoogle Scholar
  12. 12 Shannon, C.E Programming aComputer for Playing Chess. Philosophical Mag,41, (1950), 265-275.Google ScholarGoogle Scholar
  13. 13 Sutton, R. S. earning to predict by the methoiods of temporal differences. mach. Learning 3, (1998), 9-44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14 Tesauro, G. Neurogammon wins Computer Olympiad. Neura Computation-I, (1989),321-323.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15 Tesaurou G. Practical issues in Temporal difference learning. Mach. Learning 8, (1992),257-277. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16 Zadeh, N, and Kobiska, G. On optima doubing in backgammon, Manage, sci. 23 (1977), 853-858.Google ScholarGoogle Scholar

Index Terms

  1. Temporal difference learning and TD-Gammon

                  Recommendations

                  Reviews

                  Jaak Tepandi

                  Complex board games are a natural testing ground for machine learning and artificial intelligence. They are based on experience; they are attractive; and they do not have the safety requirements that sometimes block the use of heuristic methods. Despite recent advances, computer chess seems not to be a success of machine learning as such, because of its reliance on brute force search rather than “intelligent” approaches. This paper presents an interesting example of an opposite situation, the game-learning program TD-Gammon. TD-Gammon is a neural network that trains itself to play backgammon by playing against itself and learning from the outcome. The program has surpassed all previous computer programs that play backgammon. It is based on two main sets of methods: reinforcement learning, with feedback signals indicating how good or bad the system input was; and temporal difference (TD) learning methods, based on the difference between successive predictions. The main ideas of TD-Gammon are presented, the results of training are discussed, and examples of play are given. Tesauro then discusses other possible applications of TD learning in games, robot motor control, and financial trading strategies. The paper is useful for those interested in machine learning, neural networks, or backgammon. For the enthusiastic general public, the interesting message here is that there really are methods and applications where machine learning may result in a definite edge over humans in a complex area requiring judgment and intuition.

                  Access critical reviews of Computing literature here

                  Become a reviewer for Computing Reviews.

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  • Published in

                    cover image Communications of the ACM
                    Communications of the ACM  Volume 38, Issue 3
                    March 1995
                    96 pages
                    ISSN:0001-0782
                    EISSN:1557-7317
                    DOI:10.1145/203330
                    Issue’s Table of Contents

                    Copyright © 1995 ACM

                    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    • Published: 1 March 1995

                    Permissions

                    Request permissions about this article.

                    Request Permissions

                    Check for updates

                    Qualifiers

                    • article

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader