Abstract
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10], the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of the tremendous complexity and sophistication required to play at an expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated: it is easy to simulate the board, the rules of legal play, and the rules for determining when the game is over and what the outcome is.
- 1 Berliner, H. Computer backgammon. Sci. Amer. 243, 1 (1980), 64-72.
- 2 Epstein, S. Towards an ideal trainer. Mach. Learning 15, 3 (1994), 251-277.
- 3 Fahlman, S. E. and Lebiere, C. The cascade-correlation learning architecture. In D. S. Touretzky, Ed., Advances in Neural Information Processing Systems 2, Morgan Kaufmann, San Mateo, Calif., (1990), 524-532.
- 4 Fawcett, T. E. and Utgoff, P. E. Automatic feature generation for problem solving systems. In D. Sleeman and P. Edwards, Eds., Machine Learning: Proceedings of the Ninth International Workshop, Morgan Kaufmann, San Mateo, Calif., 1992, 144-153.
- 5 Hornik, K., Stinchcombe, M. and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, (1989), 359-366.
- 6 Isabelle, J.-F. Auto-apprentissage, à l'aide de réseaux de neurones, de fonctions heuristiques utilisées dans les jeux stratégiques. Master's thesis, Univ. of Montreal, 1993.
- 7 Magriel, P. Backgammon. Times Books, New York, 1976.
- 8 Robertie, B. Carbon versus silicon: Matching wits with TD-Gammon. Inside Backgammon 2, 2 (1992), 14-22.
- 9 Rumelhart, D. E., Hinton, G. E. and Williams, R. J. Learning internal representation by error propagation. In D. Rumelhart and J. McClelland, Eds., Parallel Distributed Processing, Vol. 1. MIT Press, Cambridge, Mass., 1986.
- 10 Samuel, A. Some studies in machine learning using the game of checkers. IBM J. of Research and Development 3, (1959), 210-229.
- 11 Schraudolph, N. N., Dayan, P. and Sejnowski, T. J. Temporal difference learning of position evaluation in the game of Go. In J. D. Cowan, et al., Eds., Advances in Neural Information Processing Systems 6, Morgan Kaufmann, San Mateo, Calif., 1994, 817-824.
- 12 Shannon, C. E. Programming a computer for playing chess. Philosophical Mag. 41, (1950), 265-275.
- 13 Sutton, R. S. Learning to predict by the methods of temporal differences. Mach. Learning 3, (1988), 9-44.
- 14 Tesauro, G. Neurogammon wins Computer Olympiad. Neural Computation 1, (1989), 321-323.
- 15 Tesauro, G. Practical issues in temporal difference learning. Mach. Learning 8, (1992), 257-277.
- 16 Zadeh, N. and Kobliska, G. On optimal doubling in backgammon. Manage. Sci. 23, (1977), 853-858.