research-article

Recursive Markov Decision Processes and Recursive Stochastic Games

Authors:
Kousha Etessami

University of Edinburgh

University of Edinburgh
View Profile

,
Mihalis Yannakakis

Columbia University

Columbia University
View Profile

Authors Info & Claims

Journal of the ACM Volume 62 Issue 2Article No.: 11pp 1–69https://doi.org/10.1145/2699431

Published:06 May 2015Publication History

Journal of the ACM

Abstract

We introduce Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs), which are classes of (finitely presented) countable-state MDPs and zero-sum turn-based (perfect information) stochastic games. They extend standard finite-state MDPs and stochastic games with a recursion feature. We study the decidability and computational complexity of these games under termination objectives for the two players: one player's goal is to maximize the probability of termination at a given exit, while the other player's goal is to minimize this probability. In the quantitative termination problems, given an RMDP (or RSSG) and probability p, we wish to decide whether the value of such a termination game is at least p (or at most p); in the qualitative termination problem we wish to decide whether the value is 1. The important 1-exit subclasses of these models, 1-RMDPs and 1-RSSGs, correspond in a precise sense to controlled and game versions of classic stochastic models, including multitype Branching Processes and Stochastic Context-Free Grammars, where the objective of the players is to maximize or minimize the probability of termination (extinction).

We provide a number of upper and lower bounds for qualitative and quantitative termination problems for RMDPs and RSSGs. We show both problems are undecidable for multi-exit RMDPs, but are decidable for 1-RMDPs and 1-RSSGs. Specifically, the quantitative termination problem is decidable in PSPACE for both 1-RMDPs and 1-RSSGs, and is at least as hard as the square root sum problem, a well-known open problem in numerical computation. We show that the qualitative termination problem for 1-RMDPs (i.e., a controlled version of branching processes) can be solved in polynomial time both for maximizing and minimizing 1-RMDPs. The qualitative problem for 1-RSSGs is in NP ∩ coNP, and is at least as hard as the quantitative termination problem for Condon's finite-state simple stochastic games, whose complexity remains a well known open problem. Finally, we show that even for 1-RMDPs, more general (qualitative and quantitative) model-checking problems with respect to linear-time temporal properties are undecidable even for a fixed property.

References

E. Allender, P. Bürgisser, J. Kjeldgaard-Pedersen, and P. B. Miltersen. 2009. On the complexity of numerical analysis. SIAM J. Comput. 38, 5, 1987--2006. Earlier conference version appeared in Proc., 21st IEEE Conference on Computational Complexity. Google ScholarDigital Library
R. Alur, M. Benedikt, K. Etessami, P. Godefroid, T. Reps, and M. Yannakakis. 2005. Analysis of recursive state machines. ACM Trans. Program. Lang. Syst. 27, 4, 786--818. Google ScholarDigital Library
R. Alur, C. Courcoubetis, and M. Yannakakis. 1995. Distinguishing tests for nondeterministic and probabilistic machines. In Proceedings of the 20th STOC. 363--372. Google ScholarDigital Library
K. B. Athreya and P. E. Ney. 1972. Branching processes. Springer-Verlag.Google Scholar
C. Baier and M. Kwiatkowska. 1998. Model checking for a probabilistic branching time logic with fairness. Distrib. Comput. 11, 3, 125--155. Google ScholarDigital Library
V. Blondel and V. Canterini. 2003. Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Syst. 36, 231--245.Google ScholarCross Ref
T. Brázdil, V. Brozek, and K. Etessami. 2010a. One-counter stochastic games. In Proceedings of the FSTTCS'10. 108--119.Google Scholar
T. Brázdil, V. Brozek, K. Etessami, and T. Kucera. 2011a. Approximating the termination value of one-counter mdps and stochastic games. In Proceedings of the ICALP'11. Google ScholarDigital Library
T. Brázdil, V. Brozek, K. Etessami, T. Kucera, and D. Wojtczak. 2010b. One-counter Markov decision processes. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA'10). Google ScholarDigital Library
T. Brázdil, V. Brozek, V. Forejt, and A. Kucera. 2006. Reachability in recursive Markov decision processes. In Proceedings of the 17th International CONCUR. 358--374. Google ScholarDigital Library
T. Brázdil, V. Brozek, A. Kucera, and J. Obdrzálek. 2011b. Qualitative reachability in stochastic bpa games. Inf. Computat. 209, 8, 1160--1183. Google ScholarDigital Library
T. Brázdil, A. Kučcera, and O. Stražzovský 2005. Decidability of temporal properties of probabilistic pushdown automata. In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS). Google ScholarDigital Library
J. Canny. 1988. Some algebraic and geometric computations in PSPACE. In Proceedings of the 20th ACM Symposium on Theory of Computing (STOC). 460--467. Google ScholarDigital Library
K. Chatterjee, L. de Alfaro, and T. A. Henzinger. 2006. The complexity of quantitative concurrent parity games. In Proceedings of the SODA. 678--687. Google ScholarDigital Library
A. Condon. 1992. The complexity of stochastic games. Inf. Comput. 96, 2, 203--224. Google ScholarDigital Library
A. Condon and R. Lipton. 1989. The complexity of space bounded interactive proofs. In Proceedings of the 30th IEEE FOCS. Google ScholarDigital Library
C. Courcoubetis and M. Yannakakis. 1995. The complexity of probabilistic verification. J. ACM 42, 4, 857--907. Google ScholarDigital Library
C. Courcoubetis and M. Yannakakis. 1998. Markov decision processes and regular events. IEEE Trans. Automat. Cont. 43, 10, 1399--1418.Google ScholarCross Ref
L. de Alfaro, T. A. Henzinger, and O. Kupferman. 2007. Concurrent reachability games. Theoret. Comput. Sci. 386, 3, 188--217. Google ScholarDigital Library
L. de Alfaro, M. Kwiatkowska, G. Norman, D. Parker, and R. Segala. 2000. Symbolic model checking of probabilistic processes using MTBDDs and the Kronecker representation. In Proceedings of the 6th TACAS. 395--410. Google ScholarDigital Library
L. de Alfaro and R. Majumdar. 2004. Quantitative solution of omega-regular games. J. Comput. Syst. Sci. 68, 2, 374--397. Google ScholarDigital Library
E. Denardo and U. Rothblum. 2005. Totally expanding multiplicative systems. Lin. Alg. Appl. 406, 142--158.Google ScholarCross Ref
E. V. Denardo and U. G. Rothblum. 2006. A turnpike theorem for a risk-sensitive markov decision process with stopping. SIAM J. Control Optimiz. 45, 2, 414--431. Google ScholarDigital Library
A. Ehrenfeucht and J. Mycielski. 1979. Positional strategies for mean payoff games. Int. J. Game Theory 8, 109--113.Google ScholarDigital Library
E. A. Emerson and C. S. Jutla. 1991. Tree automata, mu-calculus, and determinacy. In Proceedings of the 32nd IEEE Symposium on Foundations of Computer Science (FOCS). 368--377. Google ScholarDigital Library
J. Esparza, A. Gaiser, and S. Kiefer. 2013. A strongly polynomial algorithm for criticality of branching processes and consistency of stochastic context-free grammars. Inf. Process. Lett. 113, 10--11, 381--385.Google ScholarCross Ref
J. Esparza, T. Gawlitza, S. Kiefer, and H. Seidl. 2008. Approximative methods for monotone systems of min-max-polynomial equations. In Proceedings of the ICALP (1). 698--710. Google ScholarDigital Library
J. Esparza, S. Kiefer, and M. Luttenberger. 2010. Computing the least fixed point of positive polynomial systems. SIAM J. Comput. 39, 6, 2282--2335.Google ScholarDigital Library
J. Esparza, A. Kučcera, and R. Mayr. 2004. Model checking probabilistic pushdown automata. In Proceedings of the 19th IEEE Symposium on Logic in Computer Science (LICS). 12--21. Google ScholarDigital Library
K. Etessami, A. Stewart, and M. Yannakakis. 2012a. Polynomial-time algorithms for multi-type branching processes and stochastic context-free grammars. In Proceedings of the 44th ACM STOC. (Full version is available at ArXiv:1201.2374.) Google ScholarDigital Library
K. Etessami, A. Stewart, and M. Yannakakis. 2012b. Polynomial time algorithms for branching Markov decision processes and probabilistic min(max) polynomial Bellman equations. In Proceedings of the 39th International Colloquium on Automata, Languages and Programming (ICALP). 314--326. (Full version at ArXiv:1202.4798.) Google ScholarDigital Library
K. Etessami, D. Wojtczak, and M. Yannakakis. 2008. Recursive stochastic games with positive rewards. In Proceedings of the 35th ICALP. 71--723. Google ScholarDigital Library
K. Etessami, D. Wojtczak, and M. Yannakakis. 2010. Quasi-birth-death processes, tree-like QBDs, probabilistic 1-counter automata, and pushdown systems. Perform. Eval. 67, 9, 837--857. (conference version appeared in QEST'08.) Google ScholarDigital Library
K. Etessami and M. Yannakakis. 2005. Recursive Markov decision processes and recursive stochastic games. In Proceedings of the 32nd International Colloquium on Automata, Languages, and Programming (ICALP). 891--903. Google ScholarDigital Library
K. Etessami and M. Yannakakis. 2006. Efficient qualitative analysis of classes of recursive Markov decision processes and simple stochastic games. In Proceedings of the 23rd STACS'06. Springer. Google ScholarDigital Library
K. Etessami and M. Yannakakis. 2008. Recursive concurrent stochastic games. Logi. Meth. Comput. Sci. 4, 4. (Conference version appeared in ICALP'06.) Google ScholarDigital Library
K. Etessami and M. Yannakakis. 2009. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM 56, 1. Google ScholarDigital Library
K. Etessami and M. Yannakakis. 2010. On the complexity of Nash equilibria and other fixed points. SIAM J. Comput. 39, 6, 2531--2597. (Conference version appeared in Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS'07.)Google ScholarDigital Library
K. Etessami and M. Yannakakis. 2012. Model checking of recursive probabilistic systems. ACM Trans. Computat. Logic 13. Google ScholarDigital Library
C. J. Everett and S. Ulam. 1948. Multiplicative systems, Part I, II, and III. Tech. Rep. 683,690,707, Los Alamos Scientific Laboratory.Google Scholar
H. Everett. 1957. Recursive games. In Contributions to the Theory of Games, vol. 3. Annals of Mathematics Studies, no. 39. Princeton University Press, 47--78.Google Scholar
R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. 2000. Random walks with "back buttons" (extended abstract). In Proceedings of the ACM Symposium on Theory of Computing (STOC). 484--493. Google ScholarDigital Library
E. Feinberg and A. Shwartz. Eds. 2002. Handbook of Markov Decision Processes. Kluwer.Google Scholar
J. Filar and K. Vrieze. 1997. Competitive Markov Decision Processes. Springer. Google ScholarDigital Library
M. R. Garey, R. L. Graham, and D. S. Johnson. 1976. Some NP-complete geometric problems. In Proceedings of the 8th ACM Symposium on Theory of Computing (STOC). 10--22. Google ScholarDigital Library
M. Grötschel, L. Lovász, and A. Schrijver. 1993. Geometric Algorithms and Combinatorial Optimization 2nd Ed. Springer-Verlag.Google Scholar
P. Haccou, P. Jagers, and V. A. Vatutin. 2005. Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press.Google ScholarCross Ref
T. E. Harris. 1963. The Theory of Branching Processes. Springer-Verlag.Google Scholar
S. Hart, M. Sharir, and A. Pnueli. 1983. Termination of probabilistic concurrent programs. ACM Trans. Prog. Languages and Systems 5, 3, 356--380. Google ScholarDigital Library
J. E. Hopcroft and J. D. Ullman. 1979. Introduction to Automata Theory, Formal Languages and Computation. Addison-Wesley. Google ScholarDigital Library
R. J. Horn and C. R. Johnson. 1985. Matrix Analysis. Cambridge University Press. Google ScholarDigital Library
P. Jagers. 1975. Branching Processes with Biological Applications. Wiley.Google Scholar
M. Kimmel and D. E. Axelrod. 2002. Branching Processes in Biology. Springer.Google Scholar
A. N. Kolmogorov and B. A. Sevastyanov. 1947. The calculation of final probabilities for branching random processes. Dokaldy 56, 783--786. (Russian).Google Scholar
M. Kwiatkowska. 2003. Model checking for probability and time: from theory to practice. In Proceedings of the 18th IEEE LICS. 351--360 Google ScholarDigital Library
G. Latouche and V. Ramaswami. 1999. Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM Series on Statistics and Applied Probability.Google Scholar
O. Madani, S. Hanks, and A. Condon. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147, 1, 5--34. Google ScholarDigital Library
K. Mahler. 1964. An inequality for the discriminant of a polynomial. Mich. Math J. 11, 257--262.Google ScholarCross Ref
A. Maitra and W. Sudderth. 1998. Finitely additive stochastic games with Borel measurable payoffs. Internat. J. Game Theory 27, 2, 257--267. Google ScholarDigital Library
C. Manning and H. Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press. Google ScholarDigital Library
D. A. Martin. 1998. Determinacy of Blackwell games. J. Symb. Logic 63, 4, 1565--1581.Google ScholarCross Ref
M. F. Neuts. 1981. Matrix-Geometric Solutions in Stochastic Models:an algorithmic approach. Johns Hopkins University Press.Google Scholar
A. Neyman and S. Sorin. Eds. 2003. Stochastic Games and Applications. NATO ASI Series, Kluwer.Google Scholar
D. Ornstein. 1969. On the existence of stationary optimal strategies. Proc. Amer. Math. Soc. 20, 563--569.Google ScholarCross Ref
A. Paz. 1971. Introduction to Probabilistic Automata. Academic Press. Google ScholarDigital Library
S. Pliska. 1976. Optimization of multitype branching processes. Manage. Sci. 23, 2, 117--124.Google ScholarDigital Library
M. L. Puterman. 1994. Markov Decision Processes. Wiley.Google Scholar
J. Renegar. 1992. On the computational complexity and geometry of the first-order theory of the reals, parts I-III. J. Symbol. Computat. 13, 3, 255--352.Google ScholarDigital Library
U. Rothblum and P. Whittle. 1982. Growth optimality for branching Markov decision chains. Math. Oper. Res. 7, 4, 582--601. Google ScholarDigital Library
Y. Sakakibara, M. Brown, R. Hughey, I. Mian, K. Sjolander, R. Underwood, and D. Haussler. 1994. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Rese. 22, 23, 5112--5120.Google ScholarCross Ref
L. Shapley. 1953. Stochastic games. Proc. Nat. Acad. Sci. 39, 1095--1100.Google ScholarCross Ref
A. Tarski. 1955. A lattice-theoretical fixpoint theorem and its applications. Pac. J. Math. 5, 285--309.Google ScholarCross Ref
M. Vardi. 1985. Automatic verification of probabilistic concurrent finite-state programs. In Proceedings of the 26th IEEE FOCS. 327--338. Google ScholarDigital Library
I. Walukiewicz. 1996. Pushdown processes: games and model checking. In Computer-Aided Verification. 62--75. Google ScholarDigital Library
D. Wojtczak and K. Etessami. 2007. PReMo: an analyzer for probabilistic recursive models. In Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). 66--71. Tool web page: http://groups.inf.ed.ac.uk/premo/. Google ScholarDigital Library
U. Zwick and M. Paterson. 1996. The complexity of mean payoff games on graphs. Theoret. Comput. Sci. 158, 1--2, 343--359. Google ScholarDigital Library

Index Terms

Recursive Markov Decision Processes and Recursive Stochastic Games
1. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic representations
      1. Markov networks
    2. Stochastic processes
      1. Markov processes
2. Theory of computation
  1. Design and analysis of algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Markov decision processes

Recommendations

Recursive stochastic games with positive rewards
Abstract
We study the complexity of a class of Markov decision processes and, more generally, stochastic games, called 1-exit Recursive Markov Decision Processes (1-RMDPs) and 1-exit Recursive Simple Stochastic Games (1-RSSGs), with strictly ...
Read More
Efficient qualitative analysis of classes of recursive markov decision processes and simple stochastic games
STACS'06: Proceedings of the 23rd Annual conference on Theoretical Aspects of Computer Science

Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs) are natural models for recursive systems involving both probabilistic and non-probabilistic actions. As shown recently [10], fundamental problems about such models,...
Read More
Markov decision process routing games
ICCPS '17: Proceedings of the 8th International Conference on Cyber-Physical Systems

We explore an extension of nonatomic routing games that we call Markov decision process routing games where each agent chooses a transition policy between nodes in a network rather than a path from an origin node to a destination node, i.e. each agent ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Journal of the ACM Volume 62, Issue 2
May 2015
304 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/2772377
Editor:
Victor Vianu
University of California, San Diego
Issue’s Table of Contents
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 May 2015
- Accepted: 1 November 2014
- Revised: 1 April 2014
- Received: 1 June 2011
Published in jacm Volume 62, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Markov decision processes
Recursive stochastic processes
multitype branching processes
stochastic context-free grammars
stochastic games
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 26
  Total Citations
  View Citations
- 563
  Total Downloads
- Downloads (Last 12 months)36
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Recursive Markov Decision Processes and Recursive Stochastic Games

Journal of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

Recursive stochastic games with positive rewards

Efficient qualitative analysis of classes of recursive markov decision processes and simple stochastic games

Markov decision process routing games

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Recursive Markov Decision Processes and Recursive Stochastic Games

Journal of the ACM

Abstract

References

Cited By

Index Terms

Recommendations

Recursive stochastic games with positive rewards

Efficient qualitative analysis of classes of recursive markov decision processes and simple stochastic games

Markov decision process routing games

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media