skip to main content
research-article

Recursive Markov Decision Processes and Recursive Stochastic Games

Published:06 May 2015Publication History
Skip Abstract Section

Abstract

We introduce Recursive Markov Decision Processes (RMDPs) and Recursive Simple Stochastic Games (RSSGs), which are classes of (finitely presented) countable-state MDPs and zero-sum turn-based (perfect information) stochastic games. They extend standard finite-state MDPs and stochastic games with a recursion feature. We study the decidability and computational complexity of these games under termination objectives for the two players: one player's goal is to maximize the probability of termination at a given exit, while the other player's goal is to minimize this probability. In the quantitative termination problems, given an RMDP (or RSSG) and probability p, we wish to decide whether the value of such a termination game is at least p (or at most p); in the qualitative termination problem we wish to decide whether the value is 1. The important 1-exit subclasses of these models, 1-RMDPs and 1-RSSGs, correspond in a precise sense to controlled and game versions of classic stochastic models, including multitype Branching Processes and Stochastic Context-Free Grammars, where the objective of the players is to maximize or minimize the probability of termination (extinction).

We provide a number of upper and lower bounds for qualitative and quantitative termination problems for RMDPs and RSSGs. We show both problems are undecidable for multi-exit RMDPs, but are decidable for 1-RMDPs and 1-RSSGs. Specifically, the quantitative termination problem is decidable in PSPACE for both 1-RMDPs and 1-RSSGs, and is at least as hard as the square root sum problem, a well-known open problem in numerical computation. We show that the qualitative termination problem for 1-RMDPs (i.e., a controlled version of branching processes) can be solved in polynomial time both for maximizing and minimizing 1-RMDPs. The qualitative problem for 1-RSSGs is in NP ∩ coNP, and is at least as hard as the quantitative termination problem for Condon's finite-state simple stochastic games, whose complexity remains a well known open problem. Finally, we show that even for 1-RMDPs, more general (qualitative and quantitative) model-checking problems with respect to linear-time temporal properties are undecidable even for a fixed property.

References

  1. E. Allender, P. Bürgisser, J. Kjeldgaard-Pedersen, and P. B. Miltersen. 2009. On the complexity of numerical analysis. SIAM J. Comput. 38, 5, 1987--2006. Earlier conference version appeared in Proc., 21st IEEE Conference on Computational Complexity. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Alur, M. Benedikt, K. Etessami, P. Godefroid, T. Reps, and M. Yannakakis. 2005. Analysis of recursive state machines. ACM Trans. Program. Lang. Syst. 27, 4, 786--818. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Alur, C. Courcoubetis, and M. Yannakakis. 1995. Distinguishing tests for nondeterministic and probabilistic machines. In Proceedings of the 20th STOC. 363--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. K. B. Athreya and P. E. Ney. 1972. Branching processes. Springer-Verlag.Google ScholarGoogle Scholar
  5. C. Baier and M. Kwiatkowska. 1998. Model checking for a probabilistic branching time logic with fairness. Distrib. Comput. 11, 3, 125--155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. V. Blondel and V. Canterini. 2003. Undecidable problems for probabilistic automata of fixed dimension. Theory Comput. Syst. 36, 231--245.Google ScholarGoogle ScholarCross RefCross Ref
  7. T. Brázdil, V. Brozek, and K. Etessami. 2010a. One-counter stochastic games. In Proceedings of the FSTTCS'10. 108--119.Google ScholarGoogle Scholar
  8. T. Brázdil, V. Brozek, K. Etessami, and T. Kucera. 2011a. Approximating the termination value of one-counter mdps and stochastic games. In Proceedings of the ICALP'11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. T. Brázdil, V. Brozek, K. Etessami, T. Kucera, and D. Wojtczak. 2010b. One-counter Markov decision processes. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. T. Brázdil, V. Brozek, V. Forejt, and A. Kucera. 2006. Reachability in recursive Markov decision processes. In Proceedings of the 17th International CONCUR. 358--374. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. Brázdil, V. Brozek, A. Kucera, and J. Obdrzálek. 2011b. Qualitative reachability in stochastic bpa games. Inf. Computat. 209, 8, 1160--1183. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. T. Brázdil, A. Kučcera, and O. Stražzovský 2005. Decidability of temporal properties of probabilistic pushdown automata. In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Canny. 1988. Some algebraic and geometric computations in PSPACE. In Proceedings of the 20th ACM Symposium on Theory of Computing (STOC). 460--467. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. K. Chatterjee, L. de Alfaro, and T. A. Henzinger. 2006. The complexity of quantitative concurrent parity games. In Proceedings of the SODA. 678--687. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Condon. 1992. The complexity of stochastic games. Inf. Comput. 96, 2, 203--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Condon and R. Lipton. 1989. The complexity of space bounded interactive proofs. In Proceedings of the 30th IEEE FOCS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Courcoubetis and M. Yannakakis. 1995. The complexity of probabilistic verification. J. ACM 42, 4, 857--907. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Courcoubetis and M. Yannakakis. 1998. Markov decision processes and regular events. IEEE Trans. Automat. Cont. 43, 10, 1399--1418.Google ScholarGoogle ScholarCross RefCross Ref
  19. L. de Alfaro, T. A. Henzinger, and O. Kupferman. 2007. Concurrent reachability games. Theoret. Comput. Sci. 386, 3, 188--217. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. de Alfaro, M. Kwiatkowska, G. Norman, D. Parker, and R. Segala. 2000. Symbolic model checking of probabilistic processes using MTBDDs and the Kronecker representation. In Proceedings of the 6th TACAS. 395--410. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. de Alfaro and R. Majumdar. 2004. Quantitative solution of omega-regular games. J. Comput. Syst. Sci. 68, 2, 374--397. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. Denardo and U. Rothblum. 2005. Totally expanding multiplicative systems. Lin. Alg. Appl. 406, 142--158.Google ScholarGoogle ScholarCross RefCross Ref
  23. E. V. Denardo and U. G. Rothblum. 2006. A turnpike theorem for a risk-sensitive markov decision process with stopping. SIAM J. Control Optimiz. 45, 2, 414--431. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Ehrenfeucht and J. Mycielski. 1979. Positional strategies for mean payoff games. Int. J. Game Theory 8, 109--113.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. A. Emerson and C. S. Jutla. 1991. Tree automata, mu-calculus, and determinacy. In Proceedings of the 32nd IEEE Symposium on Foundations of Computer Science (FOCS). 368--377. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Esparza, A. Gaiser, and S. Kiefer. 2013. A strongly polynomial algorithm for criticality of branching processes and consistency of stochastic context-free grammars. Inf. Process. Lett. 113, 10--11, 381--385.Google ScholarGoogle ScholarCross RefCross Ref
  27. J. Esparza, T. Gawlitza, S. Kiefer, and H. Seidl. 2008. Approximative methods for monotone systems of min-max-polynomial equations. In Proceedings of the ICALP (1). 698--710. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Esparza, S. Kiefer, and M. Luttenberger. 2010. Computing the least fixed point of positive polynomial systems. SIAM J. Comput. 39, 6, 2282--2335.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Esparza, A. Kučcera, and R. Mayr. 2004. Model checking probabilistic pushdown automata. In Proceedings of the 19th IEEE Symposium on Logic in Computer Science (LICS). 12--21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. K. Etessami, A. Stewart, and M. Yannakakis. 2012a. Polynomial-time algorithms for multi-type branching processes and stochastic context-free grammars. In Proceedings of the 44th ACM STOC. (Full version is available at ArXiv:1201.2374.) Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. K. Etessami, A. Stewart, and M. Yannakakis. 2012b. Polynomial time algorithms for branching Markov decision processes and probabilistic min(max) polynomial Bellman equations. In Proceedings of the 39th International Colloquium on Automata, Languages and Programming (ICALP). 314--326. (Full version at ArXiv:1202.4798.) Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. K. Etessami, D. Wojtczak, and M. Yannakakis. 2008. Recursive stochastic games with positive rewards. In Proceedings of the 35th ICALP. 71--723. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Etessami, D. Wojtczak, and M. Yannakakis. 2010. Quasi-birth-death processes, tree-like QBDs, probabilistic 1-counter automata, and pushdown systems. Perform. Eval. 67, 9, 837--857. (conference version appeared in QEST'08.) Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. K. Etessami and M. Yannakakis. 2005. Recursive Markov decision processes and recursive stochastic games. In Proceedings of the 32nd International Colloquium on Automata, Languages, and Programming (ICALP). 891--903. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. K. Etessami and M. Yannakakis. 2006. Efficient qualitative analysis of classes of recursive Markov decision processes and simple stochastic games. In Proceedings of the 23rd STACS'06. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. K. Etessami and M. Yannakakis. 2008. Recursive concurrent stochastic games. Logi. Meth. Comput. Sci. 4, 4. (Conference version appeared in ICALP'06.) Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. K. Etessami and M. Yannakakis. 2009. Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations. J. ACM 56, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. K. Etessami and M. Yannakakis. 2010. On the complexity of Nash equilibria and other fixed points. SIAM J. Comput. 39, 6, 2531--2597. (Conference version appeared in Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS'07.)Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. K. Etessami and M. Yannakakis. 2012. Model checking of recursive probabilistic systems. ACM Trans. Computat. Logic 13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. C. J. Everett and S. Ulam. 1948. Multiplicative systems, Part I, II, and III. Tech. Rep. 683,690,707, Los Alamos Scientific Laboratory.Google ScholarGoogle Scholar
  41. H. Everett. 1957. Recursive games. In Contributions to the Theory of Games, vol. 3. Annals of Mathematics Studies, no. 39. Princeton University Press, 47--78.Google ScholarGoogle Scholar
  42. R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins. 2000. Random walks with "back buttons" (extended abstract). In Proceedings of the ACM Symposium on Theory of Computing (STOC). 484--493. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. E. Feinberg and A. Shwartz. Eds. 2002. Handbook of Markov Decision Processes. Kluwer.Google ScholarGoogle Scholar
  44. J. Filar and K. Vrieze. 1997. Competitive Markov Decision Processes. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. M. R. Garey, R. L. Graham, and D. S. Johnson. 1976. Some NP-complete geometric problems. In Proceedings of the 8th ACM Symposium on Theory of Computing (STOC). 10--22. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. M. Grötschel, L. Lovász, and A. Schrijver. 1993. Geometric Algorithms and Combinatorial Optimization 2nd Ed. Springer-Verlag.Google ScholarGoogle Scholar
  47. P. Haccou, P. Jagers, and V. A. Vatutin. 2005. Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press.Google ScholarGoogle ScholarCross RefCross Ref
  48. T. E. Harris. 1963. The Theory of Branching Processes. Springer-Verlag.Google ScholarGoogle Scholar
  49. S. Hart, M. Sharir, and A. Pnueli. 1983. Termination of probabilistic concurrent programs. ACM Trans. Prog. Languages and Systems 5, 3, 356--380. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. J. E. Hopcroft and J. D. Ullman. 1979. Introduction to Automata Theory, Formal Languages and Computation. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. R. J. Horn and C. R. Johnson. 1985. Matrix Analysis. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. P. Jagers. 1975. Branching Processes with Biological Applications. Wiley.Google ScholarGoogle Scholar
  53. M. Kimmel and D. E. Axelrod. 2002. Branching Processes in Biology. Springer.Google ScholarGoogle Scholar
  54. A. N. Kolmogorov and B. A. Sevastyanov. 1947. The calculation of final probabilities for branching random processes. Dokaldy 56, 783--786. (Russian).Google ScholarGoogle Scholar
  55. M. Kwiatkowska. 2003. Model checking for probability and time: from theory to practice. In Proceedings of the 18th IEEE LICS. 351--360 Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. G. Latouche and V. Ramaswami. 1999. Introduction to Matrix Analytic Methods in Stochastic Modeling. ASA-SIAM Series on Statistics and Applied Probability.Google ScholarGoogle Scholar
  57. O. Madani, S. Hanks, and A. Condon. 2003. On the undecidability of probabilistic planning and related stochastic optimization problems. Artif. Intell. 147, 1, 5--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. K. Mahler. 1964. An inequality for the discriminant of a polynomial. Mich. Math J. 11, 257--262.Google ScholarGoogle ScholarCross RefCross Ref
  59. A. Maitra and W. Sudderth. 1998. Finitely additive stochastic games with Borel measurable payoffs. Internat. J. Game Theory 27, 2, 257--267. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. C. Manning and H. Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. D. A. Martin. 1998. Determinacy of Blackwell games. J. Symb. Logic 63, 4, 1565--1581.Google ScholarGoogle ScholarCross RefCross Ref
  62. M. F. Neuts. 1981. Matrix-Geometric Solutions in Stochastic Models:an algorithmic approach. Johns Hopkins University Press.Google ScholarGoogle Scholar
  63. A. Neyman and S. Sorin. Eds. 2003. Stochastic Games and Applications. NATO ASI Series, Kluwer.Google ScholarGoogle Scholar
  64. D. Ornstein. 1969. On the existence of stationary optimal strategies. Proc. Amer. Math. Soc. 20, 563--569.Google ScholarGoogle ScholarCross RefCross Ref
  65. A. Paz. 1971. Introduction to Probabilistic Automata. Academic Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. S. Pliska. 1976. Optimization of multitype branching processes. Manage. Sci. 23, 2, 117--124.Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. M. L. Puterman. 1994. Markov Decision Processes. Wiley.Google ScholarGoogle Scholar
  68. J. Renegar. 1992. On the computational complexity and geometry of the first-order theory of the reals, parts I-III. J. Symbol. Computat. 13, 3, 255--352.Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. U. Rothblum and P. Whittle. 1982. Growth optimality for branching Markov decision chains. Math. Oper. Res. 7, 4, 582--601. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Y. Sakakibara, M. Brown, R. Hughey, I. Mian, K. Sjolander, R. Underwood, and D. Haussler. 1994. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Rese. 22, 23, 5112--5120.Google ScholarGoogle ScholarCross RefCross Ref
  71. L. Shapley. 1953. Stochastic games. Proc. Nat. Acad. Sci. 39, 1095--1100.Google ScholarGoogle ScholarCross RefCross Ref
  72. A. Tarski. 1955. A lattice-theoretical fixpoint theorem and its applications. Pac. J. Math. 5, 285--309.Google ScholarGoogle ScholarCross RefCross Ref
  73. M. Vardi. 1985. Automatic verification of probabilistic concurrent finite-state programs. In Proceedings of the 26th IEEE FOCS. 327--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. I. Walukiewicz. 1996. Pushdown processes: games and model checking. In Computer-Aided Verification. 62--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. D. Wojtczak and K. Etessami. 2007. PReMo: an analyzer for probabilistic recursive models. In Proceedings of the 13th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). 66--71. Tool web page: http://groups.inf.ed.ac.uk/premo/. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. U. Zwick and M. Paterson. 1996. The complexity of mean payoff games on graphs. Theoret. Comput. Sci. 158, 1--2, 343--359. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Recursive Markov Decision Processes and Recursive Stochastic Games

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image Journal of the ACM
            Journal of the ACM  Volume 62, Issue 2
            May 2015
            304 pages
            ISSN:0004-5411
            EISSN:1557-735X
            DOI:10.1145/2772377
            Issue’s Table of Contents

            Copyright © 2015 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 6 May 2015
            • Accepted: 1 November 2014
            • Revised: 1 April 2014
            • Received: 1 June 2011
            Published in jacm Volume 62, Issue 2

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader