Dynamic Programming and Optimal Control, Vol. II
January 2007
Publisher: Athena Scientific
ISBN: 978-1-886529-30-4
Published: 29 January 2007
Pages: 464
Abstract

A major revision of the second volume of a textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization. The second volume is oriented towards mathematical analysis and computation, and treats infinite horizon problems extensively. New features of the 3rd edition are: 1) A major enlargement in size and scope: the length has increased by more than 50%, and most of the old material has been restructured and/or revised. 2) Extensive coverage (more than 100 pages) of recent research on simulation-based approximate dynamic programming (neuro-dynamic programming), which allows the practical application of dynamic programming to large and complex problems. 3) An in-depth development of the average cost problem (more than 100 pages), including a full analysis of multichain problems, and an extensive analysis of infinite-spaces problems. 4) An introduction to infinite state space stochastic shortest path problems. 5) Expansion of the theory and use of contraction mappings in infinite state space problems and in neuro-dynamic programming. 6) A substantive appendix on the mathematical measure-theoretic issues that must be addressed for a rigorous theory of stochastic dynamic programming. Much supplementary material can be found on the book's web page: http://www.athenasc.com/dpbook.html

Cited By

  1. Zakeri A, Moltafet M, Leinonen M and Codreanu M (2024). Minimizing the AoI in Resource-Constrained Multi-Source Relaying Systems: Dynamic and Learning-Based Scheduling, IEEE Transactions on Wireless Communications, 23:1, (450-466), Online publication date: 1-Jan-2024.
  2. Krishnan K. S. A, Singh C, Maguluri S and Parag P (2023). Optimal Pricing in a Single Server System, ACM Transactions on Modeling and Performance Evaluation of Computing Systems, 8:4, (1-32), Online publication date: 31-Dec-2024.
  3. Ren X, Fu M and Marcus S Sensitivity Analysis for Stopping Criteria with Application to Organ Transplantations Proceedings of the Winter Simulation Conference, (504-515)
  4. Moltafet M, Leinonen M, Codreanu M and Yates R (2023). Status Update Control and Analysis Under Two-Way Delay, IEEE/ACM Transactions on Networking, 31:6, (2918-2933), Online publication date: 1-Dec-2023.
  5. Wang Y, Velasquez A, Atia G, Prater-Bennette A and Zou S Model-free robust average-reward reinforcement learning Proceedings of the 40th International Conference on Machine Learning, (36431-36469)
  6. Zhao T, Zhou S, Sun Y and Niu Z (2023). A Predictive Frame Transmission Scheme for Cloud Gaming in Mobile Edge Cloudlet Systems, IEEE Transactions on Mobile Computing, 22:7, (3774-3789), Online publication date: 1-Jul-2023.
  7. Drent C, Drent M, Arts J and Kapodistria S (2023). Real-Time Integrated Learning and Decision Making for Cumulative Shock Degradation, Manufacturing & Service Operations Management, 25:1, (235-253), Online publication date: 1-Jan-2023.
  8. Qu G and Li N Exploiting Fast Decaying and Locality in Multi-Agent MDP with Tree Dependence Structure 2019 IEEE 58th Conference on Decision and Control (CDC), (6479-6486)
  9. Greene M, Deptula P, Nivison S and Dixon W Reinforcement Learning with Sparse Bellman Error Extrapolation for Infinite-Horizon Approximate Optimal Regulation 2019 IEEE 58th Conference on Decision and Control (CDC), (1959-1964)
  10. Pan L, Qian J, Xia W, Mao H, Yao J, Li P and Xiao Z Optimizing communication in deep reinforcement learning with XingTian Proceedings of the 23rd ACM/IFIP International Middleware Conference, (255-268)
  11. Skitsas K, Papageorgiou I, Talebi M, Kantere V, Katehakis M and Karras P (2022). SIFTER, Proceedings of the VLDB Endowment, 16:1, (90-98), Online publication date: 1-Sep-2022.
  12. Hatami M, Leinonen M, Chen Z, Pappas N and Codreanu M Asymptotically Optimal On-Demand AoI Minimization in Energy Harvesting IoT Networks 2022 IEEE International Symposium on Information Theory (ISIT), (922-927)
  13. Pezzutto M, Schenato L and Dey S (2022). Transmission power allocation for remote estimation with multi-packet reception capabilities, Automatica (Journal of IFAC), 140:C, Online publication date: 1-Jun-2022.
  14. Alizamir S, de Véricourt F and Sun P (2022). Search Under Accumulated Pressure, Operations Research, 70:3, (1393-1409), Online publication date: 1-May-2022.
  15. Atia G, Beckus A, Alkhouri I and Velasquez A (2021). Steady-State Planning in Expected Reward Multichain MDPs, Journal of Artificial Intelligence Research, 72, (1029-1082), Online publication date: 4-Jan-2022.
  16. Choraria M, Chattopadhyay A, Mitra U and Ström E (2022). Design of False Data Injection Attack on Distributed Process Estimation, IEEE Transactions on Information Forensics and Security, 17, (670-683), Online publication date: 1-Jan-2022.
  17. Qu G, Yu C, Low S and Wierman A Exploiting Linear Models for Model-Free Nonlinear Control: A Provably Convergent Policy Gradient Approach 2021 60th IEEE Conference on Decision and Control (CDC), (6539-6546)
  18. Martinelli A, Gargiani M and Lygeros J On the Synthesis of Bellman Inequalities for Data-Driven Optimal Control 2021 60th IEEE Conference on Decision and Control (CDC), (4352-4357)
  19. İnan Y, Inovan R and Telatar E Optimal Policies for Age and Distortion in a Discrete-Time Model 2021 IEEE Information Theory Workshop (ITW), (1-6)
  20. Kanoria Y and Saban D (2021). Facilitating the Search for Partners on Matching Platforms, Management Science, 67:10, (5990-6029), Online publication date: 1-Oct-2021.
  21. Ma Z, Huang P and Kuang Z (2021). Fuzzy Approximate Learning-Based Sliding Mode Control for Deploying Tethered Space Robot, IEEE Transactions on Fuzzy Systems, 29:9, (2739-2749), Online publication date: 1-Sep-2021.
  22. Ni C, Zhang A, Duan Y and Wang M Learning Good State and Action Representations via Tensor Decomposition 2021 IEEE International Symposium on Information Theory (ISIT), (1682-1687)
  23. Gupta A, Sikdar A and Chattopadhyay A Quickest detection of false data injection attack in remote state estimation 2021 IEEE International Symposium on Information Theory (ISIT), (3068-3073)
  24. Fridovich-Keil D and Tomlin C Approximate Solutions to a Class of Reachability Games 2021 IEEE International Conference on Robotics and Automation (ICRA), (12610-12617)
  25. Yang F and Chakraborty N Chance Constrained Simultaneous Path Planning and Task Assignment with Bottleneck Objective 2021 IEEE International Conference on Robotics and Automation (ICRA), (7510-7516)
  26. Oliehoek F, Witwicki S and Kaelbling L (2021). A Sufficient Statistic for Influence in Structured Multiagent Environments, Journal of Artificial Intelligence Research, 70, (789-870), Online publication date: 1-May-2021.
  27. Jiang Y, Kouzoupis D, Yin H, Diehl M and Houska B (2021). Decentralized Optimization Over Tree Graphs, Journal of Optimization Theory and Applications, 189:2, (384-407), Online publication date: 1-May-2021.
  28. Bian T and Jiang Z Value iteration, adaptive dynamic programming, and optimal control of nonlinear systems 2016 IEEE 55th Conference on Decision and Control (CDC), (3375-3380)
  29. Constantinescu D, Navarro A, Corbera F, Fernández-Madrigal J and Asenjo R (2021). Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs, The Journal of Supercomputing, 77:1, (44-65), Online publication date: 1-Jan-2021.
  30. Gosavi A The actor-critic algorithm for infinite horizon discounted cost revisited Proceedings of the Winter Simulation Conference, (2867-2878)
  31. Yang X, Hu J, Hu J and Peng Y Asynchronous value iteration for markov decision processes with continuous state spaces Proceedings of the Winter Simulation Conference, (2856-2866)
  32. van Heeswijk W and La Poutré H Deep reinforcement learning in linear discrete action spaces Proceedings of the Winter Simulation Conference, (1063-1074)
  33. Zhang N, Wang W, Zhou P and Huang A Delay-Optimal Edge Cache Replacement with Non-Markovian Content Fetching GLOBECOM 2020 - 2020 IEEE Global Communications Conference, (1-6)
  34. Wen Z, Precup D, Ibrahimi M, Barreto A, Van Roy B and Singh S On efficiency in hierarchical reinforcement learning Proceedings of the 34th International Conference on Neural Information Processing Systems, (6708-6718)
  35. Qu G, Lin Y, Wierman A and Li N Scalable multi-agent reinforcement learning for networked systems with average reward Proceedings of the 34th International Conference on Neural Information Processing Systems, (2074-2086)
  36. Neu G and Pike-Burke C A unifying view of optimism in episodic reinforcement learning Proceedings of the 34th International Conference on Neural Information Processing Systems, (1392-1403)
  37. Ajdari A, Saberian F and Ghate A (2020). A Theoretical Framework for Learning Tumor Dose-Response Uncertainty in Individualized Spatiobiologically Integrated Radiotherapy, INFORMS Journal on Computing, 32:4, (930-951), Online publication date: 1-Oct-2020.
  38. Wang J, Dong M, Liang B and Boudreau G Online Precoding Design for Downlink MIMO Wireless Network Virtualization with Imperfect CSI IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, (1211-1220)
  39. Lin W, Wang X, Xu C, Sun X and Chen X Average Age Of Changed Information In The Internet Of Things 2020 IEEE Wireless Communications and Networking Conference (WCNC), (1-6)
  40. Lin Q, Nadarajah S and Soheili N (2020). Revisiting Approximate Linear Programming, Management Science, 66:4, (1544-1562), Online publication date: 1-Apr-2020.
  41. Vimal S, Kalaivani L, Kaliappan M, Suresh A, Gao X and Varatharajan R (2018). Development of secured data transmission using machine learning-based discrete-time partially observed Markov model and energy optimization in cognitive radio networks, Neural Computing and Applications, 32:1, (151-161), Online publication date: 1-Jan-2020.
  42. Rao S Relay Selection for Energy Harvesting Cooperative NOMA 2019 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), (1-6)
  43. Krauth K, Tu S and Recht B Finite-time analysis of approximate policy iteration for the linear quadratic regulator Proceedings of the 33rd International Conference on Neural Information Processing Systems, (8514-8524)
  44. Gupta H, Srikant R and Ying L Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning Proceedings of the 33rd International Conference on Neural Information Processing Systems, (4704-4713)
  45. Eskandarian A (2019). Scanning the Issue, IEEE Transactions on Intelligent Transportation Systems, 20:12, (4257-4261), Online publication date: 1-Dec-2019.
  46. Wang S, Ahmed N and Yeap T (2019). Optimum Management of Urban Traffic Flow Based on a Stochastic Dynamic Model, IEEE Transactions on Intelligent Transportation Systems, 20:12, (4377-4389), Online publication date: 1-Dec-2019.
  47. Hutter M, Yang-Zhao S and Majeed S Conditions on features for temporal difference-like methods to converge Proceedings of the 28th International Joint Conference on Artificial Intelligence, (2570-2577)
  48. Wei H, Kang X, Wang W and Ying L (2019). QuickStop, Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3:2, (1-25), Online publication date: 19-Jun-2019.
  49. Yao M Robust Detection of Cyberbullying in Social Media Companion Proceedings of The 2019 World Wide Web Conference, (61-66)
  50. Georghiou A, Tsoukalas A and Wiesemann W (2019). Robust Dual Dynamic Programming, Operations Research, 67:3, (813-830), Online publication date: 1-May-2019.
  51. Burra R, Singh C and Kuri J Service Scheduling for Bernoulli Requests and Quadratic Cost IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, (2584-2592)
  52. Mulder J, van Jaarsveld W and Dekker R (2019). Simultaneous Optimization of Speed and Buffer Times with an Application to Liner Shipping, Transportation Science, 53:2, (365-382), Online publication date: 1-Mar-2019.
  53. Lee J (2019). Multi-objective optimization case study with active and passive design in building engineering, Structural and Multidisciplinary Optimization, 59:2, (507-519), Online publication date: 1-Feb-2019.
  54. Wen M and Topcu U Constrained cross-entropy method for safe reinforcement learning Proceedings of the 32nd International Conference on Neural Information Processing Systems, (7461-7471)
  55. Shah D and Xie Q Q-learning with nearest neighbors Proceedings of the 32nd International Conference on Neural Information Processing Systems, (3115-3125)
  56. Agha-mohammadi A, Agarwal S, Kim S, Chakravorty S and Amato N (2018). SLAP: Simultaneous Localization and Planning Under Uncertainty via Dynamic Replanning in Belief Space, IEEE Transactions on Robotics, 34:5, (1195-1214), Online publication date: 1-Oct-2018.
  57. Joseph A and Bhatnagar S (2018). An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method, Machine Learning, 107:8-10, (1385-1429), Online publication date: 1-Sep-2018.
  58. Oliehoek F Interactive learning and decision making Proceedings of the 27th International Joint Conference on Artificial Intelligence, (5703-5708)
  59. Chatterjee K, Fu H, Goharshady A and Okati N Computational approaches for stochastic shortest path on succinct MDPs Proceedings of the 27th International Joint Conference on Artificial Intelligence, (4700-4707)
  60. Ramirez M, Papasimeon M, Lipovetzky N, Benke L, Miller T, Pearce A, Scala E and Zamani M Integrated Hybrid Planning and Programmed Control for Real Time UAV Maneuvering Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, (1318-1326)
  61. Kapoor S, Sreekumar S and Pillai S (2018). Distributed Scheduling in Multiple Access With Bursty Arrivals Under a Maximum Delay Constraint, IEEE Transactions on Information Theory, 64:2, (1297-1316), Online publication date: 1-Feb-2018.
  62. Rafieisakhaei M, Chakravorty S and Kumar P On the use of the observability gramian for partially observed robotic path planning problems 2017 IEEE 56th Annual Conference on Decision and Control (CDC), (1523-1528)
  63. Larsson D, Braun D and Tsiotras P Hierarchical state abstractions for decision-making problems with computational constraints 2017 IEEE 56th Annual Conference on Decision and Control (CDC), (1138-1143)
  64. Rafieisakhaei M, Chakravorty S and Kumar P A near-optimal decoupling principle for nonlinear stochastic systems arising in robotic path planning and control 2017 IEEE 56th Annual Conference on Decision and Control (CDC), (1-6)
  65. Palmer A and Vladimirsky A (2017). Optimal Stopping with a Probabilistic Constraint, Journal of Optimization Theory and Applications, 175:3, (795-817), Online publication date: 1-Dec-2017.
  66. Liu L, Chattopadhyay A and Mitra U On exploiting spectral properties for solving MDP with large state space 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), (1213-1219)
  67. Esposito G and Martin M (2017). Bellman residuals minimization using online support vector machines, Applied Intelligence, 47:3, (670-704), Online publication date: 1-Oct-2017.
  68. Zhang Z, Pan Z and Kochenderfer M Weighted double Q-learning Proceedings of the 26th International Joint Conference on Artificial Intelligence, (3455-3461)
  69. Hallak A and Mannor S Consistent on-line off-policy evaluation Proceedings of the 34th International Conference on Machine Learning - Volume 70, (1372-1383)
  70. Wang W, Lau V and Peng M (2017). Delay-Aware Uplink Fronthaul Allocation in Cloud Radio Access Networks, IEEE Transactions on Wireless Communications, 16:7, (4275-4287), Online publication date: 1-Jul-2017.
  71. Tan O, Gomez-Vilardebo J and Gunduz D (2017). Privacy-Cost Trade-offs in Demand-Side Management With Storage, IEEE Transactions on Information Forensics and Security, 12:6, (1458-1469), Online publication date: 1-Jun-2017.
  72. Rafieisakhaei M, Chakravorty S and Kumar P MT-LQG: Multi-agent planning in belief space via trajectory-optimized LQG 2017 IEEE International Conference on Robotics and Automation (ICRA), (5583-5590)
  73. Rafieisakhaei M, Chakravorty S and Kumar P T-LQG: Closed-loop belief space planning via trajectory-optimized LQG 2017 IEEE International Conference on Robotics and Automation (ICRA), (649-656)
  74. Gong J, Zhou S, Zhou Z and Niu Z (2017). Policy Optimization for Content Push via Energy Harvesting Small Cells in Heterogeneous Networks, IEEE Transactions on Wireless Communications, 16:2, (717-729), Online publication date: 1-Feb-2017.
  75. Ajdari A and Ghate A A model predictive control approach for discovering nonstationary fluence-maps in cancer radiotherapy fractionation Proceedings of the 2016 Winter Simulation Conference, (2065-2075)
  76. Parizi M and Ghate A Lot-sizing in sequential auctions while learning bid and demand distributions Proceedings of the 2016 Winter Simulation Conference, (895-906)
  77. Kamalapurkar R, Rosenfeld J and Dixon W (2016). Efficient model-based reinforcement learning for approximate online optimal control, Automatica (Journal of IFAC), 74:C, (247-258), Online publication date: 1-Dec-2016.
  78. Chattopadhyay A, Coupechoux M and Kumar A (2016). Sequential Decision Algorithms for Measurement-Based Impromptu Deployment of a Wireless Relay Network Along a Line, IEEE/ACM Transactions on Networking, 24:5, (2954-2968), Online publication date: 1-Oct-2016.
  79. Bian T and Jiang Z (2016). Value iteration and adaptive dynamic programming for data-driven adaptive optimal control design, Automatica (Journal of IFAC), 71:C, (348-360), Online publication date: 1-Sep-2016.
  80. García-Carrillo L, Vamvoudakis K and Hespanha J (2016). Approximate optimal adaptive control for weakly coupled nonlinear systems, International Journal of Adaptive Control and Signal Processing, 30:8-10, (1494-1522), Online publication date: 1-Aug-2016.
  81. Zazo S, Valcarcel Macua S, Sánchez-Fernández M and Zazo J (2016). Dynamic Potential Games With Constraints: Fundamentals and Applications in Communications, IEEE Transactions on Signal Processing, 64:14, (3806-3821), Online publication date: 15-Jul-2016.
  82. Gehring C, Pan Y and White M Incremental truncated LSTD Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, (1505-1511)
  83. Amirnavaei F and Dong M (2016). Online Power Control Optimization for Wireless Transmission With Energy Harvesting and Storage, IEEE Transactions on Wireless Communications, 15:7, (4888-4901), Online publication date: 1-Jul-2016.
  84. Boyles S and Rambha T (2016). A note on detecting unbounded instances of the online shortest path problem, Networks, 67:4, (270-276), Online publication date: 1-Jul-2016.
  85. Büyüktahtakın İ and Liu N (2016). Dynamic programming approximation algorithms for the capacitated lot-sizing problem, Journal of Global Optimization, 65:2, (231-259), Online publication date: 1-Jun-2016.
  86. Dolinskaya I, Epelman M, Şişikoğlu Sir E and Smith R (2016). Parameter-Free Sampled Fictitious Play for Solving Deterministic Dynamic Programming Problems, Journal of Optimization Theory and Applications, 169:2, (631-655), Online publication date: 1-May-2016.
  87. Cui Y, Yeh E and Liu R (2016). Enhancing the delay performance of dynamic backpressure algorithms, IEEE/ACM Transactions on Networking, 24:2, (954-967), Online publication date: 1-Apr-2016.
  88. Kalyanakrishnan S, Misra N and Gopalan A Randomised procedures for initialising and switching actions in Policy Iteration Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, (3145-3151)
  89. Hansen E and Abdoulahi I General error bounds in heuristic search algorithms for stochastic shortest path problems Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, (3130-3137)
  90. Lever G, Shawe-Taylor J, Stafford R and Szepesvári C Compressed conditional mean embeddings for model-based reinforcement learning Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, (1779-1787)
  91. Hallak A, Tamar A, Munos R and Mannor S Generalized emphatic temporal difference learning Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, (1631-1637)
  92. Calandra R, Seyfarth A, Peters J and Deisenroth M (2016). Bayesian optimization for learning gaits under uncertainty, Annals of Mathematics and Artificial Intelligence, 76:1-2, (5-23), Online publication date: 1-Feb-2016.
  93. Wei Q, Liu D and Xu Y (2016). Neuro-optimal tracking control for a class of discrete-time nonlinear systems via generalized value iteration adaptive dynamic programming approach, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 20:2, (697-706), Online publication date: 1-Feb-2016.
  94. Yang Y, Chen Y, Jiang C and Liu K (2016). Wireless Network Association Game With Data-Driven Statistical Modeling, IEEE Transactions on Wireless Communications, 15:1, (512-524), Online publication date: 1-Jan-2016.
  95. Veatch M (2015). Approximate linear programming for networks, Computers and Operations Research, 63:C, (32-45), Online publication date: 1-Nov-2015.
  96. Khademi A, Saure D, Schaefer A, Braithwaite R and Roberts M (2015). The Price of Nonabandonment, Manufacturing & Service Operations Management, 17:4, (554-570), Online publication date: 1-Oct-2015.
  97. Bo Zhou, Ying Cui and Meixia Tao (2015). Stochastic Throughput Optimization for Two-Hop Systems With Finite Relay Buffers, IEEE Transactions on Signal Processing, 63:20, (5546-5560), Online publication date: 1-Oct-2015.
  98. Ying Cui, Lau V and Fan Zhang (2015). Grid Power-Delay Tradeoff for Energy Harvesting Wireless Communication Systems With Finite Renewable Energy Storage, IEEE Journal on Selected Areas in Communications, 33:8, (1651-1666), Online publication date: 1-Aug-2015.
  99. Mastin A and Jaillet P (2015). Average-Case Performance of Rollout Algorithms for Knapsack Problems, Journal of Optimization Theory and Applications, 165:3, (964-984), Online publication date: 1-Jun-2015.
  100. Bui N and Rossi M (2015). Staying Alive, ACM Transactions on Sensor Networks, 11:3, (1-42), Online publication date: 28-May-2015.
  101. Efthymiadis K and Kudenko D Knowledge Revision for Reinforcement Learning with Abstract MDPs Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, (763-770)
  102. Berger A, Grigoriev A, Peeters R and Usotskaya N (2015). On Time-Optimal Trajectories in Non-Uniform Mediums, Journal of Optimization Theory and Applications, 165:2, (586-626), Online publication date: 1-May-2015.
  103. Wang M and Bertsekas D (2015). Incremental constraint projection methods for variational inequalities, Mathematical Programming: Series A and B, 150:2, (321-363), Online publication date: 1-May-2015.
  104. Lau V and Fan Zhang (2015). Optimal Beamforming for Video Streaming in Multiantenna Interference Networks via Diffusion Limit, IEEE Transactions on Information Theory, 61:4, (1819-1841), Online publication date: 1-Apr-2015.
  105. Ulukus S, Erkip E, Grover P, Huang K, Simeone O, Yener A and Zorzi M (2015). Guest Editorial: Wireless Communications Powered by Energy Harvesting and Wireless Energy Transfer (Part I), IEEE Journal on Selected Areas in Communications, 33:3, (357-359), Online publication date: 1-Mar-2015.
  106. Edalat N, Motani M, Walrand J and Longbo Huang (2015). A Methodology for Designing the Control of Energy Harvesting Sensor Nodes, IEEE Journal on Selected Areas in Communications, 33:3, (598-607), Online publication date: 1-Mar-2015.
  107. Wei Wang, Fan Zhang and Lau V (2015). Dynamic Power Control for Delay-Aware Device-to-Device Communications, IEEE Journal on Selected Areas in Communications, 33:1, (14-27), Online publication date: 1-Jan-2015.
  108. Schildbach G, Goulart P and Morari M (2015). Linear controller design for chance constrained systems, Automatica (Journal of IFAC), 51:C, (278-284), Online publication date: 1-Jan-2015.
  109. Prashanth L, Chatterjee A and Bhatnagar S (2014). Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks, Wireless Networks, 20:8, (2589-2604), Online publication date: 1-Nov-2014.
  110. Patrinos P, Sopasakis P, Sarimveis H and Bemporad A (2014). Stochastic model predictive control for constrained discrete-time Markovian switching systems, Automatica (Journal of IFAC), 50:10, (2504-2514), Online publication date: 1-Oct-2014.
  111. Hong L, Hu Z and Liu G (2014). Monte Carlo Methods for Value-at-Risk and Conditional Value-at-Risk, ACM Transactions on Modeling and Computer Simulation, 24:4, (1-37), Online publication date: 13-Aug-2014.
  112. Liu T and Cerpa A (2014). Temporal Adaptive Link Quality Prediction with Online Learning, ACM Transactions on Sensor Networks, 10:3, (1-41), Online publication date: 1-Apr-2014.
  113. Jung T, Wehenkel L, Ernst D and Maes F (2014). Optimized look-ahead tree policies, International Journal of Adaptive Control and Signal Processing, 28:3-5, (255-289), Online publication date: 1-Mar-2014.
  114. Borkar V and Mathkar A Reinforcement Learning for Matrix Computations Proceedings of the 10th International Conference on Distributed Computing and Internet Technology - Volume 8337, (14-24)
  115. Wagner C, Sedigh S and Hurson A Accurate and Efficient Search Prediction Using Fuzzy Matching and Outcome Feedback Proceedings of the 6th International Conference on Similarity Search and Applications - Volume 8199, (219-232)
  116. Jiang Y, Zou Y and Niu Y (2013). Robust Explicit Solution of Multirate Predictive Control System with External Disturbances, Circuits, Systems, and Signal Processing, 32:5, (2503-2515), Online publication date: 1-Oct-2013.
  117. Veatch M (2013). Approximate Linear Programming for Average Cost MDPs, Mathematics of Operations Research, 38:3, (535-544), Online publication date: 1-Aug-2013.
  118. Wiesemann W, Kuhn D and Rustem B (2013). Robust Markov Decision Processes, Mathematics of Operations Research, 38:1, (153-183), Online publication date: 1-Feb-2013.
  119. Gaggero M, Gnecco G and Sanguineti M (2013). Dynamic Programming and Value-Function Approximation in Sequential Decision Problems, Journal of Optimization Theory and Applications, 156:2, (380-416), Online publication date: 1-Feb-2013.
  120. Alizamir S, de Véricourt F and Sun P (2013). Diagnostic Accuracy Under Congestion, Management Science, 59:1, (157-171), Online publication date: 1-Jan-2013.
  121. Singh S, Chopin N and Whiteley N (2013). Bayesian Learning of Noisy Markov Decision Processes, ACM Transactions on Modeling and Computer Simulation, 23:1, (1-25), Online publication date: 1-Jan-2013.
  122. Chen X, Fernandez E and Kelton W Optimization model selection for simulation-based approximate dynamic programming approaches in semiconductor manufacturing operations Proceedings of the Winter Simulation Conference, (1-12)
  123. Frazier P Optimization via simulation with Bayesian statistics and dynamic programming Proceedings of the Winter Simulation Conference, (1-16)
  124. Desai V, Farias V and Moallemi C (2012). Pathwise Optimization for Optimal Stopping Problems, Management Science, 58:12, (2292-2308), Online publication date: 1-Dec-2012.
  125. Zhang Y, Wang Y and Wang X TEStore Proceedings of the 8th International Conference on Network and Service Management, (19-27)
  126. Blum C Hybrid metaheuristics in combinatorial optimization Proceedings of the First international conference on Theory and Practice of Natural Computing, (1-10)
  127. Gocgun Y and Ghate A (2012). Lagrangian relaxation and constraint generation for allocation and advanced scheduling, Computers and Operations Research, 39:10, (2323-2336), Online publication date: 1-Oct-2012.
  128. Grześ M and Hoey J Analysis of methods for solving MDPs Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3, (1237-1238)
  129. Devlin S and Kudenko D Dynamic potential-based reward shaping Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (433-440)
  130. Desai V, Farias V and Moallemi C (2012). Approximate Dynamic Programming via a Smoothed Linear Program, Operations Research, 60:3, (655-674), Online publication date: 1-May-2012.
  131. Papangelis A A comparative study of reinforcement learning techniques on dialogue management Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, (22-31)
  132. Richter H Analyzing dynamic fitness landscapes of the targeting problem of chaotic systems Proceedings of the 2012 European conference on Applications of Evolutionary Computation, (83-92)
  133. Vemu K, Bhatnagar S and Hemachandra N (2012). Optimal multi-layered congestion based pricing schemes for enhanced QoS, Computer Networks: The International Journal of Computer and Telecommunications Networking, 56:4, (1249-1262), Online publication date: 1-Mar-2012.
  134. Bertsekas D and Yu H (2012). Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, Mathematics of Operations Research, 37:1, (66-94), Online publication date: 1-Feb-2012.
  135. Kogan K and Shnaiderman M (2011). On Optimality of a Class of Dynamic Myopic Policies for Continuous-Time Replenishment with Periodic Updates, Journal of Optimization Theory and Applications, 151:1, (191-209), Online publication date: 1-Oct-2011.
  136. Iwane H, Kira A and Anai H Construction of explicit optimal value functions by a symbolic-numeric cylindrical algebraic decomposition Proceedings of the 13th international conference on Computer algebra in scientific computing, (239-250)
  137. Hernandez-del-Olmo F and Gaudioso E Reinforcement learning techniques for the control of wastewater treatment plants Proceedings of the 4th international conference on Interplay between natural and artificial computation: new challenges on bioinspired applications - Volume Part II, (215-222)
  138. Devlin S and Kudenko D Theoretical considerations of potential-based reward shaping for multi-agent systems The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (225-232)
  139. Sun C, Stevens-Navarro E, Shah-Mansouri V and Wong V (2011). A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks, Wireless Networks, 17:4, (1063-1081), Online publication date: 1-May-2011.
  140. Kim J, Lin X and Shroff N (2011). Optimal anycast technique for delay-sensitive energy-constrained asynchronous sensor networks, IEEE/ACM Transactions on Networking, 19:2, (484-497), Online publication date: 1-Apr-2011.
  141. Dion M and L'Ecuyer P American option pricing with randomized quasi-Monte Carlo simulations Proceedings of the Winter Simulation Conference, (2705-2720)
  142. Chen T and Lu J Towards analysis of semi-Markov decision processes Proceedings of the 2010 international conference on Artificial intelligence and computational intelligence: Part I, (41-48)
  143. Di Castro D and Mannor S Adaptive bases for reinforcement learning Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I, (312-327)
  144. Di Castro D and Mannor S Adaptive bases for reinforcement learning Proceedings of the 2010th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I, (312-327)
  145. Gnecco G and Sanguineti M (2010). Suboptimal Solutions to Dynamic Optimization Problems via Approximations of the Policy Functions, Journal of Optimization Theory and Applications, 146:3, (764-794), Online publication date: 1-Sep-2010.
  146. Brázdil T, Krčál J, Křetínský J, Kučera A and Řehák V Stochastic real-time games with qualitative timed automata objectives Proceedings of the 21st international conference on Concurrency theory, (207-221)
  147. Levine S, Krähenbühl P, Thrun S and Koltun V Gesture controllers ACM SIGGRAPH 2010 papers, (1-11)
  148. Levine S, Krähenbühl P, Thrun S and Koltun V (2010). Gesture controllers, ACM Transactions on Graphics, 29:4, (1-11), Online publication date: 26-Jul-2010.
  149. Fuemmeler J and Veeravalli V (2010). Energy efficient multi-object tracking in sensor networks, IEEE Transactions on Signal Processing, 58:7, (3742-3750), Online publication date: 1-Jul-2010.
  150. Yu H Convergence of least squares temporal difference methods under general conditions Proceedings of the 27th International Conference on Machine Learning, (1207-1214)
  151. Zhao L and Permuter H (2010). Zero-error feedback capacity of channels with state information via dynamic programming, IEEE Transactions on Information Theory, 56:6, (2640-2650), Online publication date: 1-Jun-2010.
  152. Min D and Yih Y (2010). An elective surgery scheduling problem considering patient priority, Computers and Operations Research, 37:6, (1091-1099), Online publication date: 1-Jun-2010.
  153. Grześ M and Kudenko D PAC-MDP learning with knowledge-based admissible models Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, (349-358)
  154. Kamitsos I, Andrew L, Kim H and Chiang M Optimal sleep patterns for serving delay-tolerant jobs Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, (31-40)
  155. Ramponi F, Chatterjee D, Summers S and Lygeros J On the connections between PCTL and dynamic programming Proceedings of the 13th ACM international conference on Hybrid systems: computation and control, (253-262)
  156. Akuiyibo E and Boyd S (2010). Adaptive modulation with smoothed flow utility, EURASIP Journal on Wireless Communications and Networking, 2010, (1-9), Online publication date: 1-Apr-2010.
  157. Kim J, Lin X, Shroff N and Sinha P (2010). Minimizing delay and maximizing lifetime for wireless sensor networks with anycast, IEEE/ACM Transactions on Networking, 18:2, (515-528), Online publication date: 1-Apr-2010.
  158. Chaporkar P, Proutiere A and Radunovic B Rate adaptation games in wireless LANs Proceedings of the 29th conference on Information communications, (2052-2060)
  159. Chaporkar P, Proutiere A and Asnani H Learning to optimally exploit multi-channel diversity in wireless systems Proceedings of the 29th conference on Information communications, (812-820)
  160. Fu F and van der Schaar M (2010). Decomposition principles and online learning in cross-layer optimization for delay-sensitive applications, IEEE Transactions on Signal Processing, 58:3, (1401-1415), Online publication date: 1-Mar-2010.
  161. Liu Y and Ma D Multiuser scalable video streaming over ad-hoc wireless network with strict delay and energy constraints Proceedings of the 7th international conference on Wireless on-demand network systems and services, (47-52)
  162. Pantelidou A and Ephremides A Minimum-length scheduling for multicast traffic under channel uncertainty Proceedings of the 28th IEEE conference on Global telecommunications, (5437-5442)
  163. Devlin S, Grzes M and Kudenko D Reinforcement Learning in RoboCup KeepAway with Partial Observability Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 02, (201-208)
  164. da Silva M, Durand F and Popović J Linear Bellman combination for control of character animation ACM SIGGRAPH 2009 papers, (1-10)
  165. da Silva M, Durand F and Popović J (2009). Linear Bellman combination for control of character animation, ACM Transactions on Graphics, 28:3, (1-10), Online publication date: 27-Jul-2009.
  166. Yolken B, Tsamis D and Bambos N Cost and target-based scheduling for switch power control Proceedings of the 2009 IEEE international conference on Communications, (1103-1108)
  167. Angelos B, Heasley M and Humpherys J Option pricing for inventory management and control Proceedings of the 2009 conference on American Control Conference, (2831-2836)
  168. Zhang W, Abate A and Hu J Efficient suboptimal solutions of switched LQR problems Proceedings of the 2009 conference on American Control Conference, (1084-1091)
  169. Bethke B and How J Approximate dynamic programming using Bellman residual elimination and Gaussian process regression Proceedings of the 2009 conference on American Control Conference, (745-750)
  170. Chaporkar P, Proutiere A, Asnani H and Karandikar A Scheduling with limited information in wireless systems Proceedings of the tenth ACM international symposium on Mobile ad hoc networking and computing, (75-84)
  171. Huynh V and Roy N icLQG Proceedings of the 2009 IEEE international conference on Robotics and Automation, (2697-2704)
  172. Bühler J and Wunder G An optimization framework for heterogeneous access management Proceedings of the 2009 IEEE conference on Wireless Communications & Networking Conference, (2525-2530)
  173. Arns M, Buchholz P and Müller D (2009). OPEDo, ACM SIGMETRICS Performance Evaluation Review, 36:4, (22-27), Online publication date: 25-Mar-2009.
  174. Deisenroth M, Rasmussen C and Peters J (2009). Gaussian process dynamic programming, Neurocomputing, 72:7-9, (1508-1524), Online publication date: 1-Mar-2009.
  175. Miller S, Harris Z and Chong E (2009). A POMDP framework for coordinated guidance of autonomous UAVs for multitarget tracking, EURASIP Journal on Advances in Signal Processing, 2009, (1-17), Online publication date: 1-Jan-2009.
  176. Faryabi B, Vahedi G, Chamberland J, Datta A and Dougherty E (2009). Intervention in context-sensitive probabilistic Boolean networks revisited, EURASIP Journal on Bioinformatics and Systems Biology, 2009:S2, (1-13), Online publication date: 1-Jan-2009.
  177. Zhang G, Wang J and Liu Y Congestion management in delay tolerant networks Proceedings of the 4th Annual International Conference on Wireless Internet, (1-9)
  178. Csáji B and Monostori L (2008). Value Function Based Reinforcement Learning in Changing Markovian Environments, The Journal of Machine Learning Research, 9, (1679-1709), Online publication date: 1-Jun-2008.
  179. Oliehoek F, Spaan M and Vlassis N (2008). Optimal and approximate Q-value functions for decentralized POMDPs, Journal of Artificial Intelligence Research, 32:1, (289-353), Online publication date: 1-May-2008.
  180. Starobinski D, Xiao W, Qin X and Trachtenberg A Near-Optimal Data Dissemination Policies for Multi-Channel, Single Radio Wireless Sensor Networks Proceedings of the IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, (955-963)
  181. Bean N and Costa A (2005). An analytic modelling approach for network routing algorithms that use "ant-like" mobile agents, Computer Networks: The International Journal of Computer and Telecommunications Networking, 49:2, (243-268), Online publication date: 5-Oct-2005.
  182. Benyouss A, Jabi M, Le Treust M and Szczecinski L Joint coding/decoding for multi-message HARQ 2016 IEEE Wireless Communications and Networking Conference, (1-7)
  183. Zhao S, Lin X, Aliprantis D, Villegas H and Chen M Online multi-stage decisions for robust power-grid operations under high renewable uncertainty IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, (1-9)
  184. Rafieisakhaei M, Tamjidi A, Chakravorty S and Kumar P Feedback motion planning under non-Gaussian uncertainty and non-convex state constraints 2016 IEEE International Conference on Robotics and Automation (ICRA), (4238-4244)
  185. Abd-Elmagid M, Dhillon H and Pappas N Online Age-Minimal Sampling Policy for RF-Powered IoT Networks 2019 IEEE Global Communications Conference (GLOBECOM), (1-6)
Contributors
  • School of Computing and Augmented Intelligence

Recommendations