Abstract
Learning control has become an appealing alternative to deriving control laws from classical control theory. However, a major shortcoming of learning control is the lack of performance guarantees, which prevents its application in many real-world scenarios. As a step towards widespread deployment of learning control, we provide stability analysis tools for controllers acting on dynamics represented by Gaussian processes (GPs). We consider differentiable Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For both cases, we analyze finite and infinite time horizons. Furthermore, we study the effect of disturbances on the stability results. Empirical evaluations on simulated benchmark problems support our theoretical results.
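To make setting (i) concrete, the following minimal sketch builds closed-loop dynamics from the posterior mean of a GP and a differentiable Markovian policy, then rolls the system forward. Everything in it is an illustrative assumption rather than the paper's method: the toy system `true_dynamics`, the linear policy with gain 1, the squared-exponential kernel with unit lengthscale, the 5 x 5 training grid, and the noise variance are all hypothetical choices. The rollout only shows one trajectory approaching the equilibrium; it is not a stability guarantee and does not reproduce the analysis tools described above.

```python
import math

def se_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two (state, action) pairs."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * d2 / lengthscale ** 2)

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * w[j] for j in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w

def true_dynamics(x, u):
    """Hypothetical toy system, used only to generate training data."""
    return 0.9 * x + 0.5 * u + 0.1 * math.sin(x)

# Training inputs: a symmetric grid over (state, action); targets: next states.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
Z = [(x, u) for x in grid for u in grid]
y = [true_dynamics(x, u) for x, u in Z]

# GP regression weights: alpha = (K + noise_var * I)^{-1} y (zero prior mean).
noise_var = 1e-4  # assumed observation-noise variance
K = [[se_kernel(zi, zj) + (noise_var if i == j else 0.0)
      for j, zj in enumerate(Z)] for i, zi in enumerate(Z)]
alpha = solve(K, y)

def gp_mean(x, u):
    """Posterior mean of the GP dynamics model at (x, u)."""
    return sum(a * se_kernel((x, u), z) for a, z in zip(alpha, Z))

def policy(x):
    """Assumed differentiable Markovian policy: linear state feedback."""
    return -1.0 * x

def rollout(x0, steps):
    """Iterate the closed-loop mean dynamics x_{t+1} = m(x_t, policy(x_t))."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gp_mean(xs[-1], policy(xs[-1])))
    return xs

trajectory = rollout(1.5, 30)
print("first states:", [round(x, 4) for x in trajectory[:5]])
print("final |x|:", abs(trajectory[-1]))
```

The initial state is chosen so that the visited (state, action) pairs stay inside the region covered by training data. Outside that region the posterior mean reverts to the zero prior mean, so the learned dynamics no longer reflect the underlying system there.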