Editorial Notes
A Corrected Version of Record for this paper was published in the ACM Digital Library on June 7, 2023, in keeping with an agreement with IEEE, which had consented to the addition of an author after the paper was originally published. For reference purposes, the Version of Record can be accessed via the Supplemental Material section of this page.
ABSTRACT
Novel algorithmic advances have paved the way for robotics to transform the dynamics of many social and enterprise applications. To achieve true autonomy, robots need to continuously process and interact with their environment through computationally intensive motion planning and control algorithms under a low power budget. Specialized architectures offer a potent choice to provide low-power, high-performance accelerators for these algorithms. Instead of taking the traditional route, which profiles and maps hot code regions to accelerators, this paper delves into the algorithmic characteristics of the application domain. We observe that many motion planning and control algorithms are formulated as constrained optimization problems solved online through Model Predictive Control (MPC). While models and objective functions differ between robotic systems and tasks, the structure of the optimization problem and solver remain fixed. Using this theoretical insight, we create RoboX, an end-to-end solution which exposes a high-level domain-specific language to roboticists. This interface allows roboticists to express the physics of the robot and its task in a form close to its concise mathematical expressions. The RoboX backend then automatically maps this high-level specification to a novel programmable architecture, which harbors a programmable memory access engine and compute-enabled interconnects. Hops in the interconnect are augmented with simple functional units that either operate on in-flight data or are bypassed according to a micro-program. Evaluations with six different robotic systems and tasks show that RoboX provides a 29.4× (7.3×) speedup and 22.1× (79.4×) performance-per-watt improvement over an ARM Cortex A57 (Intel Xeon E3). Compared to GPUs, RoboX attains 7.8×, 65.5×, and 71.8× higher performance-per-watt than the Tegra X2, GTX 650 Ti, and Tesla K40, respectively, with a power envelope of only 3.4 Watts at 45 nm.
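To make the abstract's central observation concrete, the following minimal Python sketch shows a receding-horizon MPC loop in which the solver structure is fixed and only the model and cost are supplied by the user. It is an illustration under stated assumptions, not the RoboX language, backend, or solver: the names `rollout_cost`, `mpc_step`, and `simulate`, the 1-D point-robot model, the cost weights, and the use of projected gradient descent with finite-difference gradients are all hypothetical choices made here for brevity.

```python
from typing import Callable, List

Dynamics = Callable[[float, float], float]   # x_next = f(x, u)
StageCost = Callable[[float, float], float]  # l(x, u)


def rollout_cost(f: Dynamics, cost: StageCost, x0: float, controls: List[float]) -> float:
    """Simulate the model over the horizon and accumulate the objective."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = f(x, u)
    return total + cost(x, 0.0)  # terminal cost, evaluated with zero input


def mpc_step(f: Dynamics, cost: StageCost, x0: float, horizon: int = 10,
             u_max: float = 1.0, iters: int = 200, lr: float = 0.05,
             eps: float = 1e-5) -> float:
    """Solve the finite-horizon problem and return only the first control.

    Assumption: projected gradient descent stands in for the solver; the
    input constraint |u| <= u_max is enforced by clipping after each step.
    """
    controls = [0.0] * horizon
    for _ in range(iters):
        base = rollout_cost(f, cost, x0, controls)
        grad = []
        for i in range(horizon):
            bumped = controls[:]
            bumped[i] += eps
            # Forward finite-difference estimate of d(cost)/d(u_i).
            grad.append((rollout_cost(f, cost, x0, bumped) - base) / eps)
        controls = [min(u_max, max(-u_max, u - lr * g))
                    for u, g in zip(controls, grad)]
    return controls[0]


# Hypothetical robot and task: a 1-D point that must reach TARGET.
DT = 0.1
TARGET = 1.0


def point_dynamics(x: float, u: float) -> float:
    return x + DT * u  # velocity-controlled point


def tracking_cost(x: float, u: float) -> float:
    return (x - TARGET) ** 2 + 0.01 * u ** 2  # tracking error plus control effort


def simulate(steps: int = 30) -> List[float]:
    """Closed loop: re-solve at every step and apply only the first control."""
    x = 0.0
    trajectory = [x]
    for _ in range(steps):
        u = mpc_step(point_dynamics, tracking_cost, x)
        x = point_dynamics(x, u)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    print(simulate())
```

Swapping in a different robot or task in this sketch means replacing only `point_dynamics` and `tracking_cost`; `mpc_step` is untouched, which is the kind of fixed problem-and-solver structure the abstract refers to.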
Supplemental Material
Available for Download
Version of Record for "Robox: an end-to-end solution to accelerate autonomous control in robotics" by Sacks et al., Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA '18).