
Distributed strategies for computational sprints

Published: 28 January 2019

Abstract

Computational sprinting is a class of mechanisms that boost performance but dissipate additional power. We describe a sprinting architecture in which many independent chip multiprocessors share a power supply, and sprints are constrained by the chips' thermal limits and the rack's power limits. We also present the computational sprinting game, a multi-agent perspective on managing sprints. Strategic agents decide whether to sprint based on application phases and system conditions. The game produces an equilibrium that improves task throughput for data analytics workloads by 4--6x over prior greedy heuristics and performs within 90% of an upper bound on throughput from a globally optimized policy.
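
As a rough illustration of the setting the abstract describes, the sketch below models agents that each decide locally whether to sprint, subject to a per-chip cooling period and a rack-wide cap on simultaneous sprints. Everything in it is an assumption made for illustration and is not taken from the article: the `Agent` and `run_epoch` names, the fixed speedup threshold, the epoch-based cooling, and the rule that exceeding the cap forces the whole rack to recover.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    threshold: float   # assumed: minimum speedup that justifies a sprint
    wait: int = 0      # epochs left before this chip may sprint again

def run_epoch(agents, speedups, rack_limit, cool_epochs=1, recover_epochs=2):
    """Return (sprinters, tripped) for one epoch.

    speedups[i] stands in for agent i's current application phase: the
    gain it would see from sprinting now. All parameters are illustrative.
    """
    sprinters = []
    for i, (agent, gain) in enumerate(zip(agents, speedups)):
        if agent.wait > 0:
            agent.wait -= 1          # chip is cooling or rack is recovering
        elif gain >= agent.threshold:
            sprinters.append(i)      # local decision, no coordination
    # Assumed rack power limit: too many simultaneous sprints is an emergency.
    tripped = len(sprinters) > rack_limit
    for i in sprinters:
        agents[i].wait = cool_epochs  # assumed thermal limit: cool after a sprint
    if tripped:
        for agent in agents:
            agent.wait = max(agent.wait, recover_epochs)  # rack-wide recovery
    return sprinters, tripped

agents = [Agent(threshold=2.0) for _ in range(4)]
print(run_epoch(agents, [3.0, 1.5, 2.5, 1.0], rack_limit=2))  # ([0, 2], False)
```

In this toy model, a lower threshold lets an agent sprint more often but raises the chance that the rack exceeds its limit. That tension is the kind of trade-off a game-theoretic treatment would resolve, though the sketch does not compute an equilibrium.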

        Published in

        Communications of the ACM, Volume 62, Issue 2
        February 2019
        112 pages
        ISSN: 0001-0782
        EISSN: 1557-7317
        DOI: 10.1145/3310134

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed