ABSTRACT
Power and energy consumption are now key design concerns in HPC. To develop software that meets power and energy constraints, scientific application developers need a reliable way to measure these values and relate them to application-specific events. Scientists face two challenges when measuring and controlling power: (1) diversity, because power and energy measurement interfaces differ between vendors, and (2) distribution, because power measurements of MPI simulations should be unaffected by the mapping of MPI processes to physical hardware nodes. While some prior work defines standardized software interfaces for power management, these efforts do not support distributed environments. As a result, the current state of the art requires scientists interested in power optimization to write tedious, error-prone application- and system-specific code. To make power measurement and management easier for scientists, we propose PoLiMEr, a user-space library that supports fine-grained application-level power monitoring and capping. We evaluate PoLiMEr by deploying it on Argonne National Laboratory's Theta system and using it to measure and cap power, scaling the performance and power of several applications on up to 1024 nodes. We find that PoLiMEr requires only a few additional lines of code, yet makes it easy for users to detect energy anomalies, apply power caps, and evaluate Theta's unique architectural features.
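The distribution challenge named in the abstract can be illustrated with a minimal sketch. It is not PoLiMEr's API or implementation. The function names, node names, and energy values below are hypothetical and chosen only for illustration. The sketch assumes that energy is reported per node, so a job-level total must count each node once no matter how many MPI ranks share that node.

```python
# Hypothetical sketch: count node-level energy once per node,
# independent of how MPI ranks are mapped to nodes.

def elect_monitors(rank_to_node):
    """Pick the lowest rank on each node as that node's monitor rank."""
    monitors = {}
    for rank, node in rank_to_node.items():
        if node not in monitors or rank < monitors[node]:
            monitors[node] = rank
    return monitors

def job_energy(rank_to_node, node_energy_joules):
    """Sum energy over the nodes in use, one reading per node."""
    monitors = elect_monitors(rank_to_node)
    return sum(node_energy_joules[node] for node in monitors)

def naive_energy(rank_to_node, node_energy_joules):
    """Anti-example: every rank reports its node's reading (double counts)."""
    return sum(node_energy_joules[node] for node in rank_to_node.values())

# Made-up per-node readings in joules.
node_energy = {"nid0": 120.0, "nid1": 95.5}

# Two different rank-to-node mappings of the same four-rank job.
packed = {0: "nid0", 1: "nid0", 2: "nid1", 3: "nid1"}
spread = {0: "nid0", 1: "nid1", 2: "nid0", 3: "nid1"}

total_packed = job_energy(packed, node_energy)
total_spread = job_energy(spread, node_energy)
total_naive = naive_energy(packed, node_energy)
```

With one monitor rank per node, both mappings give the same job total, while the naive per-rank sum counts each node twice in this example.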
Index Terms
- PoLiMEr: An Energy Monitoring and Power Limiting Interface for HPC Applications