ABSTRACT
High Throughput Computing allows workloads of many thousands of tasks to be performed efficiently over many distributed resources and frees the user from the laborious process of managing task deployment, execution and result collection. However, in many cases the High Throughput Computing system is composed of volunteer computational resources where tasks may be evicted by the owner of the resource. This has two main disadvantages. First, tasks may take longer to run, as they may require multiple deployments before finally obtaining enough time on a resource to complete. Second, the computation lost on each eviction wastes energy. We may be able to reduce the effect of the first disadvantage by submitting multiple replicas of a task and taking the result from the first one to complete. This, though, could lead to a significant increase in energy consumption. We therefore wish to submit only the minimum number of replicas required to complete the task in the allocated time, whilst minimising energy consumption. In this work we evaluate the effect of fixed replica counts and of Reinforcement Learning on the proportion of tasks which fail to finish in a given time-frame and on the energy consumed by the system.
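The trade-off between replica count, missed deadlines and energy can be illustrated with a minimal sketch. This is not the evaluation method used in this work: it assumes, purely for illustration, that each replica independently finishes within the allocated time with a fixed probability `p_complete`, and that every replica may consume up to a full attempt's worth of energy. The function names and parameters are hypothetical.

```python
def miss_probability(p_complete: float, replicas: int) -> float:
    """Probability that no replica finishes in time.

    Assumption (not from the paper): replicas succeed independently,
    each with probability p_complete.
    """
    if not 0.0 <= p_complete <= 1.0:
        raise ValueError("p_complete must lie in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (1.0 - p_complete) ** replicas


def min_replicas(p_complete: float, max_miss: float, limit: int = 1000) -> int:
    """Smallest replica count whose miss probability is at most max_miss."""
    if not 0.0 < p_complete <= 1.0:
        raise ValueError("p_complete must lie in (0, 1]")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must lie in (0, 1)")
    for k in range(1, limit + 1):
        if miss_probability(p_complete, k) <= max_miss:
            return k
    raise RuntimeError("no replica count within limit meets the target")


def energy_upper_bound(replicas: int, energy_per_attempt: float) -> float:
    """Worst-case energy if every replica runs a full attempt.

    Assumption: replicas are not cancelled early; cancelling the
    remaining replicas once one completes would lower this figure.
    """
    return replicas * energy_per_attempt


if __name__ == "__main__":
    k = min_replicas(p_complete=0.5, max_miss=0.1)
    print(k, miss_probability(0.5, k), energy_upper_bound(k, 1.0))
```

Under these assumptions each extra replica multiplies the miss probability by `1 - p_complete` while adding up to one further attempt's energy, which is why submitting only the minimum number of replicas is attractive.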
Index Terms
- Evaluation of Energy Consumption of Replicated Tasks in a Volunteer Computing Environment