ABSTRACT
To ensure peak utilization of hardware resources and to handle the increasingly dynamic demands placed on its render farm infrastructure, Weta Digital developed custom queuing, scheduling, job description and submission systems, which work in concert to maximize use of the available cores across a wide range of non-uniform task types.
The render farm is one of the most important, high-traffic components of a modern VFX pipeline. Beyond the hardware itself, a render farm requires careful management and maintenance to ensure it operates at peak efficiency. In Weta's case this hardware consists of a mix of over 80,000 CPU cores and a number of GPU resources, and its growth has introduced many interesting scalability challenges.
In this talk we present our end-to-end solutions in the render farm space, from the structure of the resource and the problems inherent at this scale, through the development of Plow, our management, queuing and monitoring software, to the deployment process and the production benefits realized. Within each section we present the scalability issues encountered and detail our strategy, process and results in solving them. The ever-increasing complexity and computational demands of modern VFX drive Weta's need to innovate in all areas, from surfacing, rendering and simulation to core pipeline infrastructure.
Large scale VFX pipelines
DigiPro '16: Proceedings of the 2016 Symposium on Digital Production