Abstract
Distributing a simulation across many machines can drastically speed up computations and increase detail. The computing cloud provides tremendous computing resources, but weak service guarantees force programs to manage significant system complexity: nodes, networks, and storage occasionally perform poorly or fail.
We describe Nimbus, a system that automatically distributes grid-based and hybrid simulations across cloud computing nodes. The main simulation loop runs as sequential code that launches distributed computations across many cores. The simulation on each core runs as if it were stand-alone: Nimbus automatically stitches these simulations into a single, larger one. To do this efficiently, Nimbus introduces a four-layer data model that translates between the contiguous, geometric objects used by simulation libraries and the replicated, fine-grained objects managed by its underlying cloud computing runtime.
Using PhysBAM particle level-set fluid simulations, we demonstrate that Nimbus can run higher-detail simulations faster, distribute simulations on up to 512 cores, and run enormous simulations (1024³ cells). Nimbus automatically manages these distributed simulations, balancing load across nodes and recovering from failures. Implementations of PhysBAM water and smoke simulations as well as an open source heat-diffusion simulation show that Nimbus is general and can support complex simulations.
Nimbus can be downloaded from https://nimbus.stanford.edu.
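The driver-loop idea in the abstract can be illustrated with a small, hypothetical sketch. This is not the Nimbus API; `step_partition` and `run` are invented names, and a 1-D heat-diffusion update stands in for a full fluid solver. A sequential driver loop exchanges ghost cells between partitions and then advances each partition as if it were a stand-alone simulation; a distributed runtime would launch those per-partition updates as remote tasks.

```python
# Hypothetical sketch, not the Nimbus API: a sequential driver "stitches"
# per-partition simulations into one larger one via ghost-cell exchange.

def step_partition(u, left_ghost, right_ghost, alpha=0.1):
    """Advance one partition of a 1-D heat equation as if it were stand-alone."""
    padded = [left_ghost] + u + [right_ghost]
    return [padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]

def run(grid, num_partitions, steps):
    """Sequential driver loop over a partitioned grid."""
    size = len(grid) // num_partitions
    parts = [grid[i * size:(i + 1) * size] for i in range(num_partitions)]
    for _ in range(steps):
        new_parts = []
        for k, p in enumerate(parts):
            # Ghost cells come from neighboring partitions; the outermost
            # boundaries reflect (Neumann-style) by reusing the edge value.
            left = parts[k - 1][-1] if k > 0 else p[0]
            right = parts[k + 1][0] if k < num_partitions - 1 else p[-1]
            # In a distributed system, this call would be a remote task launch.
            new_parts.append(step_partition(p, left, right))
        parts = new_parts
    return [x for p in parts for x in p]
```

Because the ghost exchange is exact, running with one partition or with several yields identical results; the distribution is invisible to the per-partition code, which is the property the abstract describes.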
Index Terms
- Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud