ABSTRACT
With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-processing-centric approach to a concurrent one based on either in-situ or in-transit processing. In this context, computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper, we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks, which support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
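The split between in-situ and in-transit work can be illustrated with a small sketch for descriptive statistics. This is not the framework described in the abstract and it does not use DataSpaces or ADIOS. It assumes that each compute rank reduces its local block to a compact summary (count, mean, sum of squared deviations) in a single pass, and that a secondary worker merges those summaries asynchronously using the standard pairwise update formulas. A Python queue and thread stand in for the asynchronous transfer layer, and the names `Summary`, `summarize_in_situ`, `merge`, and `run_pipeline` are hypothetical.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact per-block summary: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def summarize_in_situ(block):
    """In-situ stage (assumed): reduce a raw block to a Summary in one pass."""
    s = Summary()
    for x in block:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
    return s


def merge(a, b):
    """Combine two summaries with the standard pairwise update formulas."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def run_pipeline(blocks):
    """Simulate one time step: in-situ reduction, then an asynchronous merge."""
    staging = queue.Queue()  # stand-in for an asynchronous transfer layer
    result = [Summary()]

    def in_transit_worker():
        # In-transit stage (assumed): runs apart from the producing loop.
        while True:
            item = staging.get()
            if item is None:  # sentinel: no more summaries will arrive
                break
            result[0] = merge(result[0], item)

    worker = threading.Thread(target=in_transit_worker)
    worker.start()
    for block in blocks:  # each block stands for one compute rank's local data
        staging.put(summarize_in_situ(block))  # only the small summary is shipped
    staging.put(None)
    worker.join()
    return result[0]


if __name__ == "__main__":
    total = run_pipeline([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]])
    print(total.n, total.mean, total.m2 / (total.n - 1))  # count, mean, sample variance
```

In this sketch only three numbers per block cross the staging queue, however large the block is, which is the kind of data reduction that makes offloading the remaining work to secondary resources inexpensive.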
Combining in-situ and in-transit processing to enable extreme-scale scientific analysis
SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis