10.5555/2388996.2389063
research-article

Combining in-situ and in-transit processing to enable extreme-scale scientific analysis

Published: 10 November 2012

ABSTRACT

With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-processing-centric approach to a concurrent one based on either in-situ or in-transit processing. In this context, computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper, we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
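The split between in-situ and in-transit work can be illustrated for the descriptive-statistics case with a minimal sketch. It assumes each simulation rank reduces its local block to a small summary (count, mean, sum of squared deviations) on the primary resources, and that a secondary resource later merges those summaries with the standard pairwise update formula. All names here (`Moments`, `in_situ_summarize`, `in_transit_reduce`) are hypothetical, no data movement is modeled, and this is not the paper's implementation or the DataSpaces/ADIOS API.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Moments:
    """Partial summary of a block: count, mean, and sum of squared deviations."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two samples.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


def in_situ_summarize(block: Sequence[float]) -> Moments:
    """Hypothetical in-situ step: one pass over the local block (Welford update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in block:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partial summaries without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def in_transit_reduce(partials: Iterable[Moments]) -> Moments:
    """Hypothetical in-transit step: fold the small summaries sent by each rank."""
    total = Moments()
    for p in partials:
        total = merge(total, p)
    return total


if __name__ == "__main__":
    # Three blocks stand in for the data held by three simulation ranks.
    blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    total = in_transit_reduce(in_situ_summarize(b) for b in blocks)
    print(total.n, total.mean, total.variance)
```

In this arrangement only three numbers per block leave the primary resources, which is the kind of data reduction that makes offloading the final merge inexpensive.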

Published in

SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2012
1161 pages
ISBN: 9781467308045

Publisher

IEEE Computer Society Press
Washington, DC, United States


Acceptance Rates

SC '12 Paper Acceptance Rate: 100 of 461 submissions, 22%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
