ABSTRACT
Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
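The straggler effect on coupled output mentioned above can be illustrated with a small simulation: when a file is striped across k storage targets, the write completes only when the slowest stripe finishes, so effective bandwidth is gated by the minimum per-target rate. This is a hedged sketch, not the paper's methodology; the lognormal bandwidth distribution and its parameters are hypothetical stand-ins for the measured distributions the paper obtains.

```python
import random
import statistics

random.seed(42)

def sample_ost_bandwidth():
    # Hypothetical per-target write bandwidth sample (MB/s).
    # Real distributions must be measured, as the paper does;
    # a lognormal is used here only for illustration.
    return random.lognormvariate(5.0, 0.5)

def striped_write_bandwidth(stripe_count, file_mb=1024):
    # A file striped over k targets writes file_mb/k to each target
    # and finishes only when the slowest stripe completes, so the
    # straggler governs effective aggregate bandwidth.
    share = file_mb / stripe_count
    times = [share / sample_ost_bandwidth() for _ in range(stripe_count)]
    return file_mb / max(times)  # MB/s, gated by the slowest target

for k in (1, 4, 16, 64):
    bw = [striped_write_bandwidth(k) for _ in range(1000)]
    per_target = statistics.mean(bw) / k
    print(f"stripes={k:3d}  mean aggregate={statistics.mean(bw):8.1f} MB/s  "
          f"effective per-target={per_target:6.1f} MB/s")
```

Running this shows effective per-target bandwidth shrinking as stripe count grows, since wider striping samples deeper into the slow tail of the per-target distribution — one mechanism by which delivered write bandwidth falls below the peak.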
Characterizing output bottlenecks in a supercomputer
SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis