ABSTRACT
Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
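The straggler effect on coupled output mentioned above can be illustrated with a small simulation: when a file is striped across k storage targets, the write completes only when the slowest stripe finishes, so effective bandwidth is gated by the minimum per-target rate. This is a hedged sketch, not the paper's methodology; the lognormal bandwidth distribution and its parameters are hypothetical stand-ins for the measured distributions the paper obtains.

```python
import random
import statistics

random.seed(42)

def sample_ost_bandwidth():
    # Hypothetical per-target write bandwidth sample (MB/s).
    # Real distributions must be measured, as the paper does;
    # a lognormal is used here only for illustration.
    return random.lognormvariate(5.0, 0.5)

def striped_write_bandwidth(stripe_count, file_mb=1024):
    # A file striped over k targets writes file_mb/k to each target
    # and finishes only when the slowest stripe completes, so the
    # straggler governs effective aggregate bandwidth.
    share = file_mb / stripe_count
    times = [share / sample_ost_bandwidth() for _ in range(stripe_count)]
    return file_mb / max(times)  # MB/s, gated by the slowest target

for k in (1, 4, 16, 64):
    bw = [striped_write_bandwidth(k) for _ in range(1000)]
    per_target = statistics.mean(bw) / k
    print(f"stripes={k:3d}  mean aggregate={statistics.mean(bw):8.1f} MB/s  "
          f"effective per-target={per_target:6.1f} MB/s")
```

Running this shows effective per-target bandwidth shrinking as stripe count grows, since wider striping samples deeper into the slow tail of the per-target distribution — one mechanism by which delivered write bandwidth falls below the peak.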
Characterizing output bottlenecks in a supercomputer
SC '12: Proceedings of the 2012 International Conference for High Performance Computing, Networking, Storage and Analysis