Abstract
This article studies the I/O write behaviors of the Titan supercomputer and its Lustre parallel file stores under production load. The results can inform the design, deployment, and configuration of file systems, along with the design of I/O software in applications, operating systems, and adaptive I/O libraries.
We propose a statistical benchmarking methodology to measure write performance across I/O configurations, hardware settings, and system conditions, and we introduce two relative measures to quantify the write-performance behaviors of hardware components under production load. Beyond the designed experiments and benchmarks on Titan, we verify the experimental results with a real application and a real application I/O kernel, XGC and HACC IO, respectively; both are widely used and representative of typical application I/O behaviors.
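For concreteness, the following is a minimal sketch of the statistical idea, not the article's actual harness: repeat a fixed-size write many times and summarize with the median, since individual trials vary with transient load. The trial count, write size, and output path are illustrative assumptions.

```c
/* Minimal sketch: time repeated fixed-size writes and report the
 * median bandwidth. Illustrative only; not the paper's harness. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define TRIALS      20          /* illustrative trial count */
#define WRITE_BYTES (64 << 20)  /* 64 MiB per trial (assumed size) */

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

int main(void) {
    char *buf = malloc(WRITE_BYTES);
    double bw[TRIALS];
    memset(buf, 'x', WRITE_BYTES);

    for (int t = 0; t < TRIALS; t++) {
        int fd = open("bench.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        ssize_t n = write(fd, buf, WRITE_BYTES);
        fsync(fd);              /* flush so the timing covers real I/O */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        close(fd);
        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        bw[t] = (n > 0) ? n / sec / (1 << 20) : 0.0;  /* MiB/s */
    }
    qsort(bw, TRIALS, sizeof(double), cmp_double);
    printf("median write bandwidth: %.1f MiB/s\n", bw[TRIALS / 2]);
    free(buf);
    return 0;
}
```

Comparing such medians across stripe counts, client counts, and system conditions is the essence of the methodology; the study itself controls many more factors than this sketch.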
In summary, we find that Titan’s I/O system is variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write sharing of files across clients (compute nodes). I/O parallelism is most effective when the application, or its I/O libraries, distributes the I/O load so that each target stores files for multiple clients and each client writes files on multiple targets, in a balanced way with minimal contention. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good locations” in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we observe no predictable diurnal load patterns.
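As an illustration of the file-per-process pattern that these results favor, the hedged sketch below has each MPI rank write its own file, with no striping or write sharing across clients. It assumes the output directory's default stripe count was set to 1 beforehand (for example, with `lfs setstripe -c 1 outdir`); the path format and write size are hypothetical.

```c
/* Sketch of file-per-process output: one independent, unstriped file
 * per rank, so there is no write sharing across clients. Assumes
 * `outdir` exists and was pre-set to stripe count 1. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_BYTES (64 << 20)  /* 64 MiB per rank (assumed size) */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each client writes its own file: no cross-client contention
     * on a single shared, striped file. */
    char path[256];
    snprintf(path, sizeof(path), "outdir/rank%05d.dat", rank);

    char *buf = malloc(CHUNK_BYTES);
    memset(buf, 'x', CHUNK_BYTES);

    FILE *f = fopen(path, "wb");
    if (f) {
        fwrite(buf, 1, CHUNK_BYTES, f);
        fclose(f);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Balancing which targets back which files (so each target serves multiple clients and each client spreads files over multiple targets) is then a placement decision left to the application or its I/O library.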