ABSTRACT
Numerical Weather Prediction (NWP) and Climate simulations sit at the intersection of classically understood High Performance Computing (HPC) and the Big Data / High Performance Data Analytics (HPDA) communities. Driven by ever more ambitious scientific goals, both the size and the number of output data elements generated as part of NWP operations have grown by several orders of magnitude, and will continue to grow. The total amount of data is expected to grow exponentially with time; over the last 30 years the increase has been approximately 40% per year. This poses significant scalability challenges for the data processing pipeline, in which the movement of data through and between stages is one of the most significant factors. At ECMWF, meteorological data within the HPC facility is stored in an indexed data store for retrieval according to a well-defined schema of meteorological metadata. This paper discusses the design and implementation of the next (5th) version of this indexed data store, which aims to increase the range of contexts within the operational workflow in which it can be used, and to increase its tolerance to failure. Further, it aims to pre-emptively head off some upcoming scalability bottlenecks present in the previous versions.
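As a rough check on the growth figure quoted above, the following sketch compounds a constant 40% yearly increase over 30 years. The constant-rate assumption and the function names are illustrative only and are not taken from the paper.

```python
import math

ANNUAL_GROWTH = 0.40  # approximate yearly increase quoted in the abstract
YEARS = 30            # period quoted in the abstract


def growth_factor(rate: float, years: int) -> float:
    """Total multiplicative growth after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for the volume to double at a constant yearly `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    factor = growth_factor(ANNUAL_GROWTH, YEARS)
    print(f"growth over {YEARS} years: x{factor:,.0f} "
          f"(about 10^{math.log10(factor):.1f})")
    print(f"doubling time: {doubling_time(ANNUAL_GROWTH):.2f} years")
```

Under this simplification the data volume grows by a factor of roughly 2.4 × 10^4 over the period, doubling about every two years, which is consistent with the "several orders of magnitude" stated above.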
Index Terms
- A Scalable Object Store for Meteorological and Climate Data