Big Data Analytics at Scale: Lessons learnt from processing CMIP-5 on Mira
Time 07/14/14 02:45PM-03:15PM
The TECA (Toolkit for Extreme Climate Analysis) software, developed under the CASCADE project, enables petascale-class climate data analytics. TECA can be configured to extract extreme weather events, such as tropical cyclones, atmospheric rivers, and extra-tropical cyclones, from massive datasets. In this work, we present lessons learnt from applying TECA to a large fraction of the CMIP-5 archive.
Our end-to-end data analysis workflow consists of the following steps:
- downloading 60 TB of CMIP-5 data via ESGF
- storing and pre-processing data at NERSC
- transferring a 6 TB dataset to ALCF over ESnet
- running TECA at full concurrency on Mira
We will present performance data on end-to-end network bandwidth and server utilization, along with the TECA MPI initialization, parallel I/O, and communication optimizations that enabled the full-scale Mira run. We also present preliminary results characterizing extra-tropical cyclone (ETC) activity in CMIP-5 (historical period) and comparing it to several reanalysis products (NCEP2, NCEP-CFSR, JRA55, ERA-Interim). Preliminary investigations suggest a strong dependence of ETC counts on model resolution. We are exploring the applicability of scaling relationships to account for resolution, followed by comparisons with RCP8.5 results to determine changes in ETC activity.
Speaker: Prabhat, Lawrence Berkeley National Laboratory