Improving Data Mobility & Management for International Climate Science

Big Data Analytics at Scale: Lessons learnt from processing CMIP-5 on Mira

Time 07/14/14 02:45PM-03:15PM

Room GC402

Session Abstract

The TECA (Toolkit for Extreme Climate Analysis) software, developed under the CASCADE project, enables petascale-class climate data analytics. TECA can be configured to extract extreme weather events, such as tropical cyclones, atmospheric rivers, and extra-tropical cyclones, from massive datasets. In this work, we present lessons learnt from applying TECA to a large fraction of the CMIP-5 archive.

Our end-to-end data analysis workflow consists of the following steps:
- downloading 60 TB of CMIP-5 data via ESGF
- storing and pre-processing the data at NERSC
- transferring a 6 TB dataset to ALCF over ESnet
- running TECA at full concurrency on Mira
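To put the transfer step in perspective, a back-of-envelope estimate shows how long moving the 6 TB dataset takes at a given sustained throughput. This is a minimal sketch under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s) and two hypothetical constant rates. These are not the bandwidth figures measured in this work.

```python
# Back-of-envelope transfer time for the NERSC -> ALCF step.
# Assumptions (hypothetical, not measured values from the talk):
#   - decimal units: 1 TB = 10**12 bytes, 1 Gb/s = 10**9 bits/s
#   - a constant sustained throughput, ignoring protocol overhead

def transfer_hours(size_tb: float, rate_gbps: float) -> float:
    """Return hours needed to move size_tb terabytes at rate_gbps gigabits/s."""
    bits = size_tb * 1e12 * 8            # terabytes -> bits
    seconds = bits / (rate_gbps * 1e9)   # bits / (bits per second)
    return seconds / 3600.0


if __name__ == "__main__":
    # Two illustrative (hypothetical) sustained rates.
    for rate in (1.0, 10.0):
        print(f"6 TB at {rate:g} Gb/s: {transfer_hours(6.0, rate):.1f} h")
```

Under these assumptions, 6 TB needs about 13.3 h at a sustained 1 Gb/s but only about 1.3 h at 10 Gb/s. This is why end-to-end throughput, not just link capacity, matters for workflows of this size.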

We will present performance data on end-to-end network bandwidth and server utilization, along with the TECA MPI initialization, parallel I/O, and communication optimizations that enabled the full-scale Mira run. We also present preliminary results characterizing extra-tropical cyclone (ETC) activity in CMIP-5 (historical period) and comparing it with several reanalysis products (NCEP2, NCEP-CFSR, JRA55, ERA-Interim). Preliminary investigations suggest that ETC counts depend strongly on model resolution. We are exploring whether scaling relationships can account for resolution, and will then compare against RCP8.5 results to determine changes in ETC activity.
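Running at full concurrency means splitting the input (files or time steps) across many MPI ranks. The abstract does not describe how TECA partitions its work. The sketch below therefore shows only a generic contiguous block decomposition. The function name `block_range` is hypothetical, and it is not TECA's API.

```python
# Hypothetical sketch: contiguous block decomposition of n_items inputs
# (e.g. files or time steps) over n_ranks MPI ranks. TECA's actual scheme
# is not described in the abstract; this only illustrates even splitting.

def block_range(n_items: int, n_ranks: int, rank: int) -> range:
    """Return the half-open index range of items owned by `rank`.

    The first (n_items % n_ranks) ranks receive one extra item, so
    per-rank workloads differ by at most one item.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("invalid rank or rank count")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # 10 items over 4 ranks -> [0, 1, 2], [3, 4, 5], [6, 7], [8, 9]
    for r in range(4):
        print(r, list(block_range(10, 4, r)))
```

In a real MPI program, `rank` and `n_ranks` would typically come from `mpi4py`'s `MPI.COMM_WORLD.Get_rank()` and `Get_size()`. Each rank would then process only the inputs in its own range, with no communication needed to decide ownership.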

Speakers

Speaker: Prabhat, Lawrence Berkeley National Laboratory