Improving Data Mobility & Management for International Climate Science

KEYNOTE: Infrastructure and standards governance for Earth system model inter-comparison projects

Time 07/15/14 08:30AM-09:30AM

Room GC402

Session Abstract

The system for distributing output from Earth system models has joined the global data infrastructure, alongside the systems that serve observational data, reanalysis output, and other products. The global data infrastructure has become a critical underpinning for climate science and policy and has been recognized as such by the NRC (A National Strategy for Advancing Climate Modeling, NRC 2012).

Software distributed by the Earth System Grid Federation (ESGF) provides the backbone for this infrastructure, which, since CMIP5, has become a federated global archive. This software critically depends on standards that guarantee that different data distribution centers can discover, browse, catalog and archive one another's datasets. These standards include:

- IETF standards for metadata and metadata harvesting
- URL and catalog standards (OPeNDAP and THREDDS)
- scientific data standards, including netCDF and the CF conventions
- the ES-DOC Common Information Model for the description of models and data provenance
- the CMIP5 Data Reference Syntax (DRS), allowing for the creation of a uniform URL namespace for CMIP5 data
- quality control standards guaranteeing dataset conformance, a prerequisite for issuing DOIs for data citation
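To make the role of the DRS concrete, the following is a minimal Python sketch, under stated assumptions, of how a uniform directory path and file name can be derived from a fixed, ordered set of dataset facets. The facet order follows our reading of the CMIP5 DRS document and should be checked against the specification; the names `DrsDataset` and `parse_directory`, and the version string `v20110601`, are hypothetical. The structural check in `parse_directory` is only a toy stand-in for the much more thorough quality control described above.

```python
"""Sketch: assembling and checking CMIP5 DRS-style paths (illustrative only)."""
import re
from dataclasses import dataclass, fields
from typing import Optional

# Assumption: ensemble members are written as r<N>i<N>p<N>, e.g. "r1i1p1".
ENSEMBLE_RE = re.compile(r"r\d+i\d+p\d+")


@dataclass(frozen=True)
class DrsDataset:
    """Ordered DRS facets; field order follows our reading of the CMIP5 DRS."""
    activity: str    # e.g. "CMIP5"
    product: str     # e.g. "output1"
    institute: str   # e.g. "NOAA-GFDL"
    model: str       # e.g. "GFDL-CM3"
    experiment: str  # e.g. "historical"
    frequency: str   # e.g. "mon"
    realm: str       # e.g. "atmos"
    mip_table: str   # e.g. "Amon"
    ensemble: str    # e.g. "r1i1p1"
    version: str     # e.g. "v20110601" (hypothetical value)
    variable: str    # e.g. "tas"

    def directory(self) -> str:
        """Join all facets, in declaration order, into one uniform path."""
        return "/".join(getattr(self, f.name) for f in fields(self))

    def filename(self, temporal_subset: Optional[str] = None) -> str:
        """Build variable_table_model_experiment_ensemble[_subset].nc."""
        parts = [self.variable, self.mip_table, self.model,
                 self.experiment, self.ensemble]
        if temporal_subset:
            parts.append(temporal_subset)
        return "_".join(parts) + ".nc"


def parse_directory(path: str) -> DrsDataset:
    """Rebuild facets from a path, rejecting structurally malformed ones."""
    parts = path.strip("/").split("/")
    n_expected = len(fields(DrsDataset))
    if len(parts) != n_expected:
        raise ValueError(f"expected {n_expected} DRS components, got {len(parts)}")
    dataset = DrsDataset(*parts)
    if not ENSEMBLE_RE.fullmatch(dataset.ensemble):
        raise ValueError(f"malformed ensemble member: {dataset.ensemble!r}")
    return dataset


if __name__ == "__main__":
    ds = DrsDataset("CMIP5", "output1", "NOAA-GFDL", "GFDL-CM3", "historical",
                    "mon", "atmos", "Amon", "r1i1p1", "v20110601", "tas")
    print(ds.directory())
    # CMIP5/output1/NOAA-GFDL/GFDL-CM3/historical/mon/atmos/Amon/r1i1p1/v20110601/tas
    print(ds.filename("186001-200512"))
    # tas_Amon_GFDL-CM3_historical_r1i1p1_186001-200512.nc
    assert parse_directory(ds.directory()) == ds
```

The design point is that the path is a pure function of the metadata facets: a center that knows the facets of a dataset can compute where it lives on any other data node, and can parse a path it receives back into facets, without a per-center lookup table.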

While this has worked with demonstrable success for CMIP5 itself, we acknowledge that this success has spawned a veritable cottage industry of MIPs, as smaller communities of modelers set up specialized MIPs to study problems of specific interest to them.

We present here the current state of the art in creating and maintaining scientific standards for Earth system science data, show examples of the new kinds of science these standards have enabled, and outline the challenges of "scaling up" these efforts so that dispersed scientific communities can be nimble in setting up new MIPs.

This proliferation is a major challenge for the modeling centers, which will be faced with meeting the diverse requests of multiple MIPs. Without enforcement of common standards, special procedures will be required to prepare data for each intercomparison activity, and this will ultimately overwhelm modeling group resources. The secondary effects will be felt by data users (often, but not always, the same people), who, despite heroic efforts by the modeling groups, will inevitably face heterogeneities both in accessing the data and in the data structures themselves, substantially impeding scientific progress. Moreover, without unified infrastructure approaches, it will be difficult to credit the data providers appropriately and to gauge the impact of modeling efforts systematically.

Motivated by the above considerations, the WGCM has appointed a small panel, the WGCM Infrastructure Panel (WIP), tasked with establishing and maintaining standards and policies for model data sharing. This panel will serve as a counterpart to the CMIP Modeling Panel and will allow the modeling groups, through the WGCM, to maintain some internal control over the technical requirements imposed by the increasingly burdensome MIPs. Its membership will also include representatives of those responsible for the infrastructure underpinning the MIPs.

This new panel will create and maintain a document outlining the technologies necessary for operating a global data infrastructure, along with the standards needed to maintain these technologies. The document will also outline a protocol for creating and running a MIP. The panel will further be tasked with drawing the broader community into a discussion of these standards, for example by hosting sessions at AGU, EGU and other meetings. Finally, it will work to ensure that model data are recognized as a community resource on par with peer-reviewed articles, and that the providers of this resource receive proper credit, by making data citation a reality.

Speakers

Keynote Speaker Venkatramani Balaji

Dr. V. Balaji has an M.Sc. in Physics from the Indian Institute of Technology, Kanpur, and a Ph.D. in Physics from the Ohio State University. His background is in the modeling of cloud-scale dynamics (non-hydrostatic moist convection and gravity waves) and its effect on climate. This field has always involved applying the most advanced computational technologies to science, and developing interdisciplinary models that require specialists in many fields (meteorology at various scales, hydrology, radiative transfer, boundary layer dynamics). His publications cover the breadth of the field: they include papers on atmospheric climate, ocean dynamics, cloud-scale dynamics, non-hydrostatic models of gravity waves, computational methods, and software engineering. Building on his background in physics and climate science, he has become an expert in parallel computing and scientific infrastructure, providing high-level programming interfaces for expressing parallelism in scientific algorithms. Over the years, this breadth has evolved into an interest in the development of modeling infrastructures, beginning with the creation of the GFDL Flexible Modeling System (FMS), of which he is the chief architect, and moving on to technical leadership roles in international consortia building similar frameworks across multiple institutions (the Earth System Modeling Framework in the US, and the Program for Integrated Earth System Modeling in Europe). This pioneering use of frameworks, which allow climate models to be constructed out of independently developed components sharing a common technical architecture, is now accepted practice. Going further, he has led the development of workflow curators (the FMS Runtime Environment, FRE) for executing the complex workflows that manage the complete climate modeling process.
The Earth System Curator (US) and Metafor (EU) projects, in which he plays a key role, have developed a common information model that allows complex scientific queries to be executed on model data archives. Dr. Balaji plays a key role in designing systems for the interpretation of climate projection data, including the delivery of data from the globally coordinated CMIP3 and CMIP5 experiments. Globally coordinated modeling studies demand a global data sharing infrastructure; Dr. Balaji is a founding member of the Earth System Grid Federation and serves on the steering committee of the Global Organization of Earth System Science Portals (GO-ESSP). He was recently appointed co-chair of the WGCM Infrastructure Panel (WIP), tasked with developing the scientific requirements for the global data infrastructure underlying CMIP. He currently holds grants covering the development of climate analytics on exascale data archives (ExArch, an international grant under the NSF-G8 initiative) and the governance of global software infrastructure (Commodity Governance, COG, an interdisciplinary collaboration of physical and social scientists, software engineers and historians). Dr. Balaji's writings and research include efforts to raise awareness of the serious challenges of climate change and to widen access to the scientific tools for studying it. He is particularly active in attracting students interested in advanced information technology and computational science to this scientific problem of high societal and policy significance. This work includes articles aimed at students, written for the ACM (Association for Computing Machinery) and the AAAS (American Association for the Advancement of Science), as well as talks and courses at institutions such as the Computer Society of India and the Abdus Salam International Centre for Theoretical Physics.
He is also on a faculty team helping develop coursework for a Graduate Certification in Computational Science through the Princeton Institute for Computational Science and Engineering (PICSciE). Dr. Balaji has headed the Modeling Systems Group since 2003, serving developers of Earth system models at GFDL and Princeton University. He serves on the Scientific Advisory Boards of the ORNL Climate Change Science Institute and the Max Planck Institute for Meteorology in Hamburg. He has also served in advisory roles on many other NSF, NOAA and DOE review panels, including the 2010 series of DOE exascale workshops, and was a co-author of the 2012 National Academies report "A National Strategy for Advancing Climate Modeling". He is a sought-after speaker and lecturer and is committed to providing training in the use of climate models in developing nations, leading workshops for advanced students and researchers in South Africa and India. His most recent workshop (July 2013) was sponsored by the Abdus Salam International Centre for Theoretical Physics (ICTP) in Trieste, Italy, and hosted by the Indian National Centre for Ocean Information Services (INCOIS) in Hyderabad, India. The course was designed to train advanced students in India in ocean modeling aimed at the study of the Indian monsoon, a coupled ocean-atmosphere problem. With this and future courses he aims to help develop an indigenous capability in India for coupled modeling of the Earth system.
