Smart Infrastructure for Big Data
Time 07/15/14 10:30AM-11:00AM
We report on a major expansion of JASMIN, a big data infrastructure on which the UK Centre for Environmental Data Archival (CEDA) operates data centres and a major scientific analysis environment. CEDA's activities include the British Atmospheric Data Centre (BADC), support for UK Earth System Grid Federation nodes, and the academic component of the facility for Climate and Environmental Monitoring from Space (CEMS). Internal and external network design is a key component of an infrastructure ready for the challenges coming from climate science, earth observation and wider environmental science.
JASMIN Phase 2, now underway with many components already in operation, includes a major expansion to over 3,500 compute cores with 12 PB of fast parallel disk storage (and equivalent capacity in near-line tape). Expansion and re-implementation of the internal network have massively increased capacity for large-scale parallel data processing and have enabled the provision of managed and unmanaged private clouds. The former gives filesystem access to archive and shared workspace environments; the latter will enable tenant organisations to host their own virtual infrastructures, including web services and portals. A dedicated “Science DMZ” has been implemented, with high-performance data transfer nodes and monitoring tools for a variety of large bidirectional science data flows. Collaboration with other sites as part of the International Climate Network Working Group has set ambitious throughput targets representative of climate community needs, but these are likely to be matched, if not exceeded, by the data transfer requirements of the Earth Observation community.
Drivers for the current architecture come from a wide range of use cases across climate and environmental science, with the “Big” in “Big Data” expressed in several forms, from the traditional volume, variety and velocity to a complex interplay between compute and input/output.
Current work focuses on completing the cloud infrastructure, whereby CEDA (including BADC and its other data centres) becomes a tenant organisation in the managed cloud, while third-party organisations are able to operate autonomously in a private cloud environment well positioned for efficient access to key resources.
Speaker Matt Pritchard STFC Rutherford Appleton Laboratory