Creating a O(100 PFLOPS) GPU Multi-cloud HTC System for MMA with IceCube

Time 04/01/20 08:45AM-10:00AM

Room Grand Ballroom 3

Session Abstract

The IceCube Neutrino Observatory is the National Science Foundation’s (NSF) premier facility for detecting neutrinos with energies above approximately 10 GeV and a pillar of NSF’s Multi-Messenger Astrophysics (MMA) program, one of NSF’s 10 Big Ideas. The detector is located at the geographic South Pole and is designed to detect interactions of neutrinos of astrophysical origin by instrumenting over a gigaton of polar ice with 5160 optical sensors. The sensors are buried between 1450 and 2450 meters below the surface of the South Pole ice sheet.

To understand how ice properties affect the detection of incoming neutrinos and the determination of their origin, IceCube uses photon propagation simulations run on GPUs. We executed a series of runs in which we aggregated O(100 PFLOPS) worth of GPUs across multiple public cloud providers and used them to run the IceCube simulations that produce much-needed calibration data. One such run harvested all GPUs available for sale across Amazon Web Services, Microsoft Azure, and Google Cloud Platform the weekend before SC19, reaching over 51k GPUs in total and 380 PFLOP32s, with GPU types spanning the full range of generations, from the NVIDIA GRID K520 to the most modern NVIDIA T4 and V100. Another run, while smaller in scale, sustained its peak performance for much longer and used just-in-time fetching of input data.

We report on the tools and effort needed to create and operate such systems, as well as the scientific motivation for doing so.


Speaker Frank Wuerthwein University of California - San Diego

Speaker Benedikt Riedel University of Wisconsin - Madison

Primary track Cloud Integration

