The MDTM Project
Time 10/05/15 11:20AM-12:10PM
Room Room 25-B
Multicore and manycore architectures have become the norm for high-performance computing. These architectures provide advanced features that can be exploited to design and implement a new generation of high-performance data movement tools. To date, numerous efforts have been made to exploit multicore parallelism to speed up data transfer. At the application level, various data movement tools and technologies have been developed, such as the TCP-based GridFTP and BBCP. Parallel data transfer technologies are widely used in bulk data movement and provide significant improvement in aggregate data transfer throughput. These tools typically employ a multi-threaded architecture: for a data transfer, multiple threads can be spawned, with each thread handling one or more flows, depending on the runtime environment. At the OS level, major operating systems (e.g., Windows and Linux) have been redesigned and parallelized to better utilize additional cores. At the hardware level, new multi-queue NIC technologies have been introduced, and the use of NUMA (non-uniform memory access) systems is on the rise. Because NUMA architecture scales better than UMA (uniform memory access) architecture, high-performance data transfer systems are typically NUMA-based and feature several NUMA nodes distributed across the system.
Although these parallelization efforts have enhanced data transfer performance significantly, existing data movement tools are still bound by major inefficiencies when running on multicore systems. While there are numerous reasons for these inefficiencies, they fall into two general problem areas: (1) existing data transfer tools are unable to fully and efficiently exploit multicore hardware under the default OS support, especially on NUMA systems, and (2) the disconnect between software and multicore hardware renders network I/O processing on multicore systems inefficient. These are fundamental and common problems that data movement tools will inevitably encounter when running on multicore systems. Ultimately, they result in performance bottlenecks on the end systems, which in turn impede the effective use of advanced networks. The DOE is working towards deploying terabit networks in support of distributed extreme-scale data movement. Existing backbone networks were built on 10-Gigabit technologies but will soon be upgraded with ultra-scalable 100-Gigabit line rate technologies. Resolving performance issues within end systems is becoming the critical element within the end-to-end loop of distributed extreme-scale data movement. Terabit networks need terabit-capable end systems to efficiently move data on and off the network.
To address these inefficiencies and limitations, the DOE ASCR program is funding FNAL and BNL to work collaboratively on the Multicore-Aware Data Transfer Middleware (MDTM) project. MDTM aims to accelerate data movement toolkits on multicore systems. MDTM consists of two major components:
• MDTM data transfer applications (client or server). An MDTM application performs data transfer tasks. It adopts an I/O-centric architecture that uses dedicated threads to perform network and disk I/O operations. Additionally, it makes use of MDTM middleware services to fully utilize the underlying multicore system.
• MDTM middleware services. The middleware will harness multicore parallelism to scale data movement toolkits on host systems. It will provide generic services and functions that can be called by an MDTM application to ensure efficient resource utilization at host systems. MDTM middleware schedules and assigns system resources based on the needs and requirements of data transfer applications (i.e., data transfer-centric scheduling). It also takes into account other factors, including NUMA topology, I/O locality, and QoS.
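To make the idea of data transfer-centric, NUMA-aware scheduling concrete, the following is a minimal Python sketch and not MDTM's actual interface or algorithm. It assumes a hypothetical two-node host and hypothetical names (`Device`, `assign_cores`, `NODE_CORES`), and simply places each dedicated I/O thread on a core that belongs to the NUMA node of the device it serves, so that I/O stays local.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical description of an I/O device (NIC or disk)."""
    name: str
    numa_node: int  # NUMA node the device is attached to


def assign_cores(threads, node_cores):
    """Map each I/O thread to a core on its device's NUMA node.

    threads:    list of (thread_name, Device) pairs
    node_cores: dict mapping NUMA node id -> list of core ids
    Cores within a node are handed out round-robin.
    """
    next_idx = {node: 0 for node in node_cores}
    placement = {}
    for thread_name, device in threads:
        cores = node_cores[device.numa_node]
        i = next_idx[device.numa_node]
        placement[thread_name] = cores[i % len(cores)]
        next_idx[device.numa_node] = i + 1
    return placement


# Assumed topology for illustration only: two NUMA nodes, four cores each.
NODE_CORES = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
nic = Device("nic0", numa_node=1)
disk = Device("raid0", numa_node=0)

# Dedicated network and disk I/O threads, as in an I/O-centric design.
THREADS = [("net-0", nic), ("net-1", nic), ("disk-0", disk)]
PLACEMENT = assign_cores(THREADS, NODE_CORES)
```

In this sketch the two network threads land on cores of node 1, next to the NIC, and the disk thread lands on node 0. On Linux, a thread could then pin itself to its assigned core with `os.sched_setaffinity(0, {core})`; a real scheduler would also need to weigh the load and QoS factors mentioned above, which this example omits.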
The development and implementation of this project have proceeded on schedule. A prototype version of MDTM is currently undergoing evaluation and enhancement. In this talk, we will describe our architectural approach to developing the MDTM data transfer applications and MDTM middleware services.
Speaker Dantong Yu ESnet - Brookhaven National Laboratory (BNL)
Speaker Liang Zhang Fermi National Accelerator Laboratory (FNAL)
Primary track Research
Secondary tracks Advanced Networking/Joint Techs