2014 Technology Exchange

High-Performance Distributed Data Management on Terabit Networks

Time 10/30/14 09:15AM-10:00AM

Room White River Ballroom C

Session Abstract

As advanced 100G networks have been deployed in Internet2 [1] and ESnet [2], opportunities for new research into high-performance distributed data management on terabit networks have expanded. This is promising in the sense that high-speed networks can expedite data exchange among geographically distant research institutions, which ultimately reduces the time-to-solution of scientific computing.

Globus transfer service/GridFTP has become the de facto standard for data exchange among distributed sites in research communities [3]. Many studies of Globus/GridFTP [4-9] have sought to improve the performance of data movement over wide-area networks, and have achieved much better performance than standard FTP and other traditional data transfer tools such as scp. Recently, an attempt to saturate a 100G link between two hosts succeeded with appropriate tuning of the software stack [10].

However, the ever-increasing rate of data growth still requires performance beyond 100G, and more complicated data flows need sophisticated frameworks to manage allocated resources and balance load across many flows. Data growth is outpacing network bandwidth, which calls for additional data management computations to reduce data rates (e.g. compression [11]) and for parallel network resource allocation (e.g. multiple network paths [12]). Such research is still at an early stage and needs more work before it can be deployed in production systems. In addition, many data flows with varying quality of service (QoS) requirements would fail to meet those requirements without proper management frameworks. For example, OSCARS in ESnet [13] provides path reservations only for the WAN, not for entire end-to-end paths that traverse LANs. DYNES [14] tried to address end-to-end network path setup, but it does not provide a QoS guarantee when multiple data flows compete for shared network resources.
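The trade-off behind using compression to reduce data rates can be illustrated with a rough back-of-the-envelope model. The sketch below is illustrative only and is not taken from the cited work: the function names, the two-stage pipeline model, and the sample numbers are all assumptions, and receiver-side decompression and storage limits are ignored.

```python
def effective_throughput_gbps(link_gbps, compression_ratio, compressor_gbps):
    """Estimate application-level throughput when data is compressed before sending.

    Hypothetical model (an assumption, not a measured result):
      link_gbps         -- network capacity in Gbit/s
      compression_ratio -- original size / compressed size
      compressor_gbps   -- rate at which the sender can compress raw data
    """
    if link_gbps <= 0 or compression_ratio <= 0 or compressor_gbps <= 0:
        raise ValueError("all parameters must be positive")
    # The link carries compressed bytes, so it can drain raw data
    # at link_gbps * compression_ratio.
    network_limit = link_gbps * compression_ratio
    # The pipeline runs at the speed of its slower stage.
    return min(network_limit, compressor_gbps)


def transfer_time_s(size_tb, throughput_gbps):
    """Seconds needed to move size_tb terabytes (10**12 bytes each)."""
    bits = size_tb * 8e12
    return bits / (throughput_gbps * 1e9)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    for compressor in (150.0, 400.0):
        rate = effective_throughput_gbps(100.0, 2.0, compressor)
        print(f"compressor {compressor} Gbit/s -> {rate} Gbit/s effective, "
              f"{transfer_time_s(1.0, rate):.1f} s per TB")
```

Under this model, compression helps only while the compressor can keep up with the link; once the compressor becomes the slower stage, adding network capacity no longer shortens the transfer.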

This session will discuss the following:
• How can we sustainably saturate 100G or 1T links for diverse data flows? For example, data flows can be memory-to-memory or disk-to-disk.
• What are the end-system, storage and network challenges in scaling end-to-end transfers beyond 100G?
• How can we schedule multiple flows such that their QoS requirements are met and overall throughput is maximized?
• How can we take into account the overall workflow and related computations when data flows are part of a scientific workflow and compute resources should be assigned in accordance with those data flows?
• What kinds of frameworks are needed, and what should the design principles be for future data movement services?
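As one possible starting point for the flow-scheduling question above, the sketch below allocates bandwidth on a single shared link. It is a simplified illustration, not a method proposed by the speakers: the `allocate` function, the (minimum, maximum) rate representation of QoS, and the sample flows are all assumptions. Each flow first receives its guaranteed minimum, and the surplus is then split equally among flows that can still use more, capped at each flow's maximum.

```python
def allocate(capacity, flows):
    """Allocate link capacity among flows.

    capacity -- total link rate (e.g. in Gbit/s)
    flows    -- dict mapping a flow name to (min_rate, max_rate);
                this QoS representation is a hypothetical simplification.
    Returns a dict mapping each flow name to its allocated rate.
    """
    eps = 1e-9
    total_min = sum(lo for lo, _ in flows.values())
    if total_min > capacity:
        # Admission control: the guarantees cannot all be honored.
        raise ValueError("minimum guarantees exceed link capacity")

    # Step 1: give every flow its guaranteed minimum.
    alloc = {name: lo for name, (lo, _) in flows.items()}
    remaining = capacity - total_min

    # Step 2: share the surplus equally, round by round, among flows
    # that have not yet reached their maximum rate.
    active = {name for name, (lo, hi) in flows.items() if hi > lo}
    while active and remaining > eps:
        share = remaining / len(active)
        satisfied = set()
        for name in active:
            headroom = flows[name][1] - alloc[name]
            give = min(share, headroom)
            alloc[name] += give
            remaining -= give
            if flows[name][1] - alloc[name] <= eps:
                satisfied.add(name)
        if not satisfied:
            break  # every active flow took a full share; nothing is left
        active -= satisfied
    return alloc


if __name__ == "__main__":
    # Sample flows are made up for illustration.
    demo = {"a": (10.0, 30.0), "b": (20.0, 100.0), "c": (0.0, 40.0)}
    print(allocate(100.0, demo))
```

A real scheduler would also have to handle multiple links along each path, deadlines, and coordination with storage and compute resources, which this single-link sketch leaves out.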

Speakers

Speaker Raj Kettimuthu Argonne National Laboratory

Speaker Eunsung Jung Argonne National Laboratory

Primary track Advanced Networking/Joint Techs