Network Issues for Life Sciences Research

Big Data Management in the Era of Genomic Medicine

Time 07/18/13 02:20PM-02:35PM

Session Abstract

As sequencing technologies continue to evolve and sequencing data moves from research into the clinic and hospital, the proliferation of data will continue to accelerate. With this trend and the application of these data to personalized medicine, new challenges in data storage, sharing, security, analysis, and retrieval will arise. While many of these issues are only now starting to be anticipated and addressed, readily available solutions remain scarce. Data repositories capable of managing genomic information in a way that enables streamlined access have emerged as a critical requirement as the use of such data progresses.

One highly relevant example of a data repository that fulfills multiple needs for a variety of users is the Cancer Genomics Hub (CGHub). CGHub is a cancer genome data repository built to support all three major NCI cancer genome sequencing programs: TCGA, TARGET, and CGCI. Launched in 2012 and hosted by UC Santa Cruz, CGHub holds, with only TCGA data online, more than 44,000 data files totaling more than 500 terabytes, with capacity to grow quickly to 5 petabytes as additional datasets become available. CGHub is co-located with a biocompute farm that lets cancer researchers seamlessly access the data files for subsequent analysis using a variety of commercial and/or freely available tools. This repository was built with products and technologies developed at Annai Systems.

In the CGHub example, data are stored in a vast public repository, enabling widespread access by a large number of researchers and clinicians. There are also a growing number of smaller private repositories used to inform drug discovery, disease diagnosis, and patient treatment. While some requirements of these repositories are quite similar, they differ in how the data are used, who will access them, and which regulatory and security requirements must be met. Because of these differences, a set of tools that can provide flexible, scalable solutions across multiple use cases is of paramount importance. We will discuss how Annai Systems


Speaker Dan Maltbie Annai Systems

Presentation Media

Secondary tracks Data Movement