Don’t Copy Data! Instead, Share it at Web-Scale!
Time 10/28/14 08:30AM-09:15AM
Since its start in 2006, Amazon Web Services has grown to over 40 different services. S3, our object store and one of our first services, is now home to trillions of objects and regularly peaks at 1.5 million requests per second. S3 is used to store many data types, including map tiles, genome data, video, and database backups.

This presentation's primary goal is to illustrate best practices around open data sets on AWS. To do so, it showcases a simple map-tiling architecture built from just a few services: CloudFront (CDN), S3 (object store), and Elastic Beanstalk (application management), in combination with the open source tools Leaflet, MapServer/GDAL, and yas3fs. My demo will use the USDA's NAIP aerial imagery dataset (48 TB), plus other higher-resolution data at the city level, and show how you can deliver images derived from over 219,000 GeoTIFFs to both TMS and OGC WMS clients for the 48 contiguous states, without pre-caching tiles, while keeping your server environment appropriately sized via EC2 (virtual machine) auto-scaling.

Because the NAIP data sits in a requester-pays bucket that allows authenticated read access, anyone with an AWS account has immediate access to all of the source GeoTIFFs and can copy the data in bulk anywhere they desire. However, I will show that the pay-for-use model of the cloud allows for open-data architectures that are not possible in on-premises environments, and that for certain kinds of data, especially big data, rather than moving the data it makes more sense to use it in situ, in an environment that can support demanding SLAs.
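The requester-pays access described above can be sketched with a few lines of boto3: the caller opts in to the transfer charges by passing `RequestPayer="requester"` on the GET. The bucket and key names in any real run are up to the caller; nothing here encodes the actual NAIP bucket layout.

```python
# Minimal sketch of reading an object from a requester-pays S3 bucket.
# The bucket/key you pass in are placeholders, not the real NAIP paths.

def requester_pays_args():
    """Extra argument boto3 needs so S3 bills the caller, not the bucket owner."""
    return {"RequestPayer": "requester"}

def fetch_object(bucket, key, dest):
    # boto3 is imported lazily here; running this requires AWS credentials
    # on an account willing to pay the request and transfer charges.
    import boto3
    s3 = boto3.client("s3")
    resp = s3.get_object(Bucket=bucket, Key=key, **requester_pays_args())
    with open(dest, "wb") as f:
        f.write(resp["Body"].read())
```

The same opt-in exists on the AWS CLI as `aws s3 cp ... --request-payer requester`; without it, requests against a requester-pays bucket are rejected with a 403.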
Speaker Mark Korver Amazon Web Services
Primary track Cloud Services (Systems Integration & Ownership)
Secondary tracks Security