Strategies for data transfer to Amazon Web Services

Creating a GIS deployment with Amazon Web Services requires you to transfer some or all of your GIS data over the Internet to locations in the cloud. This topic describes where you can store your data in the cloud and how you can transfer it. It also discusses some factors that affect data transfer time.

Places to store the data

Once you create an EC2 instance running ArcGIS Server, you need to prepare to transfer your data to the cloud. There are several places you can store your data. All of the following options incur charges from Amazon. These charges are subject to change, so research them before making your choice.

Options for transferring data to the cloud

Transferring data from your on-premises deployment into the cloud takes time and, in some cases, coordination with your IT security staff. Moving data to a location on the Internet (in other words, the cloud) is often slower and less secure than the routine transfers you perform within your local network.

There are many strategies you can use to get data into the cloud. If you work with sensitive data, coordinate with your IT staff to confirm that your method is secure and approved by your organization. Following are some of your options:

Amazon works with many Solution Providers, some of whom provide data transfer, storage, and security solutions. See Find an AWS Solution Provider to find out whether one of these companies can help with your cloud strategy. Esri itself is one of these providers and offers various project and implementation services for deploying GIS in the Amazon cloud.

Factors that affect data transfer time

The performance of these data transfer options varies with your physical distance from the Amazon data center that hosts your data, the time of day, and the quality of your Internet connection.
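To get a rough sense of how these factors play out, you can estimate transfer time from data size and upload bandwidth. The following is a minimal sketch, not an Esri or Amazon tool. The function name is hypothetical, and the efficiency factor is an assumed fraction of nominal bandwidth you actually achieve; the formula ignores latency, retries, and per-file overhead, which matter most when you send many small files.

```python
def estimate_transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Rough transfer time in hours (hypothetical helper).

    size_gb: data size in gigabytes (10**9 bytes).
    bandwidth_mbps: nominal upload bandwidth in megabits per second.
    efficiency: assumed fraction of the nominal bandwidth actually sustained.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = size_gb * 8e9                      # gigabytes -> bits
    effective_bps = bandwidth_mbps * 1e6 * efficiency
    return bits / effective_bps / 3600        # seconds -> hours
```

For instance, 100 GB over a connection that sustains the full 100 Mbps takes about 2.2 hours; at the assumed 80 percent efficiency it takes about 2.8 hours.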

GIS datasets, especially imagery and map caches, can take up large amounts of space and may need to be zipped before transfer, either to reduce the size of the data or to reduce the total number of files. Fewer files make the transfer more efficient, which matters especially for map caches, since a cache can contain a very large number of tile files. Some S3 client utilities may limit the size of any one file you can transfer or the number of individual files you can store. Some zipping programs also limit the amount of data they can put in one archive. Take the time and effort of zipping into account when you choose a data transfer option.
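As an illustration of bundling a cache folder into a single archive, the following minimal sketch uses Python's standard zipfile module. The function name and folder layout are hypothetical; it assumes the archive is written outside the source folder (otherwise the archive would try to include itself). ZIP64 extensions are enabled so the archive is not held to the classic ZIP limits of 4 GB and 65,535 entries, but the client you use to unzip on the instance must also support ZIP64.

```python
import os
import zipfile


def zip_directory(source_dir: str, archive_path: str) -> int:
    """Pack every file under source_dir into one ZIP archive (hypothetical helper).

    Paths inside the archive are stored relative to source_dir so the folder
    structure (for example, cache levels) is preserved. Returns the file count.
    """
    count = 0
    # ZIP_DEFLATED compresses the data; allowZip64 lifts the classic ZIP size/count limits.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                zf.write(full_path, arcname)
                count += 1
    return count
```

Compression helps less for imagery and tiles that are already stored in compressed formats such as JPEG or PNG; in that case the main benefit is sending one file instead of many.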

Finally, if you use S3, be aware of the limit on the number of buckets you can create and of other restrictions on S3 buckets. Amazon lists these in Bucket Restrictions and Limitations.
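Bucket names are one such restriction. The following sketch checks a proposed name against a subset of Amazon's naming rules (3 to 63 characters; lowercase letters, digits, periods, and hyphens; starting and ending with a letter or digit; no adjacent periods; not formatted as an IP address). The function name is hypothetical, and the rules have varied by region and over time, so treat Bucket Restrictions and Limitations as the authoritative list.

```python
import re

# A subset of the S3 bucket naming rules; Amazon's documentation is authoritative.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> list[str]:
    """Return a list of rule violations for a proposed bucket name (empty if none found)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3-63 characters")
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase letters, digits, periods, hyphens; "
                        "start and end with a letter or digit")
    if ".." in name:
        problems.append("no adjacent periods")
    if _IP_PATTERN.fullmatch(name):
        problems.append("must not look like an IP address")
    return problems
```

Checking names before a bulk upload script runs avoids discovering a rejected bucket name partway through a long transfer.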

Maintaining the integrity of data paths

Any time you move data to a new location, you need to be aware of any paths referencing the data that may also need to be updated. This is a particular concern with map documents, which may reference dozens of data layers at different paths.

Registering your Amazon EC2 data location with ArcGIS Server can reduce the effort of fixing broken data paths after publishing. See Registering your data with ArcGIS Server using ArcGIS for Desktop.

Another option is to log in to your instance and use ArcMap to repair the out-of-date paths. ArcGIS for Desktop is included on the ArcGIS Server AMI so that you can easily make the repairs. See Repairing broken data links to learn about updating data path information in a map document.

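Another way to reduce the need to repair data connections is to use relative paths in your map documents and store your maps and data in a common folder. Because relative paths describe where the data sits in relation to the map document, they remain valid when you move the whole folder to the cloud.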
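The idea behind relative paths can be shown with Python's os.path functions. This sketch illustrates the concept only; it is not how ArcMap stores paths internally, and the function names and folder names are hypothetical.

```python
import os


def to_relative(data_path: str, map_document_path: str) -> str:
    """Express data_path relative to the folder that holds the map document."""
    map_folder = os.path.dirname(os.path.abspath(map_document_path))
    return os.path.relpath(os.path.abspath(data_path), start=map_folder)


def resolve(relative_path: str, map_document_path: str) -> str:
    """Turn a stored relative path back into an absolute path for the document's current location."""
    map_folder = os.path.dirname(os.path.abspath(map_document_path))
    return os.path.normpath(os.path.join(map_folder, relative_path))
```

A layer stored as `../data/roads.shp` relative to `project/maps/city.mxd` resolves correctly whether `project` sits on a local drive or on the EC2 instance, as long as the folder structure under `project` is unchanged. On Windows, os.path.relpath raises ValueError when the two paths are on different drives, which is one more reason to keep maps and data in a common folder.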

12/29/2014