- Map caching is a way to speed up web map delivery by predrawing a series of images across the extent of the map at many different scales. The server saves the images, or tiles, in a cache. The server can then return these tiles in response to client requests for maps much faster than it could draw the map on the fly each time.
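To see why predrawing "at many different scales" quickly adds up, here is a rough tile-count estimate. This is a minimal sketch, not the ArcGIS Server tiling algorithm: it assumes 256 x 256 pixel tiles, 96 DPI, map units in meters, and a rectangular extent, and the function names are hypothetical.

```python
import math

# Assumptions (not from the original post): 256 x 256 px tiles, 96 DPI,
# and map units in meters.
TILE_SIZE_PX = 256
DPI = 96
METERS_PER_INCH = 0.0254


def resolution(scale: float, dpi: int = DPI) -> float:
    """Ground meters covered by one pixel at a given scale denominator."""
    return scale * METERS_PER_INCH / dpi


def tiles_for_extent(width_m: float, height_m: float, scale: float) -> int:
    """Approximate tile count needed to cover a rectangular extent at one scale."""
    tile_ground_size = TILE_SIZE_PX * resolution(scale)
    cols = math.ceil(width_m / tile_ground_size)
    rows = math.ceil(height_m / tile_ground_size)
    return cols * rows


def tiles_for_scales(width_m: float, height_m: float, scales) -> int:
    """Total tiles across all scale levels; each tile is one map draw."""
    return sum(tiles_for_extent(width_m, height_m, s) for s in scales)


# Hypothetical 10 km x 10 km extent at two scales.
print(tiles_for_scales(10_000, 10_000, [5000, 2500]))
```

Halving the scale denominator roughly quadruples the tile count for the same extent, which is why caches that reach large scales over a whole state run into the millions of tiles.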
- Amazon EC2 is a cloud service that allows you to rent computing power from Amazon Web Services. You launch virtual machines in Amazon’s data centers and pay for them by the hour. Once you are done with the virtual machines, you terminate them and incur no further cost. The procedures for running ArcGIS Server on Amazon EC2 are documented in the ArcGIS Server documentation.
Map caching is a CPU- and memory-intensive process because the map must be drawn repeatedly, often thousands of times, as the tiles are created across the map. Because these computing resources are needed only temporarily, map caching is a great fit for Amazon EC2. This is especially true if you don’t have any spare computers available on premises and you don’t want to bog down your production ArcGIS Server machines with the task of creating tiles.
Comparing instance sizes and performance while building cache
To gauge the speed and cost of creating tiles with Amazon EC2, I downloaded all the land parcels in Montana from the Montana State Library and put them into a map intended to be overlaid on a separate imagery cache (such as the ArcGIS Online imagery layer). I applied antialiasing and labeling to the parcels. I then built this cache for the entire state of Montana down to approximately the 1:4500 scale. Here is an example of how the cache looks overlaid on some imagery.
The final cache contained about 11.8 million tiles and took about 7.8 GB of space.
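These two figures give an average tile size, which is handy for estimating storage and transfer needs for a similar cache. A quick sketch, assuming "GB" means 10^9 bytes (if it means 2^30 bytes, the average is closer to 710 bytes); the function name is hypothetical.

```python
def average_tile_bytes(total_bytes: float, tile_count: int) -> float:
    """Mean size of one tile in bytes."""
    return total_bytes / tile_count


# Figures from the Montana parcel test; GB assumed to be 10**9 bytes.
avg = average_tile_bytes(7.8e9, 11_800_000)
print(f"{avg:.0f} bytes per tile")  # prints "661 bytes per tile"
```

An average hides a lot of variation between tiles, so treat it as a rough planning number only.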
I repeated this test with all of the EC2 instance types offered by ArcGIS Server Cloud Builder on Amazon Web Services except Medium (which is probably too weak to be a practical option for building large caches). I used Windows instances for all the tests. Each test site contained only one instance, although if you wanted to throw more computing resources at a cache, you could launch a site with multiple instances.
When creating any cache with ArcGIS Server, you need to decide how many instances of the CachingTools service you will allow to work on the cache at any time. For consistency I always allowed 2n + 1 instances, where n is the number of vCPUs shown in the chart. You may find that adjusting this number up or down uses your CPUs more efficiently.
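The 2n + 1 rule used in these tests is simple arithmetic on the vCPU count. The sketch below only computes the number; the function name is hypothetical, and the vCPU values in the loop are illustrative rather than tied to particular EC2 instance types.

```python
def caching_service_instances(vcpus: int) -> int:
    """CachingTools instance count used in these tests: 2n + 1 for n vCPUs."""
    if vcpus < 1:
        raise ValueError("vcpus must be a positive integer")
    return 2 * vcpus + 1


# Illustrative vCPU counts; check the EC2 documentation for your instance type.
for vcpus in (2, 4, 8):
    print(vcpus, "vCPUs ->", caching_service_instances(vcpus), "CachingTools instances")
```

The extra instance beyond 2n is a starting point for tuning, not a fixed rule; as noted above, raising or lowering it may suit your map better.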
I was mainly interested in two factors:
- How fast could each instance type build the cache?
- Since each instance type has a different hourly price, which instance type offered the best value (in terms of tiles per penny)?
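Both factors can be computed from a test run's tile count, elapsed time, and hourly price. This is a minimal sketch under stated assumptions: cost is modeled as hours times the hourly rate, ignoring billing granularity, storage, and data transfer charges, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheRun:
    """One caching test run (hypothetical structure)."""
    instance_type: str
    tiles: int
    hours: float
    hourly_price_usd: float

    @property
    def tiles_per_hour(self) -> float:
        """Speed: higher means the cache finished sooner."""
        return self.tiles / self.hours

    @property
    def cost_cents(self) -> float:
        """Instance cost only, in US cents."""
        return self.hours * self.hourly_price_usd * 100

    @property
    def tiles_per_penny(self) -> float:
        """Value: tiles produced per US cent of instance time."""
        return self.tiles / self.cost_cents


# Hypothetical run: 1 million tiles in 2 hours at $0.50/hour.
run = CacheRun("example", 1_000_000, 2.0, 0.50)
print(run.tiles_per_hour, run.tiles_per_penny)
```

Plotting tiles_per_hour on one axis and tiles_per_penny on the other gives the kind of speed-versus-value comparison described next.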
The chart below displays both these variables. Instance sizes toward the right created the cache faster than instance sizes toward the left. Instance sizes toward the top offered more value than instance sizes toward the bottom.
In summary, Amazon EC2’s regular instance types offered more value than their high-memory counterparts with the same number of vCPUs. However, the high-memory instances created the cache slightly faster.
The M3 Double Extra Large option can build the cache relatively quickly while maintaining one of the best values on the chart. It took about 5 hours and cost under US$8 to cache all the parcels in Montana down to ~1:4500 using this instance type.
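Plugging the reported M3 Double Extra Large figures into the speed and value measures gives a feel for the throughput involved. Because the post gives the cost only as "under $8", the tiles-per-penny figure is a lower bound and the hourly rate an upper bound; the variable names are illustrative.

```python
TILES = 11_800_000     # Montana parcel cache
HOURS = 5              # "about 5 hours"
MAX_COST_USD = 8.00    # "under $8"

tiles_per_second = TILES / (HOURS * 3600)
min_tiles_per_penny = TILES / (MAX_COST_USD * 100)  # lower bound, since cost < $8
max_hourly_rate = MAX_COST_USD / HOURS              # upper bound on the implied rate

print(f"{tiles_per_second:.0f} tiles per second")
print(f"at least {min_tiles_per_penny:,.0f} tiles per penny")
print(f"implied rate under ${max_hourly_rate:.2f} per hour")
```

That works out to roughly 656 tiles per second, at least 14,750 tiles per penny, and an implied rate under $1.60 per hour.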
The outlier on this chart is the cluster compute instance, which may be appropriate for other types of jobs but does not appear to have sufficient memory to offer good value or speed for creating cache tiles.
Further testing is needed to determine whether these trends hold true for Linux instances and other map styles.
Data transfer considerations
When you prepare to create a cache in the cloud, getting the data onto the cloud machines may be the most time-consuming part of your process. The speed at which you can transfer your source data to the cloud is limited by the bandwidth of your network connection. For extraordinarily large amounts of data (over 1 TB, for example) it may make more sense to ship a disk drive to Amazon and have them load it into the cloud via their AWS Import/Export service, or just cache the data on premises.
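A back-of-the-envelope transfer estimate helps decide between uploading and shipping a drive. This sketch assumes a sustained link speed in megabits per second and decimal units (1 TB = 10^12 bytes); the efficiency factor for protocol overhead or shared bandwidth is an assumption, and the 100 Mbps figure is hypothetical.

```python
def transfer_hours(size_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move size_bytes over a link of link_mbps megabits per second.

    efficiency < 1.0 models protocol overhead or shared bandwidth (an assumption).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600


print(f"{transfer_hours(1e12, 100):.1f} h")   # 1 TB of source data at a sustained 100 Mbps
print(f"{transfer_hours(7.8e9, 100):.2f} h")  # a 7.8 GB tile cache at the same speed
```

At a sustained 100 Mbps, 1 TB takes roughly 22 hours before any overhead, while a cache the size of the Montana parcel tiles moves in minutes.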
Moving the tiles from the cloud to your on-premises machine can also take some time, although with imagery the cache will often occupy much less disk space than the original images and will hence transfer more quickly. Of course, you can always choose to host the tiles directly from an ArcGIS Server site running in the cloud.
Esri licensing is also a factor when deciding whether to cache using Amazon EC2. You must license any software that you run on Amazon EC2, including all the virtual CPU cores that you use with the instance sizes mentioned above. Enterprise License Agreements (ELAs) are therefore the most suitable for cloud-based caching, as they place no restriction on the number of cores on which you can launch ArcGIS Server. If you don’t have an ELA, Esri offers short-term licensing for ArcGIS Server that is designed for cloud deployments. A pay-by-the-hour model integrated into the Amazon instance price is not currently available.
When building a large cache, you might be tempted to set up auto scaling triggers that automatically increase the number of EC2 instances working on the cache as CPU usage increases. However, auto scaling is better suited to handling unexpected spikes in traffic. When creating caches, you already know that you will need a large amount of computing power; therefore, it makes more sense to launch all your needed instances before you build the cache, rather than waiting for them to launch sequentially via auto scaling triggers.
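A toy model illustrates the cost of waiting for auto scaling. It is a simplification, not a description of how EC2 auto scaling behaves: it assumes the job needs a fixed number of instance-hours, every instance works at the same rate, launch time is ignored, and a hypothetical trigger adds one instance every step_hours starting from a single instance.

```python
def finish_time_upfront(work_instance_hours: float, n: int) -> float:
    """All n instances start at time 0 and share the work evenly."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return work_instance_hours / n


def finish_time_autoscaled(work_instance_hours: float, n: int, step_hours: float) -> float:
    """Start with 1 instance; a trigger adds one more every step_hours, up to n."""
    if n < 1 or step_hours <= 0:
        raise ValueError("n must be at least 1 and step_hours positive")
    remaining, t, active = work_instance_hours, 0.0, 1
    while active < n:
        done_this_step = active * step_hours
        if done_this_step >= remaining:
            # The job finishes before the next instance would launch.
            return t + remaining / active
        remaining -= done_this_step
        t += step_hours
        active += 1
    # All n instances are running; they finish the rest together.
    return t + remaining / active


# 40 instance-hours of work on 4 instances, one instance added per hour.
print(finish_time_upfront(40, 4), finish_time_autoscaled(40, 4, 1.0))
```

In this model both approaches consume the same 40 instance-hours, but launching all four instances up front finishes in 10 hours instead of 11.5, so the staggered launch only delays the cache without saving money.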
Contributed by Sterling Quinn of the ArcGIS for Server development team