ArcMap

Improving batch geocoding performance

Geocoding performance has always been a top priority when working with large volumes of addresses, and it’s a primary consideration when designing and implementing a geocoding workflow.  Several factors contribute to geocoding performance.  I won’t cover all of them in this article, but let’s look at some simple, high-value changes we can make to our locators and workflow to improve performance.

Increasing the run time memory limit:

ArcGIS locators can be configured to use more system memory.  Working with a locator held in system memory (RAM) is much faster than reading it from disk.  ArcGIS supports the use of 2 GB of memory on a 32-bit system and 3 GB of memory on a 64-bit system.  By default, ArcGIS locators are configured to use 512 MB of system memory.  The default was chosen to work on the widest possible range of systems without exceeding system limits and possibly destabilizing a system.  Increasing the memory allocation for locator use above 512 MB yields good performance gains.

ArcGIS locators support a “Run Time Memory Limit” parameter.  For large geocoding jobs working against a large locator such as US streets, or US Composite, we’ll want to set this limit near the capabilities of our system.  With ArcGIS 10, geocoding performance continues improving as the cache grows beyond 2GB.

Follow the steps below to create a Run Time Memory Limit parameter or edit an existing one:

  1. Open the *.loc file in a text editor (note this is the .loc file and not the .loc.xml file).
  2. Look for the line that starts with RuntimeMemoryLimit =
  3. If the line does not exist, add it.  Then set the value to the number of bytes of system memory you want to allocate for the locator:
      RuntimeMemoryLimit = 2048000000
  4. Save the locator.
  5. Use the locator and compare performance and system memory use.
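
If you have several locators to update, the edit above can be scripted.  The following is a minimal Python sketch, not an Esri-provided tool.  It assumes the .loc file is plain text with one "Key = Value" entry per line, as the steps above suggest.  The function names, the SomeOtherSetting line, and the file path in the final comment are hypothetical, and the script keeps a backup copy because a bad edit could leave the locator unusable.

```python
import re
import shutil
from pathlib import Path


def set_loc_parameter(loc_text, key, value):
    """Return loc_text with `key = value` set.

    Replaces the first existing entry for `key`, or appends a new
    entry at the end if none is found.
    """
    # Match "key = anything" on a single line, tolerating spaces or tabs.
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*=[^\r\n]*", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    if pattern.search(loc_text):
        # A function replacement avoids any escape handling in new_line.
        return pattern.sub(lambda match: new_line, loc_text, count=1)
    if loc_text and not loc_text.endswith("\n"):
        loc_text += "\n"
    return loc_text + new_line + "\n"


def update_locator_file(path, key, value):
    """Back up the .loc file, then write the updated parameter to it."""
    loc_path = Path(path)
    shutil.copy2(loc_path, str(loc_path) + ".bak")  # keep the original
    loc_path.write_text(set_loc_parameter(loc_path.read_text(), key, value))


# Demonstration on in-memory text (SomeOtherSetting is a made-up entry).
sample = "SomeOtherSetting = 1\nRuntimeMemoryLimit = 512000000\n"
updated = set_loc_parameter(sample, "RuntimeMemoryLimit", 2048000000)
print(updated)

# For a real locator (hypothetical path):
# update_locator_file(r"C:\data\US_Streets.loc", "RuntimeMemoryLimit", 2048000000)
```

Keeping the text handling separate from the file handling makes it easy to check the result on a copy of the file before touching the locator you actually use.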

You can experiment with this value to find a setting that is appropriate for your system.  In most cases you will see optimum performance near 2 GB of memory.  If you allocate too much memory, the system may become unstable or ArcGIS may crash.  If that happens, reduce the memory allocation.

Presort reference data for optimal performance:

It’s always much faster to find something when it is organized than when it is chaotic.  Geocoding is no different.  Sorted addresses can be geocoded faster than unsorted addresses.  They also tax the system less: hard disks don’t have to work as hard, and the geocoding engine makes fewer memory allocations.

A typical approach is to sort by the highest level of geography first and then by progressively smaller levels of geography.  With U.S. addresses, for example, you should sort the address data by State, then City, then Postal Code.  If your data isn’t already sorted, you can sort it using the ArcGIS Sort (Data Management) tool, or you can configure a locator to sort data as it’s read into the geocoding process.
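
To make the sort order concrete, here is a small Python sketch that sorts an address table outside of ArcGIS using only the standard library.  It is an illustration of the State, City, Postal Code ordering and not a replacement for the Sort tool.  The column names (Address, City, State, Zip) and the sample rows are assumptions; substitute the field names in your own table.

```python
import csv
import io


def sort_addresses(rows, sort_fields=("State", "City", "Zip")):
    """Sort address records from the largest geography to the smallest.

    `rows` is a list of dicts (for example from csv.DictReader).
    Values are compared case-insensitively, ignoring stray whitespace.
    """
    return sorted(
        rows,
        key=lambda row: tuple(row[field].strip().upper() for field in sort_fields),
    )


# Illustrative sample data; in practice you would open your own CSV file.
sample = """Address,City,State,Zip
380 New York St,Redlands,CA,92373
1 Main St,Albany,NY,12207
100 Oak Ave,Anaheim,CA,92805
"""

rows = list(csv.DictReader(io.StringIO(sample)))
sorted_rows = sort_addresses(rows)

for row in sorted_rows:
    print(row["State"], row["City"], row["Zip"], row["Address"])
```

Postal codes are compared as text here, which keeps leading zeros intact.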

Configuring a locator for sorting is similar to changing the run time memory limit, except that you need to configure two parameters instead of one.  The first parameter specifies the fields to sort by.  These fields are the input fields defined by the locator.  The second parameter is the number of records to sort in each batch.  You could sort the entire table, but that would be very memory and CPU intensive, so instead you sort n records at a time.  A higher value can improve performance but, again, taxes system resources.  A typical value that supports good performance might be 100000 records.

Follow the steps below to create the Batch Presort parameters or edit existing ones:

  1. Open the *.loc file in a text editor (note this is the .loc file and not the .loc.xml file).
  2. Look for the lines that start with BatchPresortInputs =
  3. If the entry does not exist, create one and assign the first locator input field to sort by.  Repeat the entry for each additional field:
    BatchPresortInputs = State
    BatchPresortInputs = City
    BatchPresortInputs = Zip
  4. Look for the line that starts with BatchPresortCacheSize =
  5. If the entry does not exist, create one and assign the number of records to sort in each batch:
    BatchPresortCacheSize = 100000
  6. Save the locator.
  7. Use the locator and compare performance and system memory use.
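
As with the memory limit, these entries can be written by a script.  The sketch below is a hedged illustration that assumes the .loc file is plain "Key = Value" text and that the position of the entries within the file does not matter, which you should confirm on a copy of your locator first.  The function name and the SomeOtherSetting line are hypothetical.  Because BatchPresortInputs is repeated once per field, the function drops any existing presort entries and appends a fresh set.

```python
def set_presort_parameters(loc_text, sort_fields, cache_size):
    """Return loc_text with the batch presort entries replaced.

    Existing BatchPresortInputs and BatchPresortCacheSize lines are
    removed, then one BatchPresortInputs line per field and a single
    BatchPresortCacheSize line are appended at the end.
    """
    presort_keys = ("BatchPresortInputs", "BatchPresortCacheSize")
    kept = [
        line
        for line in loc_text.splitlines()
        # The text before the first "=" is the parameter name.
        if line.split("=", 1)[0].strip() not in presort_keys
    ]
    kept.extend(f"BatchPresortInputs = {field}" for field in sort_fields)
    kept.append(f"BatchPresortCacheSize = {cache_size}")
    return "\n".join(kept) + "\n"


# Demonstration on in-memory text (SomeOtherSetting is a made-up entry).
sample = "SomeOtherSetting = 1\nBatchPresortInputs = Zip\n"
updated = set_presort_parameters(sample, ["State", "City", "Zip"], 100000)
print(updated)
```

The field names passed in must match the locator’s input field names, in order from the largest geography to the smallest.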

In this example I geocoded 10,000 unsorted U.S. addresses with a U.S. streets locator and the preceding enhancements.  Performance increased from 350,000 records per hour to 614,000 records per hour.  You can experiment with the batch presort cache size to find optimal performance for your locators and data.
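
When you compare your own runs, it helps to convert each timing to records per hour so that jobs of different sizes are comparable.  The short Python sketch below shows the arithmetic using the two throughput figures quoted above.  The helper name is hypothetical, and the elapsed times it prints are derived from those figures rather than measured separately.

```python
def records_per_hour(record_count, elapsed_seconds):
    """Convert a timed geocoding run into a records-per-hour rate."""
    return record_count * 3600 / elapsed_seconds


# Throughput figures quoted in the example above.
before_rate = 350_000  # records per hour, default settings
after_rate = 614_000   # records per hour, with both enhancements

speedup = after_rate / before_rate
percent_gain = (speedup - 1) * 100

# Implied wall-clock time for the 10,000-address job at each rate.
job_size = 10_000
before_seconds = job_size / before_rate * 3600
after_seconds = job_size / after_rate * 3600

print(f"Speedup: {speedup:.2f}x ({percent_gain:.0f}% more records per hour)")
print(f"10,000 addresses: about {before_seconds:.0f} s before, {after_seconds:.0f} s after")
```

To measure your own locator, time a batch job, pass the record count and elapsed seconds to records_per_hour, and compare the result before and after each configuration change.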

In the future we’ll look at more ways to increase geocoding performance, but these techniques are a good start and should deliver noticeable gains.

Jeff
