Dicing Godzillas (features with too many vertices)

Vertices are the x,y coordinate pairs that define the shape of a feature, and the size of an individual feature (polygon, polyline, or multipoint) is defined by its number of vertices. When a single feature has a million or so vertices, it can cause out-of-memory errors and, in some cases, a system crash, which is never a good thing. We call such gargantuan features ‘Godzillas’ because they wreak havoc on your computer’s resources. Godzillas are usually long, crenulated coastlines or street casings digitized at a high degree of accuracy.

Operations that are particularly vulnerable to Godzillas are:

Godzillas typically raise the geoprocessing error codes 000426 (Out Of Memory), 010005 (Unable to allocate memory), 999998 (Unexpected Error), and 999999 (Error executing function).
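If you run tools from a Python script, you might want to flag these codes automatically. The sketch below assumes that the geoprocessing messages you captured use the usual `ERROR 000426: ...` form. The helper name `godzilla_suspect_codes` is hypothetical. A match only means a Godzilla is worth looking for, not that one is present.

```python
import re

# The four codes this post lists as typical Godzilla symptoms.
GODZILLA_CODES = {
    "000426": "Out Of Memory",
    "010005": "Unable to allocate memory",
    "999998": "Unexpected Error",
    "999999": "Error executing function",
}

# Assumption: each error message looks like "ERROR 000426: Out Of Memory".
_CODE_PATTERN = re.compile(r"ERROR (\d{6})")


def godzilla_suspect_codes(messages):
    """Return the Godzilla-related error codes found in a block of tool messages."""
    return [code for code in _CODE_PATTERN.findall(messages) if code in GODZILLA_CODES]


log = "Executing: Dissolve ...\nERROR 000426: Out Of Memory\nFailed to execute (Dissolve)."
print(godzilla_suspect_codes(log))  # ['000426']
```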

How Godzillas are created

  • Data entry using stream digitizing (as opposed to point-by-point digitizing) is the usual culprit: it is easy to create features with too many vertices using stream digitizing.
  • Features imported from other software are another source of Godzillas. Typically, the software that creates them is single-purpose data entry software that, unlike ArcGIS, has no analytic capability.
  • The Dissolve tool can create a Godzilla by combining smaller (but still fairly large) features into one feature. This is known as the combinatorial problem. The Dissolve tool has logic that prevents it from creating a Godzilla (you’ll receive warning code 000059), but this logic is based on the machine’s available memory at the time Dissolve is run. So, while the output may not be a Godzilla on the machine where Dissolve ran, it may be one on another machine with less available memory.

How many vertices define a Godzilla?

Unfortunately, there is no simple answer, since it depends entirely on how much memory your machine has available; more memory means more vertices per feature can be processed. You can increase available memory by closing all applications other than ArcGIS, turning off background processing as described in the Desktop help topic Foreground and background processing, and re-running your operation. But this is a one-time solution: what you really need to do is get rid of the Godzilla.

Finding Godzillas

If you think you’ve got a Godzilla, your first task is to count the number of vertices for every feature in your feature class. The recipe for this is:

  1. Use the Add Field tool to add a new field named VERTEXCOUNT. The field type is LONG.
  2. Next, use the Calculate Field tool with this Python expression: !shape!.pointcount.
  3. After Calculate Field runs, open the feature class attribute table and sort on the VERTEXCOUNT column, or use the Summary Statistics tool to find the MAX of VERTEXCOUNT.
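Outside ArcGIS, the same bookkeeping can be sketched in plain Python. Each feature here is assumed to be a list of parts, and each part a list of (x, y) tuples. This stands in for the geometry that `!shape!.pointcount` inspects. The names `point_count` and `features` and the sample coordinates are illustrative only.

```python
def point_count(feature):
    """Count every stored vertex in every part of a feature (mirrors !shape!.pointcount)."""
    return sum(len(part) for part in feature)


# Hypothetical feature class: OBJECTID -> list of parts.
features = {
    1: [[(0, 0), (1, 0), (1, 1), (0, 0)]],
    2: [[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)], [(5, 5), (6, 5), (6, 6), (5, 5)]],
    3: [[(x, x % 3) for x in range(50_000)]],  # a long, crenulated "coastline"
}

# Step 2: calculate VERTEXCOUNT for every feature.
vertexcount = {oid: point_count(geom) for oid, geom in features.items()}

# Step 3: sort descending (like sorting the attribute table) and read off the MAX.
ranked = sorted(vertexcount.items(), key=lambda item: item[1], reverse=True)
print(ranked[0])  # (3, 50000)
```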

The VERTEXCOUNT field you added and calculated is not automatically recalculated or maintained. Any time the geometry of features changes, you’ll have to run Calculate Field again. Of course, if you no longer need the VERTEXCOUNT field, use Delete Field to remove it.

Simplifying your data

The first question you should ask is whether you really need all those vertices to describe the shape of your Godzilla. If you don’t need all the vertices, use the Simplify Polygon or Simplify Line tool. These tools weed out unnecessary vertices. After running Simplify Polygon or Simplify Line, recalculate VERTEXCOUNT by running Calculate Field again. If the vertex count does not drop dramatically (or the tool fails to run because of the Godzilla), you’ll need to dice your Godzilla as described next.
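The sketch below shows how a simplification can "weed out" vertices. It is a minimal Douglas-Peucker implementation, one classic point-removal technique. It illustrates the idea only and is not the exact algorithm behind Simplify Line or Simplify Polygon. `tolerance` plays the role of a simplification tolerance in map units. The recursion is kept for clarity; a version meant for real Godzillas would use an explicit stack to avoid Python's recursion limit.

```python
import math


def _distance_to_segment(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Keep only the vertices needed to stay within `tolerance` of the original line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the segment joining the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]  # every interior vertex is redundant
    # Otherwise keep that vertex and simplify each side.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the shared vertex once


# A stream-digitized-looking line: 1,001 vertices with a tiny wobble along y = 0.
wobbly = [(x, 0.001 * (x % 2)) for x in range(1001)]
simplified = douglas_peucker(wobbly, tolerance=0.01)
print(len(wobbly), len(simplified))  # 1001 2
```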

Using the Dice tool

The Dice tool takes input features and a vertex limit and outputs a new feature class with diced features. The Dice tool works with multipoints, lines, and polygons.

Choosing a vertex limit for the Dice tool

Obviously, the vertex limit value for the Dice tool needs to be less than the maximum of VERTEXCOUNT. Smaller vertex limits create more features, but this is hardly ever a concern, since adding more features is rarely a computational issue; it’s the Godzilla that’s the problem. Here are some suggestions:

  • Use half the maximum VERTEXCOUNT – A good starting point is to set the vertex limit to half of the maximum VERTEXCOUNT.
  • Experiment with the Godzilla instead of the entire feature class – The Dice tool works on selected features, so you can select your Godzilla in ArcMap, run the Dice tool, visually inspect the result, and try different vertex limits until you’re happy with the results. At this point, you can either run Dice on the entire feature class, or replace the Godzilla with its diced version as follows:
    • In ArcMap, select the Godzilla and run the Delete Features tool, inputting the layer with the selected Godzilla.
    • Run the Append tool to append the diced Godzilla into its original feature class.
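The "half the maximum" rule of thumb is simple arithmetic. The sketch below also reports which features would actually be diced at that limit. It takes a VERTEXCOUNT-style mapping of OBJECTID to vertex count. The function names and the sample counts are hypothetical.

```python
def suggest_vertex_limit(vertexcount):
    """Starting point for Dice: half of the largest VERTEXCOUNT, rounded down."""
    return max(vertexcount.values()) // 2


def features_to_dice(vertexcount, limit):
    """OBJECTIDs of the features whose vertex count exceeds the limit."""
    return sorted(oid for oid, count in vertexcount.items() if count > limit)


# Hypothetical VERTEXCOUNT values, keyed by OBJECTID.
vertexcount = {1: 260, 2: 686, 3: 2_400_000, 4: 900_000}
limit = suggest_vertex_limit(vertexcount)
print(limit)                                 # 1200000
print(features_to_dice(vertexcount, limit))  # [3]
```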

Apportioning attributes

When using Dice (or any overlay tool found in the Overlay toolset), all attribute values from the input feature class are carried across to the output feature class. If any of the input attributes contain values that are apportioned by area (such as a population count), you’ll want these attribute values to be apportioned among the new features created by Dice. To apportion attributes, use the Make Feature Layer tool and check “Use Ratio Policy” for any attribute that needs to be apportioned by area, and use the output of Make Feature Layer as the input to Dice. The ratio is based on the ratio in which the original geometry is divided. If the geometry is divided in half, each new feature’s attribute gets one-half of the value of the original object’s attribute.
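The ratio policy arithmetic is easy to check by hand. This sketch assumes you already know the size of each piece Dice produced (its area, in this post's population example). `apportion` is a hypothetical helper, not part of ArcGIS.

```python
def apportion(value, piece_sizes):
    """Split `value` among pieces in proportion to each piece's size."""
    total = sum(piece_sizes)
    if total == 0:
        raise ValueError("pieces have zero total size")
    return [value * size / total for size in piece_sizes]


# A polygon with a population of 1000 diced into two equal halves...
print(apportion(1000, [50.0, 50.0]))       # [500.0, 500.0]
# ...or into unequal pieces: each keeps its share of the original area.
print(apportion(1000, [75.0, 20.0, 5.0]))  # [750.0, 200.0, 50.0]
```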

Maintaining parentage

The Dice tool does not maintain the parent Object ID of the original feature, so unless you have a unique ID field, you’ll have no way of knowing the original feature from which the new feature was created. Therefore, you should add a unique ID field before Dice is run. If you don’t already have a unique ID field, do this:

  • Use Add Field to add a new field named UniqueID with the field type of LONG.
  • Use Calculate Field to calculate UniqueID equal to OBJECTID.
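The copy matters because Dice writes new features with new OBJECTIDs, so only the UniqueID ties each piece back to its parent. The sketch below models rows as dictionaries. `dice_rows` is a hypothetical stand-in for the Dice tool that only fakes the splitting (it copies attributes into renumbered rows). The field values are invented for illustration.

```python
from collections import defaultdict


def add_unique_id(rows):
    """Equivalent of Add Field + Calculate Field: UniqueID = OBJECTID."""
    for row in rows:
        row["UniqueID"] = row["OBJECTID"]


def dice_rows(rows, pieces_per_row):
    """Stand-in for Dice: copy each row's attributes into new, renumbered rows."""
    output, next_oid = [], 1
    for row in rows:
        for _ in range(pieces_per_row[row["OBJECTID"]]):
            output.append(dict(row, OBJECTID=next_oid))  # new OBJECTID, attributes copied
            next_oid += 1
    return output


rows = [{"OBJECTID": 7, "NAME": "coast"}, {"OBJECTID": 9, "NAME": "river"}]
add_unique_id(rows)
diced = dice_rows(rows, {7: 3, 9: 2})

# Group the diced pieces by their parent feature.
parents = defaultdict(list)
for piece in diced:
    parents[piece["UniqueID"]].append(piece["OBJECTID"])
print(dict(parents))  # {7: [1, 2, 3], 9: [4, 5]}
```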

Geometry errors

The Check Geometry tool identifies possible geometry errors such as null coordinates, empty rings, and self-intersections. Godzillas, because of their size, are prime suspects for such errors. The Dice tool cannot check the geometry of the input features, since that operation may fail on the Godzilla, so any errors in the input will be written to the output. If you haven’t run Check Geometry prior to running Dice, you should run Check Geometry on the output of Dice.
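The sketch below gives a feel for what such checks look for. It naively tests a single ring for three of the problems named above: null coordinates, an empty ring, and self-intersections. It is not how Check Geometry is implemented. The O(n²) segment test is exactly the kind of loop that would struggle on a Godzilla, which is one more reason to let the tool do this work. Rings are assumed to be lists of (x, y) tuples without a repeated closing vertex. `check_ring` is a hypothetical name.

```python
import math


def _orient(a, b, c):
    """Turn direction of a -> b -> c: 1 counter-clockwise, -1 clockwise, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 properly cross (touching/collinear cases ignored)."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def check_ring(ring):
    """Return a list of problems found in a ring of (x, y) tuples."""
    if len(ring) == 0:
        return ["empty ring"]
    if any(x is None or y is None or math.isnan(x) or math.isnan(y) for x, y in ring):
        return ["null coordinates"]
    problems = []
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    # Compare every pair of non-adjacent edges.
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # the first and last edges share a vertex
            if _segments_cross(*edges[i], *edges[j]):
                problems.append(f"self-intersection between edges {i} and {j}")
    return problems


print(check_ring([(0, 0), (1, 0), (1, 1), (0, 1)]))  # []
print(check_ring([(0, 0), (1, 1), (1, 0), (0, 1)]))  # ['self-intersection between edges 0 and 2']
print(check_ring([(0, 0), (None, 1), (1, 1)]))       # ['null coordinates']
```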

Geometry changes after running Dice

Polygons
  • If an individual connected component (a feature and any internal rings) exceeds the vertex limit, it will be subdivided by using its envelope to split it into two equal parts based on its vertex count. This subdivision is repeated until every resulting part is under the vertex limit.
  • Splitting of simple polygons is done along a polygon’s smallest diameter.
  • For large, very complex polygons, the initial dice line is determined with a best-guess algorithm that samples the data to estimate the density of vertices across the feature envelope.
  • The total vertex count of all parts of a diced polygon feature in the output should never be less than the starting vertex count for the polygon prior to running the Dice tool.
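Below is a much-simplified sketch of envelope-based splitting for a single ring without holes. It cuts across the longer side of the envelope at the median vertex coordinate, clips each half with the Sutherland-Hodgman algorithm, and recurses. It rests on those assumptions: no internal rings, no best-guess density sampling, and no smallest-diameter search. It is not the Dice tool's actual algorithm. Rings are lists of (x, y) tuples without a repeated closing vertex. The clip adds new vertices on each cut, which shows why the diced total is never smaller than the original count.

```python
import math


def _clip(ring, axis, cut, keep_low):
    """Sutherland-Hodgman clip of a ring against the half-plane coord <= cut (or >= cut)."""
    def inside(p):
        return p[axis] <= cut if keep_low else p[axis] >= cut

    def crossing(a, b):
        # Only called when exactly one endpoint is strictly outside, so no division by zero.
        t = (cut - a[axis]) / (b[axis] - a[axis])
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

    out, prev = [], ring[-1]
    for cur in ring:
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))  # entering: new vertex on the cut
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))      # leaving: new vertex on the cut
        prev = cur
    return out


def dice_polygon(ring, limit, depth=0, max_depth=32):
    """Recursively halve a ring by vertex count until each piece has <= limit vertices."""
    if len(ring) <= limit or depth >= max_depth:
        return [ring]
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    # Cut across the longer side of the envelope at the median vertex coordinate.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    cut = sorted(p[axis] for p in ring)[len(ring) // 2]
    pieces = []
    for keep_low in (True, False):
        half = _clip(ring, axis, cut, keep_low)
        if len(half) >= 3:
            pieces.extend(dice_polygon(half, limit, depth + 1, max_depth))
    return pieces


def area(ring):
    """Shoelace formula: absolute area of a ring."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]))) / 2


circle = [(10 * math.cos(2 * math.pi * k / 64), 10 * math.sin(2 * math.pi * k / 64))
          for k in range(64)]
pieces = dice_polygon(circle, limit=20)
print(max(len(p) for p in pieces) <= 20)                   # True
print(sum(len(p) for p in pieces) >= len(circle))          # True
print(math.isclose(sum(map(area, pieces)), area(circle)))  # True
```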
Lines
  • If a part exceeds the vertex limit, it will be subdivided into parts with roughly the same number of vertices. You can think of this as subdividing a single part into parts of (almost) equal vertex counts. Dice does not create features by counting vertices from the beginning vertex until the vertex limit is reached, because in too many cases that would leave very short lines.
  • All splitting of lines occurs at existing vertices, which are duplicated so that each one participates in two consecutive new parts.
  • The total vertex count of all parts of the diced line feature in the output should never be less than the starting vertex count for the line prior to running the Dice tool.
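The "roughly equal" split for a single line part can be sketched as follows, assuming a part is a list of (x, y) vertices. This is one way to get the behavior described above, not Dice's own code. It works out how many pieces are needed, spreads the segments evenly among them, and reuses each split vertex in both neighboring pieces. `dice_line` is a hypothetical name.

```python
import math


def dice_line(vertices, limit):
    """Split one line part into pieces of nearly equal vertex count, each <= limit.

    Splits happen only at existing vertices; each split vertex is shared by the
    two consecutive pieces, so the total vertex count can only grow.
    """
    if limit < 2:
        raise ValueError("a line piece needs at least 2 vertices")
    if len(vertices) <= limit:
        return [list(vertices)]
    segments = len(vertices) - 1
    pieces_needed = math.ceil(segments / (limit - 1))  # a piece holds <= limit - 1 segments
    base, extra = divmod(segments, pieces_needed)
    pieces, start = [], 0
    for i in range(pieces_needed):
        end = start + base + (1 if i < extra else 0)
        pieces.append(list(vertices[start:end + 1]))  # vertex `end` is reused by the next piece
        start = end
    return pieces


line = [(x, 0) for x in range(11)]
print([len(p) for p in dice_line(line, 4)])  # [4, 4, 3, 3]
```

Counting up to the limit from the first vertex would instead give pieces of 4, 4, 4, and 2 vertices, leaving the short tail piece the text warns about.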
Multipoints
  • If the multipoint exceeds the vertex limit, it is subdivided by using the feature’s envelope to split it in two. This is repeated until no multipoint exceeds the vertex limit.
  • An attempt is made to spatially group the points within a multipoint feature when splitting.
  • The total vertex count of all parts of the diced multipoint feature in the output should always equal the starting vertex count for the multipoint feature prior to running the Dice tool.
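A sketch of the same idea for multipoints follows, assuming points are (x, y) tuples. It splits across the longer side of the envelope at the median point and recurses. Because points are only regrouped, never clipped, the total count is unchanged. The function name is hypothetical, and this is not the tool's internal grouping logic.

```python
def dice_multipoint(points, limit):
    """Recursively halve a multipoint across its envelope until each group has <= limit points."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(points) <= limit:
        return [list(points)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split across the longer side of the envelope so each group stays spatially compact.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    middle = len(ordered) // 2
    return dice_multipoint(ordered[:middle], limit) + dice_multipoint(ordered[middle:], limit)


grid = [(x, y) for x in range(10) for y in range(10)]  # 100 points
groups = dice_multipoint(grid, 30)
print([len(g) for g in groups])    # [25, 25, 25, 25]
print(sum(len(g) for g in groups))  # 100: no point added or lost
```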

This post was contributed by Ken Hartling, a product engineer on the geoprocessing team.

This entry was posted in Analysis & Geoprocessing.


4 Comments

  1. sharb says:

    Hi Dale and Ken, Thanks for a really valuable post. I know you say that there is no definitive threshold vertex count to define a Godzilla feature, but can you suggest a frame of reference? I’m working with several datasets, so I’ve compared between them. Three different feature classes had a maximum number of feature vertices of up to 260, 686, and 2329. Obviously that third feature class has some features with a relatively huge number of vertices, but I’m looking to see where I can “cut off” which feature classes I should be Dice-ing, and which I can leave as is. Thanks for all your help.

    • KenH says:

      Hi Sharb.
      You should only consider dicing features if you’ve run into a problem running a tool. The sizes you list in your post would not be considered a problem on most machines. Usually, we only see issues when a machine’s resources are exhausted because individual features have millions of vertices, or because combining features during tool execution results in a single feature containing millions of vertices. Whether a problem occurs at 1 million vertices or at 100 million (for example) is determined by the amount of memory available to the process. If you need more info, see this blog on being successful with large overlay processes (with large and/or complex data): http://blogs.esri.com/esri/arcgis/2012/06/15/be-successful-overlaying-large-complex-datasets-in-geoprocessing/
      Ken

  2. handokovska says:

    Hi Ken, this is very interesting. Is there any workaround to identify/find a Godzilla more accurately? I got error 99999 when running Find Identical on a polylineZM shapefile. So I tried Feature To Line on it, and that also failed with error 99999. I counted the vertices as described in your article and got a maximum count of about 90,000 vertices. I tried editing, using the Split tool to split the longest line in half, but the error still shows up when running some geoprocessing tools. Trying Simplify Line also gave an error. But when I ran Dice, it succeeded, with a max of 5000 vertices. Running Feature To Line and Simplify on the result still gave error 9999. I know there’s a Godzilla in it, but somehow it hides and is difficult to find…. :D

    • KenH says:

      From your description it may not be a ‘Godzilla’ causing the issue. Could you put in a support case with a full reproducible case please? Please include the exact steps to cause the error, and the data you are running with.
      Thanks, Ken