Vertices are the x,y coordinate pairs that define the shape of a feature, and the size of an individual feature (polygon, polyline, or multipoint) is defined by its number of vertices. When a single feature has a million or so vertices, it can cause out-of-memory errors and, in some cases, a system crash, which is never a good thing. We call such gargantuan features ‘Godzillas’ because they wreak havoc on your computer’s resources. Godzillas are usually long and crenulated coastlines or street casings digitized at a high degree of accuracy.
Operations that are particularly vulnerable to Godzillas are:
- Editing operations, such as cutting, pasting, reshaping, or moving a feature.
- Geoprocessing tools that operate on feature geometry, such as most of the tools in the Analysis toolbox and the tools found in the Features toolset and Feature Class toolset in the Data Management toolbox. Tools that don’t manipulate feature geometry, such as Add Field, are not affected by a Godzilla.
How Godzillas are created
- Data entry using stream digitizing (as opposed to point-by-point digitizing) is the usual culprit—it is easy to create features with too many vertices using stream digitizing.
- Features imported from other software are another source of Godzillas. Typically, the software that creates them is single-purpose data entry software with no analytic capability like ArcGIS.
- The Dissolve tool can create a Godzilla by combining smaller (but still fairly large) features into one feature. This is known as the combinatorial problem. The Dissolve tool has logic that prevents it from creating a Godzilla (you’ll receive the warning code 000059) but this logic is based on the machine’s available memory at the time Dissolve is run. So, while the output may not be a Godzilla on the machine where Dissolve ran, it may be on another machine with less available memory.
How many vertices define a Godzilla?
Unfortunately, there is no simple answer, since it depends entirely on how much memory your machine has available: more memory means more vertices per feature can be processed. You can increase available memory by closing all applications other than ArcGIS, turning off background processing as described in the Desktop help topic Foreground and background processing, and re-running your operation. But this is a one-time solution: what you really need to do is get rid of the Godzilla.
If you think you’ve got a Godzilla, your first task is to count the number of vertices for every feature in your feature class. The recipe for this is:
- Use the Add Field tool to add a new field named VERTEXCOUNT. The field type is LONG.
- Next, use the Calculate Field tool with this expression: !shape!.pointcount, making sure the Python parser is selected.
- After Calculate Field runs, open the feature class attribute table and sort on the VERTEXCOUNT column, or use the Summary Statistics tool to find the MAX of VERTEXCOUNT.
The VERTEXCOUNT field you added and calculated is not automatically recalculated or maintained. Anytime the geometry of a feature changes, you’ll have to run Calculate Field again. Of course, if you no longer need the VERTEXCOUNT field, use Delete Field to remove it.
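The recipe above can also be scripted. What follows is a minimal sketch, not an official workflow: the function names are made up for illustration, and the geoprocessing module (normally arcpy) is passed in as the `gp` argument so the logic can be read and tried without ArcGIS installed. Reading the maximum with a search cursor is a stand-in for sorting the table or running Summary Statistics.

```python
def add_vertex_count(feature_class, gp, field_name="VERTEXCOUNT"):
    """Add a LONG field and fill it with each feature's vertex count.

    `gp` is the geoprocessing module, normally `arcpy` (an assumption of
    this sketch; it is passed in rather than imported).
    """
    gp.AddField_management(feature_class, field_name, "LONG")
    # "PYTHON_9.3" selects the Python parser needed for the !field! syntax;
    # ArcGIS Pro also accepts "PYTHON3".
    gp.CalculateField_management(
        feature_class, field_name, "!shape!.pointcount", "PYTHON_9.3"
    )


def max_vertex_count(feature_class, gp, field_name="VERTEXCOUNT"):
    """Return the largest stored vertex count, or None if no rows have one."""
    with gp.da.SearchCursor(feature_class, [field_name]) as cursor:
        counts = [row[0] for row in cursor if row[0] is not None]
    return max(counts) if counts else None


# Typical use inside an ArcGIS Python session (the path is hypothetical):
#   import arcpy
#   fc = r"C:\data\demo.gdb\coastlines"
#   add_vertex_count(fc, arcpy)
#   print(max_vertex_count(fc, arcpy))
```

Remember that the stored counts go stale as soon as geometry is edited, so rerun `add_vertex_count` (or at least the Calculate Field step) before trusting the maximum.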
Simplifying your data
The first question you should ask is if you really need all those vertices to describe the shape of your Godzilla. If you don’t need all the vertices, use the Simplify Polygon or Simplify Line tool. These tools weed out unnecessary vertices. After running Simplify Polygon or Simplify Line, recalculate VERTEXCOUNT by running Calculate Field again. If the vertex count does not drop dramatically (or the tool fails to run because of the Godzilla), you’ll need to dice your Godzilla as described next.
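To get a feel for what “weeding out unnecessary vertices” means, here is a small pure-Python sketch of the classic Douglas-Peucker point-removal idea. It is only an illustration of one well-known simplification technique; it is not the code behind Simplify Line or Simplify Polygon, and the function names and the tolerance value are assumptions chosen for the example.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line.

    Recursive for readability; a real Godzilla would need an iterative
    version to avoid deep recursion.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first-last.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]  # everything in between is expendable
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # avoid repeating the shared vertex


line = [(0, 0), (1, 0.01), (2, -0.01), (3, 0), (4, 5)]
print(simplify(line, 0.1))  # [(0, 0), (3, 0), (4, 5)]
```

The two nearly collinear vertices are dropped while the corner at (3, 0) survives, which is the behavior you want from a simplification pass: fewer vertices, same overall shape.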
Using the Dice tool
The Dice tool takes input features and a vertex limit and outputs a new feature class in which any feature that exceeds the limit has been subdivided into smaller features. The Dice tool works with multipoints, lines, and polygons.
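If you prefer scripting, a call to Dice might look like the sketch below. The helper names are hypothetical, the geoprocessing module (normally arcpy) is passed in as `gp` so the snippet can be followed without ArcGIS, and the starting limit of half the largest vertex count is simply the rule of thumb suggested in this post.

```python
def starting_vertex_limit(max_vertex_count):
    """Half of the largest vertex count, a starting point rather than a rule."""
    if max_vertex_count < 2:
        raise ValueError("max_vertex_count must be at least 2")
    return max_vertex_count // 2


def dice_features(in_features, out_feature_class, vertex_limit, gp):
    """Run the Dice tool and return the output path.

    `gp` is the geoprocessing module, normally `arcpy` (an assumption of
    this sketch; it is passed in rather than imported).
    """
    if vertex_limit < 1:
        raise ValueError("vertex_limit must be positive")
    gp.Dice_management(in_features, out_feature_class, vertex_limit)
    return out_feature_class


# Typical use inside an ArcGIS Python session (paths are hypothetical):
#   import arcpy
#   limit = starting_vertex_limit(1200000)   # 600000
#   dice_features(r"C:\data\demo.gdb\coastlines",
#                 r"C:\data\demo.gdb\coastlines_diced", limit, arcpy)
```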
Choosing a vertex limit for the Dice tool
Obviously, the vertex limit value for the Dice tool needs to be less than the maximum of VERTEXCOUNT. Smaller vertex limits create more features, but this is hardly ever a concern since adding more features is rarely a computational issue—it’s the Godzilla that’s the problem. Here are some suggestions:
- Use half the maximum VERTEXCOUNT – A good starting point is to set the vertex limit to half of the maximum VERTEXCOUNT.
- Experiment with the Godzilla instead of the entire feature class – The Dice tool works on selected features, so you can select your Godzilla in ArcMap, run the Dice tool, visually inspect the result, and try different vertex limits until you’re happy with the results. At this point, you can either run Dice on the entire feature class, or replace the Godzilla with its diced version as follows:
When using Dice (or any overlay tool found in the Overlay toolset), all attribute values from the input feature class are carried across to the output feature class. If any of the input attributes contain values that are apportioned by area (such as a population count), you’ll want these attribute values to be apportioned among the new features created by Dice. To apportion attributes, use the Make Feature Layer tool and check “Use Ratio Policy” for any attribute that needs to be apportioned by area, and use the output of Make Feature Layer as the input to Dice. The ratio is based on the ratio in which the original geometry is divided. If the geometry is divided in half, each new feature’s attribute gets one-half of the value of the original object’s attribute.
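The arithmetic behind a ratio policy is simple enough to sketch in a few lines of Python. This is only an illustration of the proportional split described above, with a made-up function name and made-up numbers; it does not call ArcGIS.

```python
def apportion(value, part_areas):
    """Split `value` among parts in proportion to each part's area."""
    total = sum(part_areas)
    if total <= 0:
        raise ValueError("part areas must sum to a positive number")
    return [value * area / total for area in part_areas]


# A polygon with a population of 1000 diced into two equal halves:
print(apportion(1000, [50, 50]))  # [500.0, 500.0]

# The same polygon diced into pieces covering 3/4 and 1/4 of its area:
print(apportion(1000, [3, 1]))    # [750.0, 250.0]
```

Without a ratio policy, every diced piece would carry the full value of 1000, and summing the field afterward would overstate the total.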
The Dice tool does not maintain the parent Object ID of the original feature, so unless you have a unique ID field, you’ll have no way of knowing the original feature from which the new feature was created. Therefore, you should add a unique ID field before Dice is run. If you don’t already have a unique ID field, do this:
- Use Add Field to add a new field named UniqueID with the field type of LONG.
- Use Calculate Field to calculate UniqueID equal to OBJECTID.
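Scripted, those two steps might look like the sketch below. The function name is hypothetical, and the geoprocessing module (normally arcpy) is passed in as `gp`. The Object ID field is assumed to be named OBJECTID; in some formats it has a different name (shapefiles use FID, for example), so adjust `oid_field` to match your data.

```python
def add_unique_id(feature_class, gp, id_field="UniqueID", oid_field="OBJECTID"):
    """Copy each feature's Object ID into a permanent LONG field.

    `gp` is the geoprocessing module, normally `arcpy` (an assumption of
    this sketch). `oid_field` is assumed to be "OBJECTID".
    """
    gp.AddField_management(feature_class, id_field, "LONG")
    gp.CalculateField_management(
        feature_class, id_field, "!{}!".format(oid_field), "PYTHON_9.3"
    )


# Typical use inside an ArcGIS Python session (the path is hypothetical):
#   import arcpy
#   add_unique_id(r"C:\data\demo.gdb\coastlines", arcpy)
```

Run this before Dice so that every diced piece carries the ID of the feature it came from.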
The Check Geometry tool identifies possible geometry errors such as null coordinates, empty rings, and self-intersections. Godzillas, because of their size, are prime suspects for such errors. The Dice tool cannot check the geometry of the input features since the operation may fail on the Godzilla, so any errors in the input will be written to the output. If you haven’t run Check Geometry prior to running Dice, you should run Check Geometry on the output of Dice.
Geometry changes after running Dice
- If an individual connected component (a feature and any internal rings) exceeds the vertex limit, it will be subdivided by using its envelope to split it into two equal parts based on its vertex count. This subdivision is repeated until the results of splitting parts are under the set vertex limit.
- Splitting of simple polygons is done along a polygon’s smallest diameter.
- For large, very complex polygons, the initial dice line is determined with a best-guess algorithm, which samples the data to get an idea of the density of vertices across the feature envelope.
- The total vertex count of all parts of a diced polygon feature in the output should never be less than the starting vertex count for the polygon prior to running the dice tool.
- For lines, if a part exceeds the vertex limit, it will be subdivided into parts with roughly the same number of vertices. Dice does not count vertices from the beginning of the line and cut each time the vertex limit is reached, because in too many cases that would create very short lines.
- All splitting of lines occurs at an existing vertex (which is duplicated so that it participates in two consecutive new parts).
- The total vertex count of all parts of the diced line feature in the output should never be less than the starting vertex count for the line prior to running the dice tool.
- If a multipoint exceeds the vertex limit, it is subdivided by using the feature’s envelope to split it in two. This is repeated until no multipoint exceeds the vertex limit.
- An attempt is made to spatially group the points within a multipoint feature when splitting.
- The total vertex count of all parts of the diced multipoint feature in the output should always equal the starting vertex count for the multipoint feature prior to running the dice tool.
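To make the line behavior concrete, here is a toy pure-Python sketch that splits a list of vertices into roughly equal parts at existing vertices, duplicating each shared vertex. It only illustrates the properties listed above; it is an assumption-laden stand-in, not the algorithm the Dice tool actually uses, and the function name is made up.

```python
import math


def dice_line(vertices, vertex_limit):
    """Split a line into parts of roughly equal size, none over the limit.

    Cuts fall on existing vertices, and each cut vertex is duplicated so
    that it ends one part and starts the next.
    """
    if vertex_limit < 2:
        raise ValueError("a line part needs at least 2 vertices")
    n = len(vertices)
    if n <= vertex_limit:
        return [list(vertices)]
    segments = n - 1
    # A part with at most `vertex_limit` vertices has at most limit - 1 segments.
    part_count = math.ceil(segments / (vertex_limit - 1))
    base, extra = divmod(segments, part_count)
    parts, start = [], 0
    for i in range(part_count):
        end = start + base + (1 if i < extra else 0)
        parts.append(list(vertices[start : end + 1]))
        start = end  # the cut vertex is reused as the next part's first vertex
    return parts


line = [(x, 0) for x in range(11)]          # 11 vertices
parts = dice_line(line, 4)
print([len(p) for p in parts])               # [4, 4, 3, 3]
print(sum(len(p) for p in parts))            # 14, never less than 11
```

Because every cut duplicates one vertex, the output total is the input count plus the number of cuts, which is why the diced total for a line can grow but should never shrink.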
This post was contributed by Ken Hartling, a product engineer on the geoprocessing team.