I have recently "invented" a method for simplifying polygon map layers, which seems to give reasonable results. Probably many others have invented it before me, but I would like to present it in order to receive comments and advice on setting the appropriate parameters.

My task was to produce a national soil map suitable for the 1:1,000,000 scale on the basis of a 1:200,000 map. The best method would probably be to have a geologist or soil scientist redraw the map completely for the new scale - but we needed a less expensive method. The challenge was to find a technique that would not totally erase soil types that are represented as many small polygons but together cover more than half of the area in some regions. A traditional Generalize or Eliminate would erase them.

The basic idea is to convert the polygon layer to raster (ArcInfo license required), apply focal statistics with the majority statistic to this raster, and then convert the result back to polygons. The degree of simplification depends on the cell size chosen for rasterization and on the size of the neighborhood. One advantage of this method is that the outline of the major polygons is retained in the simplified version.

A few more steps are needed to get proper results:

  1. If the polygon layer does not cover the whole analysis area - a typical example is that the surrounding sea is not represented as a polygon - then you must first add this "outside polygon" to the layer. Otherwise, your polygons will grow into the sea.
  2. The majority statistics computation may sometimes result in two or more values with exactly the same number of cells in the neighborhood. In these cases the cell will get the nodata value. I get rid of the nodata values with a Con expression that takes the value from the original raster in these cells.
  3. This step - and probably other things - may produce some very small polygons that don't belong on a simplified map. I use Eliminate (in the version from ET Geowizards) to get rid of these. The area limit in this elimination may be considered a third parameter for the degree of simplification (besides cell size and neighborhood size).
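
To make the majority step and the tie handling from item 2 concrete, here is a minimal pure-Python sketch. It is not the ArcGIS tool and not my actual script: the function name `majority_filter`, the square neighborhood given by `radius`, and the clipping of the neighborhood at the grid edge are all assumptions made for illustration. Each cell takes the most common value in its neighborhood, and on a tie it keeps its original value, which mimics the Con fallback.

```python
from collections import Counter


def majority_filter(grid, radius=1):
    """Majority filter on a 2D list of class codes (illustrative only).

    Each cell takes the most common value in its square neighborhood of
    side 2*radius+1 (clipped at the grid edge, an assumption). If the two
    most common values are tied, the cell keeps its original value.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # start from a copy of the original
    for r in range(rows):
        for c in range(cols):
            counts = Counter(
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            )
            ranked = counts.most_common(2)
            tie = len(ranked) == 2 and ranked[0][1] == ranked[1][1]
            if not tie:
                out[r][c] = ranked[0][0]
    return out


# Hypothetical 4x4 raster of soil class codes with two isolated cells.
soil = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 3],
]
smoothed = majority_filter(soil, radius=1)
```

In this toy raster the isolated cells with codes 2 and 3 are absorbed by the surrounding class 1. A larger `radius` (or a coarser cell size) removes larger features, which is why both act as simplification parameters.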

The whole process is packaged as a Python script that receives the simplification parameters as arguments.

If the original field used for symbolizing is non-numeric, its content will be lost in the raster processing and replaced with a numeric value. But since the mapping between the numeric value and the original content is preserved in the VAT (value attribute table) of the raster produced by the initial polygon-to-raster step, it can easily be fetched back via a join on this table.
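
The join described above amounts to a simple lookup from the numeric code back to the original text. The sketch below shows the idea with plain Python data instead of real tables; the field names `gridcode` and `soil_type`, the helper `join_labels`, and the soil names are hypothetical and only serve as an illustration.

```python
# Hypothetical VAT rows: (numeric raster value, original text label).
vat = [(1, "Podzol"), (2, "Cambisol"), (3, "Histosol")]

# Hypothetical simplified polygons carrying only the numeric code.
polygons = [
    {"id": 10, "gridcode": 2},
    {"id": 11, "gridcode": 1},
]


def join_labels(polygons, vat, key="gridcode", field="soil_type"):
    """Return copies of the polygon records with the original label added.

    Codes missing from the VAT get None rather than raising an error.
    """
    lookup = dict(vat)
    return [{**p, field: lookup.get(p[key])} for p in polygons]


labelled = join_labels(polygons, vat)
```

The input records are left unchanged, and each output record carries both the numeric code and the restored label.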

I have experimented with different values for raster cell size and neighborhood size to find the best result for a given map scale. I think, however, that rules must exist for choosing proper values. Can anyone give me a hint on this, or a link to publications on the subject?