Using Statistical Sampling with the Positional Accuracy Assessment Tool

A common question when using Data Reviewer’s Positional Accuracy Assessment Tool (PAAT) is what sample size to use when evaluating a geospatial data layer. Sometimes the sample size is mandated by a specification. When it is not, Data Reviewer’s Sampling check can calculate one for you. In this post, I’ll discuss how you can use the Sampling check to generate a statistically valid sample size and then explore two options for using it with the PAAT.

The steps include:

1. From the Data Reviewer toolbar, select the Select Data Check dropdown.

2. Expand the Advanced Checks category and select Sampling Check.

3. In the Sampling Check Properties dialog, select Auto Calculate.


4. Under Auto Calculate, select your Confidence Level and Margin of Error.

Note: The question you’re looking to answer is this: given a population size (number of features), what sample size do I need to be “X” percent confident that results from the sample represent the whole population to within a “Y” percent margin of error?
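To make the question in the note concrete, here is a minimal sketch of one textbook way to answer it: Cochran’s formula with a finite population correction. This is an assumption for illustration only; the post does not say which formula Auto Calculate uses, and the function name `sample_size` and the default proportion of 0.5 are hypothetical choices.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.90, margin=0.10, proportion=0.5):
    """Cochran's sample size with a finite population correction.

    Assumption: a textbook formula, not necessarily what the
    Sampling check computes internally.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Shrink it for a finite population of `population` features.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(713, confidence=0.90, margin=0.10))  # prints 62
```

Under these assumptions, a 90 percent confidence level and a 10 percent margin of error give 62 for a population of 713 features, which happens to match the sample size in the roads example in this post. The post does not state which settings were actually chosen, so treat that match as suggestive rather than confirmed.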

Once the sample size is generated, you have two options for using it with the PAAT. The first is to use the Browse Features dialog to zoom to each feature in the sample set and then use the PAAT to read feature locations. The second is to use the Grid Properties dialog in the PAAT to produce a number of grid cells equal to the sample size and then read feature locations from within each grid cell. Both options are described in more detail below using a fairly common scenario: evaluating vectors against a raster. In this case, I am evaluating roads against imagery for the State of Louisiana.

Browse Features Option with PAAT

The steps below use the Browse Features dialog (which is activated upon running the previously configured Sampling check).

1. From the Data Reviewer toolbar, click the Data Reviewer dropdown, and select Positional Accuracy Assessment.

2. Set up the PAAT to use a single grid cell (1 x 1) that will encompass your entire image. The PAAT always produces a grid when evaluating vector features, but since you won’t be using the grid cells, the minimum number, 1, is sufficient. 

3. Turn the PAAT Auto-pan function off, since you won’t be using it either. 

4. Use the Zoom to Feature option in the Browse Features dialog (circled in red in the image below) to zoom to each feature.

5. Use the PAAT Digitize Points function to collect data (vector) locations and image (reference) locations. 

In this example, the sample size Data Reviewer generated was 62 of the total 713 features. Collecting data and reference locations for those features with the PAAT gave a horizontal accuracy of about 477 meters CE90 (circular error at 90 percent confidence).
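For readers who want to see how a CE90 figure can be derived from collected point pairs, here is a rough sketch. It assumes horizontal errors follow a circular normal distribution, in which case CE90 is approximately 1.5175 times the radial RMSE. The post does not describe how the PAAT computes its value, so the function `ce90_from_pairs` and the coordinates below are purely hypothetical.

```python
import math


def ce90_from_pairs(data_points, reference_points):
    """Estimate CE90 from matched (x, y) positions in meters.

    Assumption: circular normal errors; not necessarily the PAAT's method.
    """
    if len(data_points) != len(reference_points) or not data_points:
        raise ValueError("need the same non-zero number of data and reference points")
    # Squared horizontal (radial) error for each matched pair.
    squared = [
        (dx - rx) ** 2 + (dy - ry) ** 2
        for (dx, dy), (rx, ry) in zip(data_points, reference_points)
    ]
    rmse_r = math.sqrt(sum(squared) / len(squared))
    # For circular normal errors, CE_p = RMSE_r * sqrt(-ln(1 - p)).
    return rmse_r * math.sqrt(-math.log(1 - 0.90))


# Made-up coordinates for illustration: vector (data) vs. image (reference).
data = [(100.0, 200.0), (350.0, 410.0), (620.0, 90.0)]
reference = [(103.0, 204.0), (345.0, 410.0), (620.0, 95.0)]
print(round(ce90_from_pairs(data, reference), 2))
```

Each of the three made-up pairs is offset by 5 meters, so the radial RMSE is 5 and the estimate is roughly 7.59 meters. An empirical alternative is to take the 90th percentile of the radial errors directly, which avoids the distribution assumption but needs a larger sample to be stable.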

Grid Option with PAAT

Below are the steps for using the PAAT grid option.

1. Use the PAAT to set up a sampling grid that contains the same number of grid cells as the sample size generated by the Sampling check.  

2. Once you have the sample size, you can dismiss the Browse Features dialog, as you will not be using it.

Note: For an irregularly shaped area, getting a number of grid cells that intersect your data equal to the sample size may require a grid with more cells than the sample size. In the case of the State of Louisiana, which has an irregular (non-rectangular) shape, I created a 10 x 10 grid (100 cells) to get 62 cells (the calculated sample size) that actually intersect the data.

3. The PAAT will automatically zoom you to the next grid cell when a feature’s data and reference positions are collected.
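The trial and error described in the note above, growing the grid until enough cells intersect the data, can be sketched outside the tool. This is a simplified illustration, not part of the PAAT: it treats each feature as a single representative point (real road features are lines and can cross several cells), and `occupied_cells` and `smallest_grid` are hypothetical helper names.

```python
def occupied_cells(points, bbox, n):
    """Count cells of an n x n grid over bbox that contain at least one point."""
    xmin, ymin, xmax, ymax = bbox
    cells = set()
    for x, y in points:
        # Clamp so points on the upper or right edge fall in the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * n), n - 1)
        row = min(int((y - ymin) / (ymax - ymin) * n), n - 1)
        cells.add((row, col))
    return len(cells)


def smallest_grid(points, bbox, target, max_n=100):
    """Smallest n whose n x n grid has at least `target` occupied cells."""
    for n in range(1, max_n + 1):
        if occupied_cells(points, bbox, n) >= target:
            return n
    return None  # no grid up to max_n x max_n reaches the target


# Made-up points along a diagonal, standing in for an irregular extent.
points = [(i + 0.5, i + 0.5) for i in range(10)]
print(smallest_grid(points, (0, 0, 10, 10), target=4))
```

The occupied-cell count will not always land exactly on the sample size, so in practice you may end up with a few more intersecting cells than you need and then have to decide which ones to skip.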

The result of using the grid option with PAAT was a horizontal accuracy of approximately 487 meters CE90.

The two results, 477 and 487 meters CE90, differ by only about 2 percent. That is close agreement given the relatively small sample of features spread over a large area.

Data Reviewer’s Sampling check (using Auto Calculate) is a quick way to determine a statistically valid sample size for use with the Positional Accuracy Assessment Tool (PAAT). Neither option is necessarily more valid; it basically comes down to your product specifications for sampling and which option works better for you.

Content contributed by Pete Aniello
