Last week's post featured Performing Analysis with ArcGIS Desktop, a new instructor-led ESRI training course that teaches a five-step process for GIS analysis. Today's post shows how to apply that process using ArcGIS Desktop software and datasets from ESRI Data & Maps.

Suppose you want to analyze access to health care services in Riverside and San Bernardino counties in southern California.

Locator map of study area

The five steps in the analysis process are:

  1. Frame the question
  2. Explore and prepare data
  3. Choose analysis methods and tools
  4. Perform the analysis
  5. Examine and refine results

Step 1. Frame the Question

This step seems straightforward because typically you're assigned a project to obtain specific information. Some projects involve answering several questions derived from a high-level question. How you frame the questions helps determine which GIS tools and methods you use for the analysis.

In this example, you might frame a preliminary high-level question: Is the distribution of health care facilities consistent with the population distribution in Riverside-San Bernardino, CA? This question could be broken down into the following sub-questions:

  • Where are facilities that provide health care services located?
  • What is the population distribution within the study area?
  • Do areas with the highest population density have the greatest number of facilities?
  • Within the study area, are there areas with high population density but no health care facilities? 

Step 2. Explore and Prepare Data

This step can be the most time-consuming. If you don't have all the data needed for an analysis project, the ESRI Data & Maps DVDs that come with ArcGIS Desktop are an excellent source of high-quality spatial data. Be aware that this data is intended for internal use and some of the datasets have restrictions on commercial use and distribution. Make sure you review the ESRI Data and Maps > Getting Started with ESRI Data and Maps > Redistribution rights topic in the ArcGIS Desktop Help before sharing the data with others.

This example needs data representing health care facilities, California counties, and population. Depending on how fine-grained the analysis needs to be, ZIP Codes or other levels of geography such as census tracts or census blocks could be used to map population distribution in the study area.

Step 2a: Explore Data 

You explore data using ArcCatalog. For each dataset, preview the features, attributes, and metadata to determine whether the data will be useful for your analysis and what kind of preparation, if any, may be required. Questions to ask about the data include:

  • What is the data format? 
  • When was the data collected (how current is it)? 
  • How detailed is the data—at what scale was it collected?
  • What coordinate system does the data use? Is the data projected?
    • Best practice is to project all datasets into a common coordinate system before doing analysis.
  • Does the feature geometry (i.e., point, line, polygon) work for the analysis?
  • Does the data have the attributes you need?
  • Does the data have any access or use constraints?

For this example, the following StreetMap datasets were selected:

  • hospitals (ghospitl) — this data includes "traditional" hospitals as well as other medical facilities.
  • ZIP Codes (zip_poly) — this data includes population attributes.
  • counties (dtl_cnty) 
  • states (dtl_st)

Step 2b: Prepare Data

To start, you need to decide what data format to use. Project data doesn't all have to be in the same format, but a single format can make things easier. The important thing is to verify that the analysis tools you need accept your data format; also consider whether you will be distributing the data created by the analysis. You can use the geoprocessing tools in the ArcToolbox Conversion Tools toolbox to quickly convert data to another format. If you have access to the ArcGIS Data Interoperability extension, you can work directly with many spatial formats.

  • The ESRI Data & Maps datasets are stored in SDC (Smart Data Compression) format. For convenience, all the datasets for this project were exported to shapefiles and saved to a local disk. 

Organizing data into a project folder can simplify analysis tasks (you can specify a default input workspace) and facilitate sharing your work with others.

  • For this project, a file folder was created to organize the shapefiles.

If you are working with geodatabase feature classes, you could copy or import them into a single file-based geodatabase. You might also want to create separate folders or geodatabases to store intermediate (temporary) data output from analysis operations as well as final data.

Extracting data to have the same extent as the study area helps speed up processing time and enhances data visualization in ArcMap. In this example, the project datasets cover the entire U.S. 

  • Clipping the hospitals and ZIP Codes to the extent of the two counties will be part of data preparation.

In order to clip the data, you can create a selection layer of just Riverside and San Bernardino counties. You can export the selection layer to its own shapefile or geodatabase feature class, but you don't have to. Selection layers are saved as part of the map document (.MXD).

So here's how the data preparation tasks flow for this project:

  • Start ArcMap, add the project data, and zoom to the study area. 
  • Using the Select Features tool, select Riverside and San Bernardino counties.
  • Right-click the Counties layer and choose Selection > Create Layer From Selected Features.
  • The selection layer is added to the Table of Contents. Drag it below the ZIP Codes layer and rename and symbolize as desired.
  • Unselect the two features in the Counties layer.
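The interactive selection above has a simple attribute-based equivalent, sketched here in plain Python. This is only an illustration, not ArcMap's own mechanism: the records are made up, and the field names NAME and STATE_NAME are assumptions about the county attribute table.

```python
# Hypothetical county records; NAME and STATE_NAME are assumed field names.
counties = [
    {"NAME": "Los Angeles", "STATE_NAME": "California"},
    {"NAME": "Riverside", "STATE_NAME": "California"},
    {"NAME": "San Bernardino", "STATE_NAME": "California"},
    {"NAME": "Orange", "STATE_NAME": "California"},
]

STUDY_AREA = {"Riverside", "San Bernardino"}


def select_study_area(records, names=STUDY_AREA, state="California"):
    """Return only the records whose county name and state match the study area."""
    return [
        r for r in records
        if r["NAME"] in names and r["STATE_NAME"] == state
    ]


selected = select_study_area(counties)
print([c["NAME"] for c in selected])
```

Filtering on the state as well as the county name guards against same-named counties in other states when the source layer covers the entire U.S.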

Next, clip the hospitals and ZIP Codes.

  • Display the ArcToolbox window and expand the Analysis Tools > Extract toolset.
  • Double-click the Clip tool to open its dialog box.
  • For Input Features, drag the U.S. Hospitals layer into the text box.
  • For Clip Features, drag Riverside-San Bernardino (the selection layer) into the text box.
  • For Output Feature Class, browse to the project folder, enter a name, then click Save. 
  • Click OK to run the tool. 
    • When the clip operation completes, a layer representing hospitals within the study area is added to the Table of Contents.
  • Change the symbol as desired and turn off the U.S. Hospitals layer.
  • Repeat the steps to clip the U.S. ZIP Codes layer.
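For point data such as the hospitals, clipping amounts to keeping only the points that fall inside the clip polygon. The sketch below shows that idea with a standard ray-casting test in plain Python. It is not the Clip tool's implementation; the L-shaped polygon and the hospital coordinates are invented for illustration, and clipping polygon layers such as ZIP Codes also requires cutting geometries, which is not shown.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's y value can be crossed
        # by a horizontal ray cast to the right of the point.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def clip_points(points, ring):
    """Keep only the point records that fall inside the clip polygon."""
    return [p for p in points if point_in_polygon(p["x"], p["y"], ring)]


# Hypothetical L-shaped study area and hospital locations (arbitrary units).
study_area = [(0, 0), (10, 0), (10, 6), (4, 6), (4, 10), (0, 10)]
hospitals = [
    {"NAME": "A", "x": 2, "y": 8},   # inside the upper arm of the L
    {"NAME": "B", "x": 8, "y": 8},   # in the notch, outside the polygon
    {"NAME": "C", "x": 8, "y": 3},   # inside the lower part
    {"NAME": "D", "x": 12, "y": 3},  # east of the polygon
]

clipped = clip_points(hospitals, study_area)
print([h["NAME"] for h in clipped])
```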

Step 3. Choose Analysis Methods and Tools

To choose the appropriate methods and tools for an analysis project, consider the questions framed in Step 1 and document the methods and tools that will answer each one. 

  • Question: Where are facilities that provide health care services located?
    • Methods and tools: Examine distribution of hospitals on the map.
  • Question: What is the population distribution within the study area?
    • Methods and tools: Symbolize ZIP Codes layer based on population density using graduated colors.
  • Question: Do areas with the highest population density have the greatest number of facilities?
    • Methods and tools: First, do a visual analysis of the map to get a general idea, then do a spatial join operation between the Hospitals and ZIP Codes. The output of the spatial join will be one record for each hospital and the ZIP Code attributes.
  • Question: Within the study area, are there areas with high population but no health care facilities?
    • Methods and tools: Summarize the ZIP field in the table output from the spatial join. The summary table will include a count of hospitals in each ZIP Code that contains a hospital, plus population data for each ZIP Code.

It's very helpful at this step to diagram the analysis. The diagram doesn't have to be anything fancy (although it can be if you like that sort of thing). An easy approach is to sketch it quickly on paper or a whiteboard, like the example below.

Workflow diagram example

Step 4. Perform the Analysis

If you've diagrammed the process in step 3, then in this step, you simply follow the diagram, completing each task in sequence. For complicated analyses, you may want to create a model in ModelBuilder to automate the process. A model also allows you to quickly change a parameter and run the model again to explore different scenarios.

Map showing distribution of hospitals and population density

  • Examine the distribution of the hospital features on the map. Zoom and pan as needed.
  • Symbolize ZIP Codes with graduated colors based on the POP07_SQMI (2007 population density) attribute.
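Graduated colors work by grouping attribute values into a small number of classes and assigning one color per class. The sketch below shows the grouping step using equal-interval breaks, chosen here only because it is the simplest scheme to follow; it is not necessarily the classification method ArcMap applies by default. The ZIP identifiers and density values are placeholders, not data from the study area.

```python
def equal_interval_breaks(values, n_classes):
    """Return the upper bound of each class for an equal-interval classification."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * k for k in range(1, n_classes + 1)]


def classify(value, breaks):
    """Return the index of the first class whose upper bound covers the value."""
    for idx, upper in enumerate(breaks):
        if value <= upper:
            return idx
    return len(breaks) - 1


# Placeholder densities (people per square mile), keyed by made-up ZIP ids.
density_by_zip = {
    "zip_a": 0.0,
    "zip_b": 500.0,
    "zip_c": 2000.0,
    "zip_d": 4000.0,
    "zip_e": 8000.0,
}

breaks = equal_interval_breaks(list(density_by_zip.values()), 4)
classes = {z: classify(d, breaks) for z, d in density_by_zip.items()}
print(breaks)
print(classes)
```

Each class index would then map to one shade in the color ramp, with higher indexes drawn in darker colors.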

A visual analysis of the data shows the greatest number of hospitals and the most densely populated ZIP Codes are in the southwestern part of the study area.

You can get more information by performing a spatial join between the Hospitals and ZIP Codes layers.

  • Right-click Hospitals and click Joins and Relates > Join.
  • In the dialog box, choose to join data from another layer based on spatial location.
  • Choose ZIP Codes in the drop-down list of layers, specify the output feature class name and location, and click OK.

The output of the spatial join is a new point layer that contains all the hospital features plus the attributes of the ZIP Code each facility falls within. The ZIP field contains the five-digit ZIP Code in which the hospital is located, and the PO_NAME field contains the post office name (corresponds to the city name) for that ZIP Code. The POP07_SQMI field shows the population density associated with each hospital's ZIP Code.
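To make the shape of that output concrete, here is a minimal sketch of a containment-based spatial join in plain Python. It is an illustration under stated assumptions, not what ArcMap runs: ZIP Code areas are simplified to axis-aligned rectangles (a real join tests against the actual polygon geometry), and every ZIP value, place name, and coordinate is made up.

```python
# Hypothetical ZIP Code areas, simplified to (xmin, ymin, xmax, ymax) rectangles.
zip_areas = [
    {"ZIP": "Z1", "PO_NAME": "Town One", "POP07_SQMI": 3500.0, "bbox": (0, 0, 5, 5)},
    {"ZIP": "Z2", "PO_NAME": "Town Two", "POP07_SQMI": 150.0, "bbox": (5, 0, 10, 5)},
]

# Hypothetical hospital points.
hospitals = [
    {"NAME": "Hospital A", "x": 1, "y": 1},
    {"NAME": "Hospital B", "x": 2, "y": 4},
    {"NAME": "Hospital C", "x": 7, "y": 2},
]

JOIN_FIELDS = ("ZIP", "PO_NAME", "POP07_SQMI")


def contains(bbox, x, y):
    """True if the point falls inside the rectangle (upper edges excluded)."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x < xmax and ymin <= y < ymax


def spatial_join(points, areas):
    """Return one row per point, carrying the attributes of the area it falls in."""
    joined = []
    for pt in points:
        match = next(
            (a for a in areas if contains(a["bbox"], pt["x"], pt["y"])), None
        )
        row = dict(pt)
        for field in JOIN_FIELDS:
            row[field] = match[field] if match else None
        joined.append(row)
    return joined


joined_rows = spatial_join(hospitals, zip_areas)
for row in joined_rows:
    print(row["NAME"], row["ZIP"], row["PO_NAME"], row["POP07_SQMI"])
```

The result keeps one row per hospital, so ZIP Codes with several hospitals appear several times and ZIP Codes with none do not appear at all.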

Sorting the PO_NAME field reveals that multiple hospitals are located in some ZIP Codes.

The last step is to summarize the ZIP field. This operation will output a table that contains one record for each ZIP Code that contains a hospital, plus a field containing the count of hospitals within each ZIP Code. You can also choose to output statistics for numeric fields (such as POP07_SQMI).

  • In the joined table, right-click the ZIP field and choose Summarize.
  • For summary statistics, check First and Last for NAME (this is the hospital name) and check Average for both POP2007 (total population) and POP07_SQMI.
  • Specify an output location and name, then click OK.
  • Choose to add the result table to the map and open it.
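The Summarize operation is essentially a group-by on the ZIP field. The sketch below reproduces that logic in plain Python on made-up joined rows. Count_ZIP matches the field name mentioned in this post; the other output field names (First_NAME, Last_NAME, Ave_POP07_SQMI) are assumptions for illustration and may not match what ArcMap writes.

```python
from collections import defaultdict

# Hypothetical rows from the spatial join output (one per hospital).
joined_rows = [
    {"NAME": "Hospital A", "ZIP": "Z1", "POP07_SQMI": 3500.0},
    {"NAME": "Hospital B", "ZIP": "Z1", "POP07_SQMI": 3500.0},
    {"NAME": "Hospital C", "ZIP": "Z2", "POP07_SQMI": 150.0},
]


def summarize_by_zip(rows):
    """Group rows by ZIP and return a count plus simple statistics per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["ZIP"]].append(row)

    summary = []
    for zip_code in sorted(groups):
        members = groups[zip_code]
        summary.append({
            "ZIP": zip_code,
            "Count_ZIP": len(members),            # hospitals in this ZIP Code
            "First_NAME": members[0]["NAME"],     # first hospital in the group
            "Last_NAME": members[-1]["NAME"],     # last hospital in the group
            "Ave_POP07_SQMI": sum(m["POP07_SQMI"] for m in members) / len(members),
        })
    return summary


summary_table = summarize_by_zip(joined_rows)
for record in summary_table:
    print(record)
```

Because every hospital in a ZIP Code carries the same ZIP Code attributes, the average density for a group is simply that ZIP Code's density.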

Step 5. Examine and Refine Results

So what information does the summary table provide?

The Count_ZIP field tells you the number of hospitals in each ZIP Code that contains a hospital. Sorting the POP07_SQMI field reveals that the most densely populated ZIP Codes are, in general, the ones that have multiple hospitals. Of the 23 ZIP Codes that have fewer than 2,000 people per square mile, only three have more than one health care facility. All the ZIP Codes that have more than 2,000 people per square mile have at least one health care facility.
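Because the summary table lists only ZIP Codes that contain a hospital, checking for dense ZIP Codes with no facility means comparing it against the full ZIP Codes layer. A minimal sketch of that comparison follows, using the 2,000 people per square mile threshold from the paragraph above. The rows are invented to show the check working and do not represent the study results.

```python
# Hypothetical attribute rows for every ZIP Code in the study area.
zip_rows = [
    {"ZIP": "Z1", "POP07_SQMI": 3500.0},
    {"ZIP": "Z2", "POP07_SQMI": 150.0},
    {"ZIP": "Z3", "POP07_SQMI": 2600.0},
    {"ZIP": "Z4", "POP07_SQMI": 40.0},
]

# Hypothetical hospital counts taken from the summary table (ZIP -> Count_ZIP).
hospital_counts = {"Z1": 2, "Z2": 1}


def dense_without_hospital(rows, counts, min_density=2000):
    """Return ZIP Codes above the density threshold that have no hospital."""
    return [
        r["ZIP"] for r in rows
        if r["POP07_SQMI"] > min_density and counts.get(r["ZIP"], 0) == 0
    ]


gaps = dense_without_hospital(zip_rows, hospital_counts)
print(gaps)
```

An empty result would correspond to the finding reported here, where every ZIP Code above the threshold has at least one facility.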

The analysis shows that the distribution of health care services is consistent with the distribution of the population within the study area; that is, the most facilities are located where the population is most dense. You could refine this analysis by considering the number of patients each facility can serve and other variables of interest. You could also extend the project to analyze whether access to health care services in the low-population areas is adequate. The current map indicates that residents of ZIP Codes with a low population density may have to travel a great distance to reach a hospital.