One of the cool things about working at Esri is the access to sophisticated GIS products that help us make decisions. Recently, we used Esri Business Analyst Desktop to analyze student travel patterns—that is, from where do students travel to attend an Esri instructor-led training class? We want to understand student travel patterns to make sure we schedule classes appropriately and meet customer training needs.
The goal of the project was to answer two questions:
- Do the majority of U.S. students travel to the Esri training site closest to them?
- Are there areas in the U.S. that are underserved with regard to access to an Esri training site?
What better tool than GIS to answer these questions?
Paige Hayes, a project manager with Esri Training Services, performed the analysis. Below, Paige describes the methodology she used, the analysis results she obtained, and the decisions that were made based on the analysis results.
This project had three main parts:
- Preparing data for analysis.
- Geocoding student data.
- Applying ArcGIS Business Analyst tools to explore students’ use of training sites.
This post focuses on the geocoding process and on two Business Analyst tools: desire lines, which show where students go for training versus where they might be expected to go, and vector grids, which let us compare student population density with Esri training site locations and Core Based Statistical Areas (CBSAs).
Preparing the Data
The focus for this study was U.S.-based students who attended an Esri instructor-led training class during the 2008 calendar year. The raw student data included records for:
- All Esri training customers (domestic and international) who attended an instructor-led class in one of three settings (an Esri learning center, a client site, or a private class)
- Esri employees who attended a training class
- Customers who attended an instructor-led Online Classroom course
All Esri employees, international customers, and customers who attended an Online Classroom course were excluded from this analysis. Domestic customers who attended a class at a client site or a private class were also excluded. After filtering out the excluded data, the remaining data was exported to a nonspatial geodatabase table.
- Note: The raw data contained a record for each student for each class taken (i.e., if the same student took three classes during the year, there were three separate records in the table). I decided to maintain that information and called the table “student instances.” This way, the travel patterns for each class would be accounted for in the desire line analysis.
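For readers who want to script this kind of filtering, here is a minimal sketch of the exclusion rules described above. The field names (`is_employee`, `country`, `venue`) and the sample rows are hypothetical; the post does not describe the schema of the raw table.

```python
# Hypothetical field names and rows; the post does not describe the raw schema.
records = [
    {"attendee": 101, "is_employee": False, "country": "US", "venue": "learning_center"},
    {"attendee": 101, "is_employee": False, "country": "US", "venue": "learning_center"},
    {"attendee": 102, "is_employee": True, "country": "US", "venue": "learning_center"},
    {"attendee": 103, "is_employee": False, "country": "CA", "venue": "learning_center"},
    {"attendee": 104, "is_employee": False, "country": "US", "venue": "client_site"},
    {"attendee": 105, "is_employee": False, "country": "US", "venue": "online"},
]


def in_study(rec):
    """Keep domestic, non-employee customers taught at an Esri learning center."""
    return (
        not rec["is_employee"]
        and rec["country"] == "US"
        and rec["venue"] == "learning_center"
    )


# One row per student per class is kept on purpose (the "student instances").
student_instances = [rec for rec in records if in_study(rec)]
```

Attendee 101 appears twice in the result because the filter deliberately keeps one row per class taken.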
Putting Points on the Map
After subsetting the data to the focus group, it was time to geocode the data and visualize student travel patterns in ArcMap. For this national-scale analysis, we decided that geocoding to the ZIP Code level would provide the desired granularity.
Examining the ZIP Code data uncovered an interesting phenomenon with our original database—there were a great number of four-digit ZIP Codes in the data.
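The post does not say what caused the four-digit values or how they were repaired. One common cause is a ZIP Code stored as a number, which drops the leading zero (for example, 02134 becomes 2134). Assuming that was the cause here, a minimal sketch of a repair might look like the following; `normalize_zip` is a hypothetical helper, not part of Business Analyst.

```python
def normalize_zip(raw):
    """Return a five-digit ZIP Code string, or None if the value cannot be one."""
    text = str(raw).strip()
    # Keep only the five-digit part of a ZIP+4 value such as "92373-8100".
    text = text.split("-")[0]
    if not text.isdigit() or len(text) > 5:
        return None
    # Restore leading zeros that a numeric field would have dropped.
    return text.zfill(5)


cleaned = [normalize_zip(z) for z in (2134, "92373", "92373-8100", "n/a")]
```

Values that cannot be interpreted as a ZIP Code come back as `None`, so they can be reviewed by hand rather than silently padded.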
After the ZIP Code issue was fixed, the data was ready for analysis. Now for the fun part of this project!
I started Business Analyst and added my student instances table to ArcMap, then I created a study area for the United States.
Business Analyst uses a customer/store paradigm. This can be changed to reflect your organization. For example, for this project I set my Business Analyst preferences to refer to customers as “students” and stores as “sites.” I also chose to store analysis results in a file geodatabase feature class.
To geocode our students and identify which training site they used (for the subsequent analysis), I opened the Business Analyst Student Setup wizard. In the wizard, I chose to:
- Create a new dataset from tabular data that was already loaded in the ArcMap document.
- Use the Customer_City, Customer_State, and Customer_Zipcode fields to geocode the students.
- Use the Attendee field (which stores an ID number for each student) as the unique ID for each student record in the new feature class.
- Use the Location_City field to identify where each student took their class.
Of the thousands of records run through the geocoding service, 99% were successfully matched on the first try. It turned out that some of the unmatched records were military ZIP Codes. Because students with military ZIP Codes could be coming to class from anywhere in the world, those records were discarded. The remaining unmatched records simply did not have enough information to match. In the end, less than 1% of all the records were unmatched. Not bad.
Analysis, Part 1: Desire Lines
For my first analysis, I wanted to see what a desire line map (spider diagram) would look like if all students had gone to the closest Esri training site. I would then compare this pattern to a desire line map showing where students really went for training.
Recall that when setting up the student data, I specified the Location_City field, which identified where each student attended a training class. I also had a feature class of Esri training site locations that identified the city, state, and ZIP Code of each site. These sites were further identified as either Esri regional offices, off-site contracted locations, or satellite offices. This data was my site data (typically called “store” data in Business Analyst).
To derive desire lines, I ran the Business Analyst Desire Lines tool using the City field in the site data as the unique identifier for each site, told the tool to consider all sites, and assigned sites to students using the closest site. I also chose for distance to be reported in the output feature class as straight-line distance.
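To illustrate the idea behind a closest-site assignment, here is a minimal sketch that pairs a student with the nearest site by straight-line distance. It uses the haversine great-circle formula as a stand-in; how the Desire Lines tool computes distance internally is not described in this post. The function names and the approximate coordinates are assumptions for illustration, and only three sites are included.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def straight_line_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def closest_site(student, sites):
    """Return (site city, distance in miles) for the site nearest the student."""
    return min(
        ((city, straight_line_miles(*student, *loc)) for city, loc in sites.items()),
        key=lambda pair: pair[1],
    )


# Approximate city-center coordinates, for illustration only.
sites = {
    "Redlands": (34.06, -117.18),
    "Vienna": (38.90, -77.27),
    "San Antonio": (29.42, -98.49),
}
student = (33.45, -112.07)  # a student geocoded near Phoenix, AZ
city, miles = closest_site(student, sites)
```

Each (student, closest site) pair corresponds to one desire line, with `miles` playing the role of the distance field in the output.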
The result was a feature class of lines from each student to their closest training site. The map was nice and neat: no line crossed the boundary of a site’s Thiessen polygon (the area that is closer to that site than to any other). Having been an instructor myself, I knew this simply wasn’t the case. This map was not reality.
The next step was to identify students who didn’t use the closest available training site. I created an attribute query to select these students: [Closest_Site] <> [Location_City]. The result revealed that a little over half of students attended class at the closest available Esri training site, while the rest went farther.
The answer to our first question is Yes, a majority of U.S. students did take instructor-led classes at the Esri training site closest to them.
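The selection above compares two fields on each record. Outside ArcMap, the same comparison can be sketched in a few lines; the rows below are made up for illustration, and only the field names `Closest_Site` and `Location_City` come from the post.

```python
# Hypothetical rows: one per student instance, after the closest-site run.
rows = [
    {"Attendee": 1, "Closest_Site": "Denver", "Location_City": "Denver"},
    {"Attendee": 2, "Closest_Site": "Denver", "Location_City": "Redlands"},
    {"Attendee": 3, "Closest_Site": "Vienna", "Location_City": "Vienna"},
    {"Attendee": 4, "Closest_Site": "San Antonio", "Location_City": "San Antonio"},
    {"Attendee": 5, "Closest_Site": "Redlands", "Location_City": "Vienna"},
]

# Equivalent of the attribute query [Closest_Site] <> [Location_City].
traveled_farther = [r for r in rows if r["Closest_Site"] != r["Location_City"]]

# Share of student instances that used the closest site.
share_closest = 1 - len(traveled_farther) / len(rows)
```

A `share_closest` above 0.5 corresponds to a "Yes" on the first question.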
Now that I had the selected set of students who traveled farther for training, I could derive new desire lines to determine where these students were actually getting their Esri training.
I ran the Desire Lines tool again for just the selected students. This time, I assigned students to sites based on the Location_City field, which identified where each student actually took the class. To maintain consistency, I again chose to create a distance field in the output feature class that stored straight-line distance between the student origin and training destination.
This map reflects reality. Reality is often messy, isn’t it?
Some of the interesting results regarding students who didn’t use the closest training site are:
- They most often attended instructor-led training at Esri regional offices in Vienna, VA (just outside of Washington, D.C.); Redlands, CA; and San Antonio, TX.
- Students traveled the farthest to attend Introduction to the Multiuser Geodatabase (3,000 miles), Introduction to Programming ArcObjects Using the Microsoft .NET Framework (2,900 miles), and Managing Editing Workflows in a Multiuser Geodatabase (2,700 miles). Of course, if I could, I’d go to Hawaii for training, too.
Analysis, Part 2: Finding Hot Spots
Next, we wanted to identify areas that were potentially underserved. These areas would have a high density of students but no Esri training site within a reasonable distance (75 miles). At this stage, we simply wanted an initial picture that would prompt immediate further investigation if an area looked to be in dire need.
To accurately calculate student density, I needed to use a table that contained unique student records rather than student instances. To condense student instances down to unique students, I employed a free third-party Field Calculator (.CAL) script (EasyCalculate from ET SpatialTechniques).
The script required adding a long integer field (UniqueStudent) to the student instance table, then running the script against the Attendee field. The values in the UniqueStudent field were based on the occurrence of the value in the Attendee field.
For example, the first instance of an Attendee yielded the value 1, and the second instance of that same Attendee value yielded 2. After running the script, I selected all the 1 values and exported those records to a new geodatabase feature class. The new feature class stored one record for each student, exactly what I needed to determine student density.
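The occurrence numbering described above can be sketched in plain Python. This is not the EasyCalculate script itself, only an illustration of the same idea; `number_occurrences` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def number_occurrences(attendee_ids):
    """Return, for each row, how many times its ID has appeared so far (1-based)."""
    seen = defaultdict(int)
    occurrence = []
    for attendee in attendee_ids:
        seen[attendee] += 1
        occurrence.append(seen[attendee])
    return occurrence


attendees = [501, 502, 501, 503, 501, 502]  # one entry per student instance
unique_student = number_occurrences(attendees)

# Keeping only the rows numbered 1 leaves one record per student.
first_rows = [a for a, n in zip(attendees, unique_student) if n == 1]
```

Selecting the rows where the counter equals 1 mirrors the select-and-export step in the text.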
Next, I used the Business Analyst Create Grids tool to create 100×100 mile grids over the continental U.S. The map below shows the vector grids (square polygons) covering the study area.
Because we wanted to find underserved areas, I removed grids that either contained an Esri regional office or whose boundaries were within 75 miles of a regional office. This allowed me to focus on students who did not have relatively easy access to a major training site.
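As a rough illustration of this filter, the sketch below measures the distance from an office to the nearest edge of each square grid cell and drops cells within 75 miles. It assumes planar coordinates already expressed in miles; the helper name, cell layout, and office location are hypothetical, and in the project this step was done with GIS selection tools rather than code.

```python
import math

CELL_MILES = 100   # grid cell edge length
BUFFER_MILES = 75  # "reasonable distance" threshold from the post


def miles_to_cell(px, py, xmin, ymin, size=CELL_MILES):
    """Distance from point (px, py) to a square cell; 0 if the point is inside."""
    dx = max(xmin - px, 0, px - (xmin + size))
    dy = max(ymin - py, 0, py - (ymin + size))
    return math.hypot(dx, dy)


# Lower-left corners of four cells in a row, and one hypothetical office.
cells = [(0, 0), (100, 0), (200, 0), (300, 0)]
office = (50, 50)

# Keep only cells that neither contain the office nor lie within the buffer.
remaining = [c for c in cells if miles_to_cell(*office, *c) > BUFFER_MILES]
```

With several offices, a cell would be kept only if it is farther than the threshold from every one of them.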
Next, I performed a spatial join between the grids and the students. After the spatial join, each student acquired the unique ID of the grid they were contained in. I then summarized the student table based on the grid ID field to derive a count of students within each grid. The final step to pull this information together was to join the summary table to the grids. I now had a count of students per grid area.
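For square grids, the join-and-summarize steps above reduce to finding each student's cell and counting students per cell. The sketch below assumes student locations in a projected coordinate system measured in miles; the coordinates and the `grid_id` helper are hypothetical and only illustrate the logic.

```python
import math
from collections import Counter

CELL_MILES = 100  # grid cell edge length used in the post


def grid_id(x, y, cell=CELL_MILES):
    """Return the (column, row) of the square cell containing point (x, y)."""
    return (math.floor(x / cell), math.floor(y / cell))


# Hypothetical student locations in projected coordinates, in miles.
students = [(12.0, 40.0), (95.5, 10.2), (130.0, 20.0), (250.0, 310.0), (199.9, 99.9)]

# "Spatial join": tag each student with its cell. "Summarize": count per cell.
students_per_grid = Counter(grid_id(x, y) for x, y in students)
```

A point that falls exactly on a shared edge is assigned to the cell above or to the right, which is one of several reasonable tie-breaking choices.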
The three most densely populated grids happened to contain an alternate training site—either a contracted off-site or an Esri satellite office. The next step, then, was to find the most densely populated grids that didn’t contain a training site of any kind. The map below shows the result. The grids outlined in blue have the highest student density of those located more than 100 miles from any Esri training site.
Analysis, Part 3: Comparing Grid Density to Urban Centers
The last phase of this project was to compare student population density against the largest CBSAs and Esri training site locations to see if there were any holes. From the Business Analyst datasets, I selected the 50 CBSAs that contained the most businesses. Queries helped determine that nearly 75% of our 2008 student population came from an area within 30 miles of one of the top 50 CBSAs, and that 40 of the top 50 CBSAs have an Esri training site within 30 miles.
The answer to our second question is No, there are no clearly underserved areas in the U.S. at this time.
We started this project looking for areas where we might better serve our customers. Exploring the data in different ways using ArcGIS Business Analyst tools led us to the conclusion that, at least based on our 2008 students, we have pretty good coverage. As a result of this study, we decided not to add or remove any training sites in the near future. We will revisit this study periodically to look for trends and see if any patterns change over time.
Like many GIS analyses, our project findings resulted in new questions. For example, why did so many students travel farther than their closest Esri training site to attend a class? This is a new project we’re working on, and maybe we’ll share the results in a future post.
Paige Hayes, a project manager with Esri Training Services, contributed this post. She works in the Esri Denver regional office, where close to 10% of U.S.-based students attended instructor-led training in 2008.
Want to learn more about ArcGIS Business Analyst?
- Introduction to Esri Business Analyst Desktop teaches how to use many of the analysis tools and datasets included with the software, including those mentioned in this post.