I remember making this feature class…I wonder why?

By Charlie Frye, Esri Chief Cartographer

Have you ever wondered where a feature class came from as you’ve browsed through one of your geodatabases in ArcCatalog? I think most of us have, and probably more often than we’d like to admit. In the example shown here to the left, I made these datasets a few weeks ago, and I have no idea what “GN” means, or whether and how I selected, simplified, or dissolved the data.

There are a couple of things you can do to avoid that puzzled feeling: standardize your feature class naming convention, and standardize your geoprocessing. With the naming convention, I started doing the right thing here, but failed to follow through and leave myself the necessary clues. The data in the image above were also the result of a complex workflow, so rather than start with that, let’s cover the basics.

Use Standard Feature Class Names
While my example started off with a good name, I had not developed a standard convention for any of the processes that I ended up applying to my NHD_FlowlinesPlus dataset to produce my cryptically named datasets. Before I explain what I should have done, I’ll share the standard naming conventions I use:

  • Scale: _24K, 100K, 250K, 1M, 2M: The first example means the data were either captured or generalized to a resolution appropriate for 1:24,000 scale maps. The 1M means 1:1,000,000. This is a relatively good convention; I say relatively because the meaning is specific to the map product and the data production method. The product in this case is an on-screen map, and I use different data production methods from those I would use for a printed map.
  • Mapping Purpose: _lab, _sym, _master: This refers to how I use these data in a map: for labeling only (_lab), for symbology only (_sym), or as a dataset from which I derive cartographic data (_master). The context in which I have found these abbreviations most useful is ArcGIS Server, where my goal is to optimize the data as much as possible to improve drawing performance. I covered how to set those data up in a recent blog entry on tips for improving drawing performance.
  • Vintage: _03, _07, Mar_08, Jun_06, etc.: This is usually just a two-digit year so I can tell when the data were captured. For imagery, it’s useful to add at least the month or even the capture date to the name, as seasonality often carries significant meaning.

Making Feature Class - Figure 1
To the left is an example of some data that follow these naming conventions. My use of scale varies: besides the single scale described above, e.g., FlowLine18M_Lab, a name can also indicate a range of scales, e.g., FlowLine147_36K. The usage for my maps is indicated either by nothing, which means there is no specific use, e.g., Flowline147_36K; or by _Lab or _Sym, which means the data are used only for labeling or only for symbology.
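One way to keep suffixes like these consistent is to build names with a small helper instead of typing them by hand. The following is only a sketch in plain Python: the function name, its arguments, and the choice to join every part with an underscore are assumptions made for illustration, not part of ArcGIS or of the convention as I actually typed it.

```python
# Hypothetical helper: the name, arguments, and underscore-joined layout
# are illustrative assumptions, not an ArcGIS API.
ALLOWED_PURPOSES = {"lab", "sym", "master"}


def build_feature_class_name(base, scale=None, purpose=None, vintage=None):
    """Compose a feature class name from optional convention suffixes.

    base    -- the dataset's root name, e.g. "Flowline"
    scale   -- scale token, e.g. "24K" or "1M"
    purpose -- "lab", "sym", or "master"
    vintage -- capture date token, e.g. "07" or "Mar_08"
    """
    if purpose is not None and purpose not in ALLOWED_PURPOSES:
        raise ValueError("purpose must be one of: " + ", ".join(sorted(ALLOWED_PURPOSES)))

    parts = [base]
    # Append only the suffixes that were supplied, always in the same order.
    for suffix in (scale, purpose, vintage):
        if suffix:
            parts.append(suffix)
    return "_".join(parts)


print(build_feature_class_name("Flowline", scale="24K", purpose="lab", vintage="07"))
print(build_feature_class_name("Flowline", scale="1M"))
```

Because the order of the suffixes is fixed in one place, every dataset produced this way can be read the same way later, which is the point of having a convention at all.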

Logging Geoprocessing Steps

Last, where I specifically went wrong in the example above was in trying to come up with a shorthand for an entire toolbox. By shorthand, I mean a three- or four-letter abbreviation for each tool I used to process my dataset. I tried to add each step onto the end of the name, as in the uppermost picture above. It’s just not practical: the names get too long and there are too many tools.

The solution is to use metadata. I prefer to use the FGDC metadata template for cartographic data because it has specific places to describe the purpose and nature of the information in my dataset’s fields, and it has a specific section for the lineage, or processing steps, I used on my data. Lineage and fields are key to well-documented cartographic data. However, that’s a lot of work to do, especially if you’re not even sure you are going to use the data (perhaps you’re just experimenting and hoping to find a good method to process your data). Further, as anybody who has tried editing metadata in the FGDC editor can attest, it can be a tedious task.

Therefore, a good middle ground is to use the Abstract field. Rather than try to indicate each of the geoprocessing steps in my feature class’s name, I would recommend using a _GP suffix as a standard convention to indicate that if you want to know how the data were made, check the metadata. After a bit of head scratching, I finally figured out the basics of what I did to produce my mystery feature class and wrote the following in my metadata abstract field:

“Flowlines from NHDPlus. GN means Good Names, which means the short unnamed segments in the midst of flows have been given names by a process that used Spatial Join in ArcGIS 9.3. That process joined the long segments to the short segments, and when two long segments joined to a short segment the name was copied to the unnamed short segment (when one or three features joined it was deemed unimportant, as the short segment was either at the end of a flow or at the juncture of another flow, which meant these segments were not significant for labeling). The SEL means these are a selection from the NHD_FlowLinesPlus dataset, based on a minimum MAFLOWU field value. The Sim means these were simplified using Simplify Line with the Bend Simplify method. The Dis means these were dissolved by the Name and FlowClass attributes.”

That only took a couple of minutes… well, if I had done it in the first place it would have only taken a couple of minutes. Either way, it’s a good practice to keep a log of what tools or models have been used to produce your data.
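If you script your processing, the log can be collected as you go and pasted into the Abstract field at the end. Here is a minimal sketch in plain Python of that idea. The LineageLog class and its methods are hypothetical names invented for this example; nothing here reads or writes ArcGIS metadata, it only assembles the text. The steps are the ones described in my abstract above.

```python
# Hypothetical sketch: LineageLog is an illustrative class, not an ArcGIS API.
# It only builds text that could be pasted into a metadata abstract.
class LineageLog:
    def __init__(self, source):
        self.source = source   # name of the dataset the work started from
        self.steps = []        # (tool, note) pairs, in the order they were run

    def add(self, tool, note):
        """Record one processing step and why or how it was done."""
        self.steps.append((tool, note))

    def to_abstract(self):
        """Return the log as numbered lines suitable for an abstract."""
        lines = ["Derived from {}.".format(self.source)]
        for number, (tool, note) in enumerate(self.steps, start=1):
            lines.append("{}. {}: {}".format(number, tool, note))
        return "\n".join(lines)


log = LineageLog("NHD_FlowLinesPlus")
log.add("Spatial Join", "copied names from long segments to unnamed short segments")
log.add("Selection", "kept features above a minimum MAFLOWU field value")
log.add("Simplify Line", "Bend Simplify method")
log.add("Dissolve", "by the Name and FlowClass attributes")

abstract_text = log.to_abstract()
print(abstract_text)
```

Recording the step at the moment you run the tool is what keeps this to a couple of minutes; reconstructing it weeks later is where the head scratching comes in.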

2 Comments

  1. gnoellea says:

    One of the ways I use to keep track of intermediate files is to keep a work log in a notebook (either a “real” notebook or a computer document) for each project. I write the date at the top of each page and then proceed to jot down the name of each file as I create it along with a few phrases concerning what the file is, how I am making it, or why I am making it.

    Every file name is underlined so I can easily scan a page for file names. The files are automatically in order by date and time created, thereby making it easy for me to visualize how I created a final file when I am looking back over the notes. Often that is enough to jog my memory but if I need more info on a file then it is right there in my notes.
    –Gretchen, PetersonGIS

  2. cfrye says:

    One habit I’ve developed in recent years when I’m trying to sort out the fastest or best way to geoprocess a dataset is to keep a journal. I use Excel since the rows are already numbered, and I can use one column for the tool name and another to describe why I used it.