Validating Strings Using the Regular Expression Check

Many of the benefits that your GIS provides for decision-making are due to its ability to leverage non-spatial information in a geographic context. This non-spatial information, stored as a series of attributes, provides rich and descriptive characteristics for your features. Such attributes are commonly stored as string values because of the flexibility they provide: they can contain letters, numerical digits, punctuation, or other special characters. Unfortunately, this flexibility also increases the risk of introducing errors into your database, especially when the attribute value is not constrained by a domain. To leverage this non-spatial information effectively, it must be free of errors and consistent in content. In this post I will highlight how Data Reviewer’s Regular Expression check can be used to validate data that must comply with standards or organizational business rules. Let’s use a common example to illustrate this.

Tax Parcel Example:  Validating Parcel Identification Number (PIN)

Validating values stored in the Parcel Identification Number (PIN) attribute of a parcel feature is important to ensure consistency in data and map production workflows.  If you are using the Esri Local Government information model, the PIN attribute must consist of only numbers (which are then formatted using labeling expressions for visualization and map production). The Regular Expression check can help validate this. On the check dialog:

  1. Add a Check Title:  Parcel Identification Number contains invalid characters
  2. From the Feature Class group, select the feature class on which you would like to run the check: ParcelFabric_Parcels – LocalGovernment.gdb.
  3. From the Where Clause option, add a SQL definition query to ensure that only Tax Parcel features are evaluated by the check:  (SystemEndDate IS NULL) AND (Type = 7)
  4. In the Regular Expression Editor create an expression for the NAME field that identifies string values which contain invalid characters:  [0-9]+
  5. Enter Reviewer Remarks by providing a note:  Parcel Identification Number contains non-numeric characters.
  6. Choose a Severity to rate the importance of errors of this type: 1
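
To see what the expression in step 4 accepts and rejects, here is a minimal Python sketch that runs the same pattern over a handful of made-up PIN values. It is not how Data Reviewer implements the check. The function name find_invalid_pins and the sample values are hypothetical, and the sketch assumes that a value is valid only when the whole string consists of digits.

```python
import re

# Same expression as in step 4. Assumption: a valid PIN is one or more
# ASCII digits and nothing else.
PIN_PATTERN = re.compile(r"[0-9]+")


def find_invalid_pins(pins):
    """Return the values that are not made up entirely of digits."""
    # fullmatch() succeeds only when the pattern covers the whole string,
    # so a hyphen, letter, space, or empty value is reported as invalid.
    return [pin for pin in pins if PIN_PATTERN.fullmatch(pin) is None]


# Hypothetical sample values, for illustration only.
sample_pins = ["1234567890", "12-345-678", "12345A", " 12345", ""]

for pin in find_invalid_pins(sample_pins):
    print(f"Invalid PIN: {pin!r}")
```

The sketch uses fullmatch() because, in Python, an unanchored search with [0-9]+ would succeed on any value that contains at least one digit, including "12345A". Whether a given tool applies an expression to the whole value or to any part of it is worth confirming in that tool's documentation.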

Additional Examples

Here are some other common data patterns found in string fields that can be validated using the Regular Expression check:

Note: In the last example, use the Unique ID check to validate that all Primary Key attribute values are unique across your database.

As you can see from the examples, regular expressions are a powerful technology for identifying pattern-related errors in data. With origins in UNIX and Unix-like utilities (such as vi and lex), they are regarded by most software developers as a flexible method for validating string values in software applications. By leveraging Data Reviewer’s Regular Expression check, you too can integrate this powerful capability into your quality control process.

Content contributed by Jay Cary
