In my previous blog post, I discussed how Data Reviewer’s Metadata check can be configured to validate object metadata against a formally published metadata standard such as the Federal Geographic Data Committee’s (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) or the ISO-19115/19139 international metadata standard. In this second part of the metadata series, I’d like to discuss how the Metadata check can be configured to validate metadata content against an organization’s own content guidelines.
The need for validating metadata element content
For many organizations, simply validating their metadata against a standard is not always enough to ensure that data publishing requirements are met. These requirements vary based on the organization and on the intended use and audience of the data being documented, and they may change over time as new requirements are identified. Validating metadata is particularly important for content that is defined as “free text” by a given metadata standard. Free-text elements cannot be validated in the same way as numbers, dates or other elements which have a defined list of valid values. Free-text elements (see screenshot below) are common throughout each standard and include such things as a dataset’s title, abstract, point of contact name, contact address, etc.
Due to the unstructured nature of free-text elements, organizations often have to provide additional guidance as to what information should be placed into these elements. This could include defining element values which are common across the organization as well as those specific to a team or business unit. A best practice implemented by many organizations is to deploy metadata templates containing standardized content as a starting point for new collections. However, this practice does not help when metadata already exists, is obtained from outside sources, or was created using non-ArcGIS workflows and tools.
Metadata element validation types
The Data Reviewer Metadata check enables you to configure data validation rules for metadata elements which are deemed important to the data publication workflow. These rules fall into two categories: Pre-Defined and Custom Expressions.
Pre-Defined Expressions are a series of validation rules for detecting common errors which can occur in free-text elements. These include formatting rules for email addresses, dates (ANSI/ISO), U.S. ZIP Codes and U.S. phone numbers. Additionally, there is a pre-defined rule to check whether an element is missing or empty.
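To give a feel for what format rules of this kind involve, here is a minimal Python sketch. The patterns and function names are my own assumptions for illustration; they are not the expressions that Data Reviewer uses internally, and real-world rules for email addresses and phone numbers are usually more permissive.

```python
import re

# Illustrative patterns only (assumptions, not Data Reviewer's actual rules).
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
DATE = re.compile(r"\d{4}(\d{2}(\d{2})?)?")          # YYYY, YYYYMM or YYYYMMDD
US_ZIP = re.compile(r"\d{5}(-\d{4})?")               # 12345 or 12345-6789
US_PHONE = re.compile(r"(\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}")


def is_missing_or_empty(value):
    """True when the element is absent (None) or contains only whitespace."""
    return value is None or value.strip() == ""


def matches(pattern, value):
    """True when a non-empty value matches the whole pattern."""
    return not is_missing_or_empty(value) and pattern.fullmatch(value.strip()) is not None
```

For example, `matches(US_ZIP, "92373-8100")` passes while `matches(US_ZIP, "9237")` does not, and `is_missing_or_empty("   ")` flags a whitespace-only value.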
Custom Expressions are validation rules implemented using XML Path Language (XPATH). XPATH is a query language which enables you to select, compare, and compute element values stored in XML (the data format used for ArcGIS item metadata). Using XPATH, users can define custom expressions to evaluate metadata element values either individually or against other metadata elements. Once created, these custom expressions can be saved for reuse or shared with others in your organization.
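As a rough illustration of selecting and comparing element values by path, the sketch below uses Python’s standard xml.etree.ElementTree module, which supports only a subset of XPATH. The sample document is invented, the element names are FGDC CSDGM short names as I understand them (verify them against the standard), and the comparisons are done in Python rather than in an XPATH expression as the Metadata check would do.

```python
import xml.etree.ElementTree as ET

# Invented sample metadata using FGDC CSDGM short names (assumed; verify
# against the standard before relying on these paths).
SAMPLE = """<metadata>
  <idinfo>
    <citation><citeinfo><pubdate>20140315</pubdate></citeinfo></citation>
  </idinfo>
  <metainfo>
    <metd>20140401</metd>
    <metc><cntinfo><cntperp><cntper>Esri Content Team</cntper></cntperp></cntinfo></metc>
  </metainfo>
</metadata>"""

root = ET.fromstring(SAMPLE)

# Evaluate one element on its own: does the contact name equal a fixed value?
contact = root.findtext("metainfo/metc/cntinfo/cntperp/cntper")
name_ok = contact == "Esri Content Team"

# Evaluate one element against another: publication date must not be later
# than the metadata date (YYYYMMDD strings sort chronologically).
pubdate = root.findtext("idinfo/citation/citeinfo/pubdate")
metd = root.findtext("metainfo/metd")
dates_ok = pubdate is not None and metd is not None and pubdate <= metd

print(name_ok, dates_ok)
```

The date comparison is a hypothetical rule chosen only to show one element being checked against another; it is not one of the rules configured later in this post.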
Configuring the Metadata check
In this example, I’ll be configuring the metadata check to validate a number of element values which were identified as important to the data publication process. These element rules include:
- Data Set Publication Date element exists and is formatted properly based on the metadata standard’s date format
- Metadata Contact Person Name element exists and equals “Esri Content Team”
- Contact Person’s Phone Number element exists and is formatted properly based on US telephone standards
- Contact Person’s Email Address element exists and is formatted properly based on internet standards
- Contact Person’s Postal Code element is formatted properly based on US ZIP code standards
In the screenshot below, I configured the Metadata check to validate a number of objects (Metadata Sources parameter) stored in my geodatabase workspace (Workspace parameter) based on the FGDC CSDGM metadata standard (Metadata Standard parameter). To configure element validation, I have checked the Validate Metadata Content check box.
Upon clicking Configure… the Metadata Content Validation dialog (screenshot below) appears and is used to author element value business rules. In this example, each of the above-referenced metadata rules has been configured using a combination of pre-defined and custom expressions.
Note: Each metadata element is identified by its “short” name which is derived from the metadata standard upon which the check has been configured (Metadata Standard parameter). The short element names may not match what you may be accustomed to seeing in your organization’s metadata documentation or what is displayed in the Metadata Editor dialog.
Evaluating check results
Configuring the element value validation option in the Metadata check is more time-consuming than configuring the schema-only validation option. However, the results are much easier to interpret since you have greater control over which elements are being evaluated.
In the highlighted examples shown in the above screenshot, a metadata error record’s REVIEWSTATUS attribute is used to document what form of element validation was used in detecting the error. The attribute also stores the metadata element’s value which was found to be in error. This information can serve as a guide for those who are tasked with correcting the error using the Metadata Editor tool.
Tips and Tricks
As you can see from the examples discussed here, having a good understanding of your metadata requirements prior to configuring the Metadata check is crucial. It may be helpful to first record the metadata business rules in a document or spreadsheet as shown below. This can greatly speed up configuration of the check and enable you to identify new validation requirements with stakeholders and other interested parties.
When configuring the Metadata check, it’ll be necessary to identify the metadata elements by the element’s “short” name. The best source for this information is the documentation for the metadata standard. For users of the FGDC’s CSDGM, the standard is available for free from the FGDC website and can be used to understand the standard’s naming conventions. Other metadata standards such as ISO-19115/19139 and the North American Profile are available for purchase from their respective standards authority.
Another way to identify element “short” names is to use a template metadata document to help identify elements requiring validation. As depicted in the screenshot below, the steps for this workflow include the following:
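If you prefer to keep such a rules worksheet in a form that can be opened in a spreadsheet, a small script can write it out as CSV. The sketch below records the five rules from this example; the column names, the short-name paths and the choice of the metadata contact (rather than the point of contact) are my assumptions for illustration.

```python
import csv
import io

# One row per business rule: element, short-name path, must exist?, rule.
# Paths are assumed FGDC CSDGM short names; confirm them against the standard.
RULES = [
    ("Data Set Publication Date", "idinfo/citation/citeinfo/pubdate",
     "yes", "date format of the metadata standard"),
    ("Metadata Contact Person Name", "metainfo/metc/cntinfo/cntperp/cntper",
     "yes", 'equals "Esri Content Team"'),
    ("Contact Person's Phone Number", "metainfo/metc/cntinfo/cntvoice",
     "yes", "US telephone format"),
    ("Contact Person's Email Address", "metainfo/metc/cntinfo/cntemail",
     "yes", "email address format"),
    ("Contact Person's Postal Code", "metainfo/metc/cntinfo/cntaddr/postal",
     "no", "US ZIP code format"),
]


def rules_to_csv(rules):
    """Return the rules as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["element", "short_name_path", "must_exist", "rule"])
    writer.writerows(rules)
    return buf.getvalue()


print(rules_to_csv(RULES))
```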
- Create a metadata template using the ArcGIS Metadata Editor to populate those elements which require validation. Entering searchable keywords into the elements will make it easier to find them later in the XML output.
- Export the template metadata to an XML file using the Esri Metadata Translator tool.
- Open the resulting XML document with a viewer such as Altova’s XMLSpy, Microsoft’s XML Notepad or Mozilla Firefox with the Firebug extension.
- Using the XML viewer you choose, search the XML document for the element values and note the XPATH name of the metadata element.
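If you would rather script the search than use an XML viewer, the last step can be sketched in Python as follows. This assumes the exported file is plain XML and that you typed a recognizable keyword (here the made-up marker FINDME) into each element of interest; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_paths(root, keyword):
    """Return the slash-separated tag path of every element whose text contains keyword."""
    hits = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        if elem.text and keyword in elem.text:
            hits.append(here)
        for child in elem:
            walk(child, here)

    walk(root, "")
    return hits


# Invented stand-in for an exported template; with a real file you would
# load it with ET.parse("<your exported file>").getroot() instead.
SAMPLE = """<metadata><idinfo><citation><citeinfo>
<title>FINDME_TITLE</title><pubdate>FINDME_PUBDATE</pubdate>
</citeinfo></citation></idinfo></metadata>"""

root = ET.fromstring(SAMPLE)
for path in find_paths(root, "FINDME"):
    print(path)
```

Each printed path ends in the element’s short name, which is the name the Metadata check expects.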
To summarize, Data Reviewer’s Metadata check enables you to automate the validation of item metadata created to document your spatial data holdings. Element content rules can be particularly helpful when there is a need for content consistency across your organization, especially in cases when metadata elements are based on free-text data types which often cannot be validated using a metadata standard’s schema. Finally, using the Metadata check you can validate multiple sources of metadata against organization-specific element value rules using either pre-defined or custom expressions.
Content contributed by Jay Cary