New in Geostatistical Analyst 10.1: Empirical Bayesian Kriging

Those of you familiar with kriging interpolation know that it is not always the easiest technique to implement successfully. For a long time we’ve wanted to make a geoprocessing tool that can automate kriging, but the difficulty has always been calculating good default parameters. At 10.1, through a combination of subsetting and simulations, we have a solution to the problem: a method called empirical Bayesian kriging (EBK). The method is available in the Geostatistical Wizard and as a geoprocessing tool in the Geostatistical Analyst toolbox.

EBK works by building local models on subsets of the data, which are then combined to create the final surface. Because the interpolation model is built automatically, the method requires very few parameters. A few optional parameters give you control over how locally the models are built and how they are combined.
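To make the idea of "local models on subsets, then combined" concrete, here is a toy sketch in plain Python. It is not the EBK algorithm itself, and it rests on several simplifying assumptions: the semivariogram is a fixed exponential model rather than one estimated and simulated from the data, subsets are formed by simply sorting on x, and the local predictions are blended by inverse squared distance to each subset's centroid. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def semivariogram(h, sill=1.0, rng=3.0):
    # Exponential model with assumed (not estimated) sill and range, no nugget.
    return 0.0 if h == 0 else sill * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    # Gaussian elimination with partial pivoting for the small kriging system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ordinary_kriging(points, target):
    # points: list of (x, y, z); target: (x, y). Returns the predicted z.
    n = len(points)
    a = [[semivariogram(math.dist(p[:2], q[:2])) for q in points] + [1.0]
         for p in points]
    a.append([1.0] * n + [0.0])  # constraint row: weights sum to 1
    b = [semivariogram(math.dist(p[:2], target)) for p in points] + [1.0]
    w = solve(a, b)
    return sum(w[i] * points[i][2] for i in range(n))

def make_subsets(points, size, overlap):
    # Crude subsetting: sort by x, then cut chunks that share `overlap` points.
    pts = sorted(points)
    step = size - overlap
    return [pts[i:i + size] for i in range(0, max(len(pts) - overlap, 1), step)]

def local_blend_predict(points, target, size=6, overlap=2):
    # Fit one local model per subset, then blend the local predictions.
    num = den = 0.0
    for subset in make_subsets(points, size, overlap):
        cx = sum(p[0] for p in subset) / len(subset)
        cy = sum(p[1] for p in subset) / len(subset)
        weight = 1.0 / (math.dist((cx, cy), target) ** 2 + 1e-9)
        num += weight * ordinary_kriging(subset, target)
        den += weight
    return num / den

if __name__ == "__main__":
    sample = [(float(i), float((i * 7) % 5), math.sin(i) + 0.5 * ((i * 7) % 5))
              for i in range(12)]
    print(local_blend_predict(sample, (2.5, 1.0)))
```

In this sketch, shrinking `size` makes each model more local, and raising `overlap` lets neighboring models share more data, which loosely mirrors the kind of control the optional parameters provide.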

Why should I use EBK?

  • Simplicity – To get accurate results, all you need to do is specify the field you want to interpolate.  Other kriging methods require you to build the model step-by-step to be confident that the results are statistically accurate.
  • Automation – Because EBK is available as a geoprocessing tool, you can use it in ModelBuilder and in Python scripts.
  • Capture small-scale effects – Using local models allows EBK to capture small-scale effects that global kriging models may miss.
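As a rough illustration of the automation point, the sketch below shows how a script might call the tool through arcpy. The dataset paths, field name, and parameter values are hypothetical, and the helper functions `build_ebk_arguments` and `run_ebk` are invented for this example. The keyword names follow the tool's documented syntax as best we understand it, so check the tool reference for your release before relying on them. Running the tool requires ArcGIS with a Geostatistical Analyst license; the argument builder on its own runs anywhere.

```python
def build_ebk_arguments(in_features, z_field, out_ga_layer, out_raster,
                        max_local_points=100, overlap_factor=1.0,
                        number_semivariograms=100):
    # Collect the tool arguments in one place so they can be inspected or logged.
    return {
        "in_features": in_features,          # input point features
        "z_field": z_field,                  # the field to interpolate
        "out_ga_layer": out_ga_layer,        # output geostatistical layer
        "out_raster": out_raster,            # output raster surface
        "max_local_points": max_local_points,        # points per local model
        "overlap_factor": overlap_factor,            # overlap between subsets
        "number_semivariograms": number_semivariograms,  # simulations per subset
    }

def run_ebk(**kwargs):
    # Imported here so the rest of the script loads without ArcGIS installed.
    import arcpy
    arcpy.CheckOutExtension("GeoStat")
    try:
        return arcpy.EmpiricalBayesianKriging_ga(**build_ebk_arguments(**kwargs))
    finally:
        arcpy.CheckInExtension("GeoStat")

if __name__ == "__main__":
    # Hypothetical inputs; replace with your own data.
    print(build_ebk_arguments("ozone_pts.shp", "ozone", "ebk_layer", "ebk_out.tif"))
```

Only the input features and the field are essential; the remaining values are the optional settings that control how locally the models are built.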

This post was contributed by Eric Krause, a product engineer on the analysis and geoprocessing team.


One Comment

  1. ztvavra_umn says:

    The other day a student asked me two questions about EBK which I could not fully answer and I was hoping you would be able to help.

    1. After a semivariogram is estimated from the data in the subset, the second stage in semivariogram estimation states “Using this semivariogram as a model, new data is unconditionally simulated at each of the input locations in the subset.” What is this “new data”? Where does the “new data” that is being used come from?

    2. I still do not fully understand how the EBK prediction is calculated, specifically how the semivariogram distributions influence the prediction. Do the highlighted neighbors within a neighborhood also influence the prediction? I’m guessing these two variables (neighbors and semivariograms) work together, but it’s not fully clear to me how.

    I’ve read Esri’s description of EBK semivariogram estimation multiple times, and I can conceptualize this part: “For each prediction location, the prediction is calculated using a new semivariogram distribution that is generated by a likelihood-based sampling of individual semivariograms from the semivariogram spectrums in the point’s neighborhood. For example, if a prediction location has neighbors in three different subsets (as specified by the searching neighborhood), the prediction will be calculated using some simulated semivariograms from each of the three subsets; these semivariograms are chosen probabilistically based on their likelihood values.”

    My interpretation of this statement is that the active neighbors (highlighted in black) are used to identify the subsets (and the semivariograms within each subset) from which a new semivariogram distribution is calculated, and that this semivariogram distribution is then used to calculate the predicted value. Is this correct? Do the values of the active neighbors have any influence on the prediction?