Are you sure Intersect is the right tool for the job?

Are you sure Intersect is the right tool for the job?

I often talk with people using ArcGIS Geoprocessing who find themselves surprised about the amount of time the Intersect tool takes to run, or confused about the output results.  Much of the time this confusion comes from a misperception about what tool to use for the analysis, or from a lack of understanding of what the Intersect tool does.

Here’s an example… A user contacted me about the Intersect tool because the tool would run for hours and then fail. They perceived their data as small and really thought Intersect would only take a few minutes.

My first question, and one you should ask too is, “What do you really want for a result?”

The answer in this, and many cases was “I want to know how much of each feature in input 1 is in each feature in input 2.”

Cool!  That means Intersect is not the right tool to use!

Intersect does not compare each feature in the first input to each feature in the second input.  It works in a much more complex, thorough way. The intersect tool is for creating a topologically correct fabric across the entire extent of the dataset and return those newly created features that were created due to overlap from a feature in each input.  The Intersect tool throws all the features from all the inputs into one bucket, and passes this to the underlying topology engine so it can create this topological fabric. This means all relationships, between all features, including those relationships within each input are determined and new features are created for every interaction. The Intersect tool then selects just those features from this topologically correct fabric.

So, for the Intersect tool if you have 2 inputs the output will only contain those features that were created where a feature from each of the 2 inputs overlapped.  If you have 100 inputs the output will only contain those features that were created where a feature from each of the 100 inputs overlapped.

Here is what looks like a very simple example of intersecting one polygon layer with 10 overlapping polygons with a single polygon from a second feature class (features are labeled with their OID):

Input 1 (10 input polys that overlap)

First input feature class

Input2 (One polygon)

Intersect output will contain new features created from the intersection of the overlapping polygons in Input1 and their intersection with the polygon in Input2:

As you can see, there are now a lot more polygons in the output (167!) than there were in the inputs combined.  Imagine if Input1 was actually complex:

Ouch!

Using the output of Intersect you now have to perform further analysis to put together polygons that were broken apart in Input1 due to their intersections with one another. Seems like a lot of work to figure out how much of the polygon with OID 9 in Input1 intersects with the polygon in Input2 right?

So what should they, and you, be doing if you want to know how much of each input feature intersects with each feature in the second input?  Here are two possible workflows, one good, and one much better, that will give you exactly what you want, and may get you the results of your analysis much faster.

Good… Clip and Transfer Attributes workflow

This is a simple approach that uses existing tools and can be setup by anyone familiar with scripting. The only ‘advanced’ part, is the use of a cursor, but it should be fairly easy to understand.

import arcpy, os, sys, time

arcpy.env.workspace = os.getcwd()

arcpy.MakeFeatureLayer_management(r"Data.gdb\input1","input1")
arcpy.MakeFeatureLayer_management(r"Data.gdb\input2","input2")

arcpy.env.overwriteOutput = True

arcpy.CreateFileGDB_management(os.getcwd(),"output.gdb")

try:
    #For each input2 feature in the poly feature class, select it, select the input1 features that intersect it, then clip it with the selected input2 feature
    input2Count = int(arcpy.GetCount_management('input2')[0])
    progressCounter = 1
    startTime = time.time()
    with arcpy.da.SearchCursor("input2", "ID") as cursor:
        for row in cursor:
            print "\n\ninput2 # " + str(progressCounter) + " of " + str(input2Count)
            print "Processing ID = " + str(row[0])
            arcpy.SelectLayerByAttribute_management('input2', "", "ID = " + str(row[0]))

            print "Selecting the input1 features..."
            arcpy.SelectLayerByLocation_management("input1", "INTERSECT", "input2", "", "")

            print "Clipping the selected input1 features with the selected input2 feature..."
            arcpy.Clip_analysis("input1", "input2", r"output.gdb/input1InID" + str(row[0]), "0.001")

            print "Add identification field..."
            arcpy.AddField_management(r"output.gdb/input1InID" + str(row[0]), "ID", "LONG")

            print "Assign input2 ID field value to the output..."
            arcpy.CalculateField_management(r"output.gdb/input1InID" + str(row[0]), "ID", str(row[0]), "PYTHON_9.3")

            progressCounter +=1

    print "Merge all the outputs..."
    arcpy.env.workspace = os.path.join(os.getcwd(), r"output.gdb")
    fcList = arcpy.ListFeatureClasses()
    arcpy.Merge_management(fcList, r"output.gdb/mergedFC")
    stopTime = time.time()
    print "Total time in seconds = " + str(int(stopTime-startTime)) + " and in minutes = " + str(int(stopTime-startTime)/60)
    print "DONE"

except Exception:
    print "FAILED"
    print arcpy.GetMessages()

You add code to transfer as many of the attributes from the second input as you want. If you want this method to go a little faster, you could play with sending the intermediate data to in_memory featureclasses.  Trouble is, you’ll have to manage how many outputs you send to in_memory if your data is very large.  You may use up all your memory storing the intermediate data and cause the workflow to fail.

There are other possible problems with this workflow. The main ones are SelectLayerByLocation ‘INTERSECT’ doesn’t use the exact same rules to select features as Intersect does (see here) and creating and managing all that intermediate data is painful and slow.

So, here’s a much better way to do this…

Better… Intersect geometry method

We can get rid of the creation and maintenance of all that intermediate data, use a method that mimics the Intersect tool, and make this all run tons faster on larger, more complex data.

Let’s use a geometry method instead!

Here is a simplified code snippet explaining the core basics of the workflow:

inCursor = arcpy.da.InsertCursor(r"output.gdb/IntersectResult", fldsOutput)
with arcpy.da.SearchCursor("input1", fldsInput1) as cursor:
    for row in cursor: # For each feature in input1
        # for each feature, select the features in input2 that it intersects
        arcpy.SelectLayerByLocation_management("input2", "INTERSECT", row[-1])
        with arcpy.da.SearchCursor("input2", fldsInput2) as cursor2:
            for row2 in cursor2: # for each selected feature from input2
                # intersect it with the feature from input1
                clippedFeature = row2[-1].intersect(row[-1], 4)
                # setup the insert cursor (for the input you previously created)
                flds2Insert[-1] = clippedFeature
                inCursor.insertRow(flds2Insert) # Insert the resulting geometry from using INTERSECT geometry method

For a fully functional (I hope!) script tool and script that you can try out, download the Pairwise Intersect Tool here.

Taking a look at the same example I used above except this time I use the PairWiseIntersect tool:

That’s more like it!

In testing we’ve done, the “Better” method is multitudes faster with large and/or complex data and it also gets through cases the Intersect Overlay tool appears to have trouble with (and for the most part these ‘trouble’ cases do not make sense to process using the Intersect tool).

Many thanks to Jason Pardy for reviewing the code and blog contents.

This entry was posted in Analysis & Geoprocessing, Python and tagged , , , , . Bookmark the permalink.

Leave a Reply

20 Comments

  1. harrybowman says:

    Why is the insert cursor created fesh for each feature?

  2. dan_patterson says:

    Good examples!

  3. spardo2 says:

    This is a great post and I’m exactly the person that could utilize these alternative scripts. I’ve been using intersect and then exporting the results to excel sheets and using pivot tables and other work arounds to account for the partitioning of all the overlapping features. It’s annoying and limits my ability to geoprocess further in ArcMap.

    However, I’m a scripting novice and have been thrown by the script that I downloaded from your link above to the Pairwise Intersect Tool.

    First, why is there a toolbox without anything in it? Am I missing something?

    Second, I was expecting to see sys.argv objects at the top of the script so that I could load it into a toolbox and choose the required arguments — Input Feature, Intersecting Feature, Output Featureclass and finally a parameter for Field mapping Input features to be appended to the Output Featureclass.

    A bit of guidance on where and/or how to plug in the Input and Intersecting Features would be very very much appreciated!

    One more thing, why is there no built-in ArcMap tool to for what is a popular and somewhat basic processing task?

    Thanks for whatever further assistance you can provide.

    • KenH says:

      Thanks for your comments spardo2.

      It was an oversight on my part to not have noted that the script tool is for 10.1 and higher releases. The intersect geometry method was introduced at 10.1 (also, the tool not showing up in the toolbox is probably because you are using a release prior to 10.1).
      Thanks for pointing that out. I’ve updated the tool download page stating the script tool is for 10.1 and higher.

      For running the script standalone, usage is stated at the top of the script. arcpy.GetParameterAsText() is used to get parameters from the command line rather than sys.argv().

      If the workflow in the script tool proves useful to many of you, we will consider it for inclusion in the product as a system tool in a future release. For those reading this comment, you can leave a positive comment here if you feel strongly about including it in a future release or you can add an item to our Ideas site and vote for it there.

      Thanks again!
      Ken

  4. tui says:

    Perfect timing for our parcel-based analysis (around 200,000.) We changed the merge to append in the “Good” method, but couldn’t find a way to clear the intermediate data within the search cursor. We ended up implementing the “Better” method entirely in SQL using STGeometry, which gave us another jump in performance.

  5. renangrace says:

    Thanks for posting this, this is a great tool. However I’m having trouble getting it to work. in line 124 (clippedFeature = row2[-1].intersect(row[-1], dimension) it is going to arcobjects intersect and returning the following error: “ValueError: “. I haven’t changed anything in the script and my two polygon layers are both in the same coordinate system as each other and the data frame (NZTM). Any ideas why this error is occurring?

    Thanks.

    • renangrace says:

      that error should say ValueError: geoprocessing describe geometry object object at 0xB1A4D120.

      • KenH says:

        Can you double check that the spatial reference is exactly the same between the two inputs? You can use the FeatureCompare tool with compare_type = SPATIAL_REFERENCE_ONLY.
        So far the only way I’ve been able to see this error is by using inputs with different Spatial References.
        Thanks,
        Ken

  6. charlesfox123 says:

    Hi Ken,

    Would there be a 10.0 version to this tool? If not, could you provide the input parameters and daty types, etc. My E-mail:
    charlesbanks@polk-county.net

    • KenH says:

      Hi Charles,

      Please see my earlier reply to this question:

      “The intersect geometry method was introduced at 10.1 (also, the tool not showing up in the toolbox is probably because you are using a release prior to 10.1).
      Thanks for pointing that out. I’ve updated the tool download page stating the script tool is for 10.1 and higher.”

      Ken

  7. kvolleberg says:

    Hi there,
    Interesting blog. I found it while browsing with the same question. I have a large dataset (actually, I use a ‘small’ subset of 60000 near-rectangular polygons), and an equally large polygon dataset (spatial extent) with a more complex (but singlepart) geometry.

    Triggered by your conclusion that the Better method wou;d be multitudes faster than the Intersect tool, I tried it myself.

    My conclusions contradict yours completely.
    The Intersect tool only needs 20secs to intersect my 60000 rectangles with my ‘complex’ geometry, where the Better method runs a whole lot longer (actually it runs while I am writing this). The first 500 are finished now: took 20secs only to get those 500.
    Output is correct, but it takes 120 times longer to do it with the ‘Better’ method.

    Or am I missing something? Here is my code:

    inCursor = arcpy.da.InsertCursor(ruwfm, ["Shape@","RUWHEIDSCODE","CELL_NUMBER"])

    cnt = 0

    with arcpy.da.SearchCursor(fm,["Shape@","CELL_NUMBER"]) as firstCursor:
    for firstRow in firstCursor: # For each feature in input1
    #counter and progress printing
    cnt+=1
    if cnt%500 == 0:
    print (dt.datetime.now() – t1)
    #print(“{0} features processed”.format(str(cnt)))

    inputFeat = firstRow[0]
    # for each feature, select the features in input2 that it intersects
    arcpy.SelectLayerByLocation_management(rvl, “INTERSECT”, inputFeat)
    with arcpy.da.SearchCursor(rvl, ["Shape@","RUWHEIDSCODE"]) as intersectCursor:
    for intersectRow in intersectCursor: # for each selected feature from input2
    # intersect it with the feature from input1
    intersectFeat = intersectRow[0]

    # dimension=4 ==> polygon geometry (where 1 = point, 2 = polyline)
    clippedFeature = intersectFeat.intersect(inputFeat, dimension=4)

    # setup the insert cursor (for the input you previously created)
    inCursor.insertRow([clippedFeature,intersectRow[1],firstRow[1]]) # Insert the resulting geometry from using INTERSECT geometry method

    • kvolleberg says:

      Sorry for the crappy formatted code.

    • KenH says:

      It may be that your case isn’t that complex or large as far as the intersection between feature class geometry goes. Can you share the data with me so I can take a look to confirm?

      The cases that seem to be aided most by the geometry method match the example I provided where the Intersect tool creates a massive number more features in the output than were in the two inputs. Is that the case when you run the Intersect against your data?

      Thanks, Ken

      • KenH says:

        Is that the case when you run the Intersect against your data?

        should read

        Is that the case when you run the Intersect Analysis tool against your data?

        • kvolleberg says:

          Hi,
          You can download the data from here (12.5MB): https://dl.dropboxusercontent.com/u/8713747/test.gdb.zip. I named the feature classes input1 and input2, corresponding to your example.

          The intersect tool does not make a massive amount of new polygons (90000, where we started with 60000 input1-polygons).

          • KenH says:

            Thanks.
            Looking at your inputs and how they intersect, this is a fairly simple Intersect operation. Like I mentioned earlier, moving to the geometry method for performance is most often for cases where the output feature class would have multitudes more features in the output than the inputs combined.
            Generally how I look at deciding what to use is, if the Intersect tool completes in what you feel is a reasonable amount of time and produces an output you want… keep using it. If the Intersect tool takes many hours or days attempting your Intersect operation (and maybe fails due to machine resource issues), and when finished creates many multitudes more features in the output than the inputs combined (for example, tens or hundreds of thousands of features in the inputs…. tens to hundreds of millions of features in the output), you may want to investigate using the Intersect method in this blog providing the output it creates is suitable for your analysis.
            Thanks. Ken

  8. markxevans says:

    Where can I find documentation on the intertsect geom method (as in (clippedFeature = row2[-1].intersect(row[-1], dimension) )
    I’ve search for this and can’t find it. Just trying to find out what the requirements are for input and what is returned

    Thx

  9. epolaina says:

    Hi! I am trying just now trying to use this tool (PairWiseIntersect) for intersecting one shapelayer containing 5,300 multi-polygons (with the additional difficulty that many of them overlay) and another with around 14,000 polygons (not overlaying). It’s still running but… I’d like to know your opinion about the probabilities of success (if there’s any) I have.

    Thank you very much!

    Ester