GIS Tools for Hadoop

On Sunday, David Kaiser and his Big Data crew released the GIS Tools for Hadoop Project on GitHub.

The project contains an open source framework and API that enables big data developers to author custom spatial applications for Hadoop.

The GIS Tools project also enables the ArcGIS platform to leverage big data on Hadoop using tools that combine custom Hadoop applications with the ArcGIS Geoprocessing environment.

The project supports processing of simple vector data (points, lines, and polygons) and basic analysis operations on that data, such as relationship analysis, all running in a Hadoop distributed processing environment.
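To give a feel for what "relationship analysis" on vector data looks like when split into map and reduce steps, here is a minimal plain-Python sketch. It does not use the project's API: the function names (`point_in_polygon`, `map_point`, `reduce_counts`) and the sample polygons and points are all hypothetical, and the containment test is an ordinary ray-casting check, assumed here only for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def map_point(point, polygons):
    """Map step (hypothetical): emit (polygon_name, 1) for the first containing polygon."""
    x, y = point
    for name, ring in polygons.items():
        if point_in_polygon(x, y, ring):
            yield name, 1
            return


def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the emitted counts per polygon name."""
    totals = Counter()
    for name, count in pairs:
        totals[name] += count
    return dict(totals)


# Made-up sample data: two adjacent squares and four points.
polygons = {
    "west": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "east": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
points = [(1, 1), (2, 4), (7, 3), (12, 1)]

pairs = [pair for p in points for pair in map_point(p, polygons)]
counts = reduce_counts(pairs)
print(counts)
```

In a real Hadoop job the map step would run in parallel over splits of the point data and the framework would group the emitted pairs by key before the reduce step; the sketch simply runs both steps in one process.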

An overview page, including sample tools, can be found here: http://esri.github.com/gis-tools-for-hadoop

Upcoming Presentations
David and Michael Park are also presenting the project, its design and implementation, plus a demo, during a DevSummit talk on Thursday at 10am in Catalina/Madera.
If you’re down in Palm Springs this week, go check it out:
Big Data: Using ArcGIS with Apache Hadoop


4 Comments

  1. dkaiser says:

    This project really is the work of the “Big Data crew”, and I have to call out a whole list of awesome people here:
    Geometry Devs: Sergey, Aaron and Paul
    Geoprocessing Devs: Monica and Alex
    Geodatabase Devs: Mike and Randall
    and many other people… Mansour, Andrew and other remote devs, and a number of other interested people, team leads and product managers.
    I’m just one guy who was part of a conversation that involved this great group of people, and who saw that it was released as we had planned. Thanks.

  2. dhollema says:

    2 questions. Is there any streamlined means to leverage these tools in Amazon EMR?
    The tools currently operate on point, line, and polygon. Are there plans to reach into the raster processing world?

    • schalker says:

      I don’t really know the answers to these questions, but I suspect that raster processing isn’t that easy to do in a Hadoop (that is, MapReduce) environment. MapReduce was originally developed for massively parallel text processing. But what do I know…

    • dkaiser says:

      @dhollema: Yes, an AMI for Amazon EMR is in the works. Watch here or follow the GitHub repository pages to see when it is ready.

      re: vector vs. raster: Our initial plans were to let customers use our Hadoop framework to process massive amounts of text data (you can see this in our demos, where we extract geometries from tweets, web server logs, etc.).

      Having said that, we are definitely interested in raster processing. There is work being done in academia and within the greater big data software industry where certain raster computations are already being processed on Hadoop, and we are looking at where our next steps will lead in this space.