An Introduction to Big Data

Two years ago the Big Data team released GIS Tools for Hadoop on GitHub. GIS Tools for Hadoop is an open source project that lets users integrate Hadoop (a distributed big data platform) with big spatial data, complete distributed spatial analysis, and move data between the Hadoop Distributed File System (HDFS) and ArcGIS Desktop.

Until now, it has been difficult for many GIS users to take full advantage of these tools, or even just try them out (and see what all this big data talk is about). We know that not everyone has a cluster sitting around (although they are cheaper than you’d think) so we have put together a tutorial for beginners – no cluster or development experience needed!

This tutorial takes you through downloading and starting up a virtual machine (a self-contained, portable Hadoop environment) and accessing GIS Tools for Hadoop through GitHub. It then points you towards tutorials and samples that teach you how to complete analyses on your big spatial data.

Check out the tutorial on GitHub, and let us know on our GeoNet page if you have any questions or if there are other tutorials you want to see.

(Post submitted by Sarah Ambrose, Big Data Team)

This entry was posted in Geodata.

5 Comments

  1. knagornyuk says:

    Hello! Are you going to add a more easy-to-use guide for the Hortonworks Sandbox, as many other partners have done (http://hortonworks.com/tutorials/#tuts-partners)? For example, Microsoft made a really easy-to-use tutorial. The result is a map in Microsoft Excel.

  2. knagornyuk says:

    Hello Sarah,
    Yes, the GIS-Tools-for-Hadoop-for-Beginners guide is easy to use. I used it! But what is the next step? The next step (the sample) is based on the Hive command line, which can be difficult for beginners. Meanwhile, the Sandbox provides an interactive interface to Hive and more. I would be grateful if you could prepare step-by-step tutorials for the Sandbox web UI.

    • sambrose88_1 says:

      Hi @knagornyuk,
      I understand what you mean about the command line being difficult for beginners. We will look into making a tutorial using the sandbox in the future. Thanks for the suggestion!

      -Sarah

      • sambrose88_1 says:

        I meant to mention: right after completing the steps for the beginner intro, you can complete the sample steps [https://github.com/Esri/gis-tools-for-hadoop/tree/master/samples/point-in-polygon-aggregation-hive]. If you are using the Hortonworks Sandbox, you will just type in the commands. If you are using Cygwin (which I would recommend), you can just copy and paste the commands in.

        - Sarah