Tag: Hadoop

An Introduction to Big Data


Two years ago the Big Data team released GIS Tools for Hadoop on GitHub. GIS Tools for Hadoop is an open source project that allows users to integrate Hadoop (a distributed big data platform) with big spatial data, complete distributed …

Posted in Geodata

New Spatial Aggregation Tutorial for GIS Tools for Hadoop


The Big Data team is excited to offer a new tutorial on spatial aggregation (sometimes called spatial binning). Spatial aggregation is extremely useful for summarizing big data into a meaningful snapshot of the patterns in your data. Spatial aggregation works …
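As a rough, single-machine illustration of the idea (not the tutorial's own Hadoop code), the Python sketch below bins planar points into a square grid and counts the points that fall in each cell. The function name `bin_points` and the square-grid scheme are our own assumptions for illustration; real data would typically use projected coordinates, and other bin shapes are possible.

```python
import math
from collections import Counter


def bin_points(points, cell_size):
    """Count points per square grid cell of side cell_size (same units as the coordinates).

    Hypothetical helper for illustration only; returns a Counter keyed by (col, row).
    """
    counts = Counter()
    for x, y in points:
        # floor() maps each coordinate to the index of the cell that contains it,
        # so negative coordinates land in negative cell indices.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    pts = [(0.2, 0.3), (0.7, 0.9), (1.5, 0.1), (1.2, 0.4), (1.9, 0.8), (-0.5, 2.2)]
    for cell, n in sorted(bin_points(pts, 1.0).items()):
        print(cell, n)
```

Each cell's count is the aggregate that would then be mapped, so dense areas stand out without drawing every individual point.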

Posted in Geodata

Setting up a small budget Hadoop Cluster for Big Data Analysis

At the 2014 Esri User Conference, the Big Data team gave several presentations, including two technical workshops: ‘Big Data and Analytics: The Fundamentals’ and ‘Big Data and Analytics with ArcGIS’. We presented our open source GIS Tools for Hadoop (shared on GitHub), as well as some research that we’re currently pursuing (exciting things to come!). Our demos used both our open source tools and the prototype tools currently being researched.

For the demos, the source data consisted of more than 170 million data points representing all the taxi cab trips in New York City in 2013, and we ran all of our analytics on a Hadoop cluster back in Redlands. A twenty-node cluster may seem like a big investment (and it can be), but it doesn’t have to be. Enter the DREDD cluster…

Posted in Geodata

Big Data ST_Geometry Queries up to 20X Faster in Hive

The Big Data development team at Esri is excited to announce a major performance speedup in ST_Geometry for Hive, which is part of Esri’s open-source Spatial Framework for Hadoop. The size of the speedup depends on the type of spatial query and on the size of the table in Hive. The biggest gains come with relational operations such as ST_Contains and ST_Overlaps. In general, the gain grows with table size, which is exactly where it helps the most.


Posted in Geodata

ST_Geometry Aggregate Functions for Hive in Spatial Framework for Hadoop

We are pleased to announce that the ST_Geometry aggregate functions are now available for Hive in the Spatial Framework for Hadoop. The aggregate functions compute the convex hull, intersection, or union of geometries drawn from multiple records of a dataset.
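To make the convex-hull case concrete, here is a small Python sketch of what such an aggregate computes: records are grouped by a key (much like a GROUP BY in Hive), and the points in each group are reduced to a single convex hull using Andrew's monotone chain algorithm. This is an illustration under our own assumptions, not the Spatial Framework for Hadoop implementation; the function names are hypothetical, and geometries are simplified to plain (x, y) points.

```python
from collections import defaultdict


def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means o -> a -> b turns counter-clockwise
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order, no repeated endpoint."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last vertex while it does not make a left turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it
    return lower[:-1] + upper[:-1]


def aggregate_convex_hull(records):
    """Group (key, point) records and compute one hull per key (hypothetical aggregate)."""
    groups = defaultdict(list)
    for key, point in records:
        groups[key].append(point)
    return {key: convex_hull(pts) for key, pts in groups.items()}


if __name__ == "__main__":
    rows = [("a", (0, 0)), ("a", (2, 0)), ("a", (1, 1)), ("a", (2, 2)),
            ("a", (0, 2)), ("b", (5, 5)), ("b", (6, 5))]
    print(aggregate_convex_hull(rows))
```

Interior points such as (1, 1) disappear from the result, which is the defining property of a convex-hull aggregate: many input rows collapse into one output geometry per group.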


Posted in Geodata

Vehicle Trip Discovery with GIS Tools for Hadoop

An interesting task in highway management is studying the potential impact of driver carpooling, based on an analysis of automatically collected automobile GPS position data. To identify potential enhancements to carpool participation, we set out to find the places with the highest numbers of trips that have similar origin and destination locations. The source data for this experiment consists of nearly 40 million vehicle position records assembled from a single day of GPS-collected vehicle positions. Each position record consists of longitude and latitude along with date, time, and speed. We used the Hadoop MapReduce framework for distributed parallel computation because of its ability to analyze large data sets.
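As a hedged, single-machine sketch of the map/reduce shape such an analysis can take, the Python code below assumes trips have already been extracted from the GPS positions as (origin, destination) longitude/latitude pairs, treats two locations as "similar" when they fall in the same square grid cell of an arbitrary size, and counts trips per origin-destination cell pair. The function names, the grid-based similarity rule, the cell size, and the sample coordinates are all illustrative assumptions, not the method or data used in the study.

```python
import math
from collections import defaultdict
from itertools import chain


def cell_of(lon, lat, cell_deg):
    """Snap a lon/lat position to a square grid cell (assumed rule for 'similar' locations)."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def map_trip(trip, cell_deg):
    """Map step: emit one ((origin_cell, dest_cell), 1) pair per trip."""
    (olon, olat), (dlon, dlat) = trip
    yield (cell_of(olon, olat, cell_deg), cell_of(dlon, dlat, cell_deg)), 1


def reduce_counts(pairs):
    """Shuffle + reduce step: sum the values emitted for each origin-destination key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def top_od_pairs(trips, cell_deg, n):
    """Return the n origin-destination cell pairs with the most trips."""
    mapped = chain.from_iterable(map_trip(t, cell_deg) for t in trips)
    totals = reduce_counts(mapped)
    # Highest count first; ties broken by key so the ordering is deterministic
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Hypothetical trips: ((origin_lon, origin_lat), (dest_lon, dest_lat))
    trips = [
        ((-117.1951, 34.0552), (-117.2856, 34.1063)),
        ((-117.1958, 34.0557), (-117.2851, 34.1068)),
        ((-117.1502, 34.0350), (-117.2856, 34.1063)),
    ]
    for (origin, dest), count in top_od_pairs(trips, cell_deg=0.01, n=5):
        print(origin, dest, count)
```

On a real cluster the map and reduce functions would run as separate Hadoop tasks and the framework would handle grouping by key; here that grouping is simulated by the dictionary in `reduce_counts`.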

Posted in Geodata

Esri presentations, panels, and open-source projects at FOSS4G-North America

This week a few members of the Esri team will be at FOSS4G North America 2013, held in Minneapolis, Minnesota. FOSS4G-NA is a yearly regional event of OSGeo (the Open Source Geospatial Foundation) highlighting community projects, open-source tools, and …

Posted in Developer

GIS Tools for Hadoop

On Sunday, David Kaiser and his Big Data crew released the GIS Tools for Hadoop Project on GitHub.

The project contains an open source framework and API that enables big data developers to author custom spatial applications for Hadoop.

Posted in Geodata