Two years ago the Big Data team released GIS Tools for Hadoop on GitHub. GIS Tools for Hadoop is an open source project that allows users to integrate Hadoop (a distributed big data platform) with big spatial data, complete distributed …
The Big Data team is excited to offer a new tutorial on spatial aggregation (sometimes called spatial binning). Spatial aggregation is extremely useful in summarizing big data to gain a meaningful snapshot of patterns in your data. Spatial aggregation works …
At the 2014 Esri User Conference, the Big Data team gave several presentations, including two technical workshops: ‘Big Data and Analytics: The Fundamentals’ and ‘Big Data and Analytics with ArcGIS’. We presented our open source GIS Tools for Hadoop (shared on GitHub), as well as some research that we’re currently pursuing (exciting things to come!). We gave demos using both our open source tools and the prototype tools currently being researched.
For the demos, whose source data consisted of more than 170 million data points representing all the taxi cab trips in New York City in 2013, we ran all of our analytics on a Hadoop cluster back in Redlands. A twenty-node cluster may seem like a big investment (and it can be), but it doesn’t have to be. Enter the DREDD cluster…
The Big Data development team at Esri is excited to announce a major performance speedup in ST_Geometry for Hive, which is part of Esri’s open-source Spatial Framework for Hadoop. The size of the performance gain depends on the type of spatial query run and on the size of the table in Hive. The biggest gain comes with relational operations such as ST_Contains and ST_Overlaps. In general, the performance gain will be greater with larger tables, which is exactly where it helps the most.
We are pleased to announce that the ST_Geometry aggregate functions are now available for Hive, in the Spatial Framework for Hadoop. The aggregate functions can be used to perform a convex-hull, intersection, or union operation on geometries from multiple records of a dataset.
An interesting task in highway management is to study the potential impact of driver carpooling, based on an analysis of automatically collected automobile GPS position data. To identify potential enhancements to carpool participation, we set out to study places that have the highest numbers of trips with similar origin and destination locations. The source data for this experiment consists of nearly 40 million vehicle position records assembled from a single day of GPS-collected vehicle positions. The position data consists of longitude and latitude along with date, time, and speed. We used the Hadoop MapReduce framework for distributed parallel computation because of its capacity for analyzing large data sets.
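To make the idea of "trips with similar origin and destination locations" concrete, here is a minimal single-machine sketch in Python. It is not the MapReduce job described in the post. It assumes that trips have already been inferred from the raw position records, and that "similar" means falling in the same grid cell. The `Trip` type, the `cell_of` helper, and the 0.01-degree cell size are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, NamedTuple, Tuple

Cell = Tuple[int, int]


class Trip(NamedTuple):
    """Hypothetical trip record: origin and destination in degrees."""
    origin_lon: float
    origin_lat: float
    dest_lon: float
    dest_lat: float


# Assumed grid resolution in degrees; not taken from the original post.
CELL_SIZE_DEG = 0.01


def cell_of(lon: float, lat: float, size: float = CELL_SIZE_DEG) -> Cell:
    """Snap a coordinate to the integer index of its grid cell."""
    return (math.floor(lon / size), math.floor(lat / size))


def trip_key(trip: Trip, size: float = CELL_SIZE_DEG) -> Tuple[Cell, Cell]:
    """'Map' step: reduce a trip to its (origin cell, destination cell) key."""
    return (
        cell_of(trip.origin_lon, trip.origin_lat, size),
        cell_of(trip.dest_lon, trip.dest_lat, size),
    )


def count_similar_trips(trips: Iterable[Trip],
                        size: float = CELL_SIZE_DEG) -> Counter:
    """'Reduce' step: count how many trips share each origin/destination key."""
    return Counter(trip_key(t, size) for t in trips)


if __name__ == "__main__":
    sample = [
        Trip(-117.182, 34.056, -117.375, 33.953),
        Trip(-117.185, 34.058, -117.372, 33.957),
        Trip(-117.305, 34.105, -117.375, 33.953),
    ]
    # List the busiest origin/destination cell pairs first.
    for key, n in count_similar_trips(sample).most_common():
        print(key, n)
```

In a distributed setting, `trip_key` would play the role of the mapper's output key and the counting would happen in the reducer. The grid size controls how loosely "similar" is interpreted.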
This week a few members of the Esri team will be at FOSS4G North America, held in Minneapolis, Minnesota. FOSS4G-NA is a yearly regional event of OSGeo (the Open Source Geospatial Foundation), highlighting community projects, open-source tools, and …