The Big Data team is excited to offer a new tutorial on spatial aggregation (sometimes called spatial binning). Spatial aggregation is extremely useful for summarizing big data into a meaningful snapshot of its patterns. Spatial aggregation works … Continue reading
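To make the idea of spatial binning concrete, here is a minimal sketch in plain Python. It is not the tutorial's implementation: it assumes square grid cells and points given as simple (x, y) pairs, and the names `bin_points` and `cell_size` are hypothetical.

```python
import math
from collections import Counter


def bin_points(points, cell_size):
    """Count points per square grid cell.

    points: iterable of (x, y) pairs.
    cell_size: width of a grid cell, in the same units as the coordinates.
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


# Illustrative points only; two of them fall into the same cell.
points = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (-0.5, 2.2)]
counts = bin_points(points, cell_size=1.0)
```

Each cell's count is the aggregated value; mapping those counts back onto the grid gives the kind of summary view that binning is used for.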
As mentioned in a prior blog post and presentation, there have been discussions about the development of a new, improved version of the 2007 Arc Marine Data Model (also known as the “Marine Data Model”). This was an action item … Continue reading
At the 2014 Esri User Conference, the Big Data team gave several presentations, including two technical workshops: ‘Big Data and Analytics: The Fundamentals’ and ‘Big Data and Analytics with ArcGIS’. We presented our open-source GIS Tools for Hadoop (shared on GitHub), as well as some research that we’re currently pursuing (exciting things to come!). We gave demos using both our open-source tools and the prototype tools currently being researched.
For the demos, we ran all of our analytics on a Hadoop cluster back in Redlands. The source data consisted of more than 170 million data points representing all the taxi cab trips in New York City in 2013. A twenty-node cluster may seem like a big investment (and it can be), but it doesn’t have to be. Enter the DREDD cluster… Continue reading
The Big Data development team at Esri is excited to announce a major performance speedup in ST_Geometry for Hive, which is part of Esri’s open-source Spatial Framework for Hadoop. The size of the gain depends on the type of spatial query and on the size of the Hive table. The biggest gain comes with relational operations such as ST_Contains and ST_Overlaps. In general, the gain is greater with larger tables, which is exactly where it helps the most.
We are pleased to announce that the ST_Geometry aggregate functions are now available for Hive, in the Spatial Framework for Hadoop. The aggregate functions can be used to perform a convex-hull, intersection, or union operation on geometries from multiple records of a dataset.
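As an illustration of what a convex-hull aggregate does across multiple records, here is a small plain-Python sketch. It is not the Hive functions themselves: it assumes each record is a hypothetical (group key, point) pair and uses the standard monotone-chain algorithm; the names `convex_hull` and `aggregate_convex_hull` are invented for this example.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def aggregate_convex_hull(records):
    """Group (key, point) records by key and compute one hull per group."""
    groups = defaultdict(list)
    for key, point in records:
        groups[key].append(point)
    return {key: convex_hull(pts) for key, pts in groups.items()}


# Illustrative records only: group "a" has an interior point at (1, 1).
records = [
    ("a", (0, 0)), ("a", (2, 0)), ("a", (2, 2)), ("a", (0, 2)), ("a", (1, 1)),
    ("b", (5, 5)), ("b", (6, 5)),
]
hulls = aggregate_convex_hull(records)
```

The pattern is the same as any grouped aggregate: many input geometries per key go in, and one geometry per key comes out.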
An interesting task in highway management is to study the potential impact of carpooling, based on an analysis of automatically collected automobile GPS position data. To identify potential enhancements to carpool participation, we set out to study places that have the highest numbers of trips with similar origin and destination locations. The source data for this experiment consists of nearly 40 million vehicle position records assembled from a single day of GPS-collected vehicle positions. The position data consists of longitude and latitude along with date, time, and speed. We used the Hadoop MapReduce framework for distributed parallel computation because it can handle larger data sets. Continue reading
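One way to picture "trips with similar origin and destination locations" is to snap both ends of each trip to a grid cell and count identical cell pairs, in a map-then-reduce style. The sketch below is only an assumption about how such a count could look, not the method used in the experiment: it presumes trips have already been derived from the position records as (origin, destination) longitude/latitude pairs, and the names `to_cell`, `map_trip`, `count_od_pairs`, and the `cell_deg` cell size are hypothetical.

```python
import math
from collections import Counter


def to_cell(lon, lat, cell_deg):
    """Snap a longitude/latitude pair to an integer grid-cell index."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def map_trip(trip, cell_deg):
    """Map step: turn one trip into an (origin cell, destination cell) key."""
    (o_lon, o_lat), (d_lon, d_lat) = trip
    return (to_cell(o_lon, o_lat, cell_deg), to_cell(d_lon, d_lat, cell_deg))


def count_od_pairs(trips, cell_deg=0.01):
    """Reduce step: count how many trips share each origin/destination key."""
    return Counter(map_trip(trip, cell_deg) for trip in trips)


# Illustrative trips only: the first two start and end in the same cells.
trips = [
    ((-117.195, 34.055), (-117.905, 33.805)),
    ((-117.196, 34.056), (-117.904, 33.806)),
    ((-117.500, 34.100), (-117.905, 33.805)),
]
top = count_od_pairs(trips).most_common(1)
```

Because each trip is keyed independently and the counts are simple sums, the same logic splits naturally across the mappers and reducers of a distributed job.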
The 2013 Esri International User Conference starts next week down at the San Diego Convention Center. There will be 15,000 attendees eager to learn and to see our path towards the future of GIS. Hopefully you’re one of them!
The Geodatabase Team will be down there giving technical sessions and demo theater presentations. We’ll be available to answer your questions and discuss what projects you’re working on too, so come visit us in the convention’s showcase area.
The project contains an open source framework and API that enables big data developers to author custom spatial applications for Hadoop. Continue reading
The Geodatabase Team is rolling down to Palm Springs next week for the 2013 Developer Summit. Swing by our section of the showcase area to meet the team’s developers, ask questions, and talk about your projects. Esri Showcase hours: Monday … Continue reading
By Damien Demaj, Cartographer
The statistical component of sport has always provided a fascinating way to analyze performance and success. This might simply be the final score, but for some sports, such as football, baseball, cricket, golf and tennis, meaningful … Continue reading