Researchers today need to deal with an avalanche of data—from environmental sensor networks (both on land and at sea), social media feeds, LiDAR, and outputs from global- and regional-scale atmospheric circulation and general climate models and simulations. Because of this, “big data” is emerging as a major research theme for the academic community.
I recently had the opportunity to attend GIScience 2012, which is convened every two years and brings together leading researchers from around the world to reflect on a wide spectrum of geographic information science research areas. Attendees are normally university academics and graduate students working in the areas of geography, computer science, information science, cognitive science, mathematics, philosophy, psychology, social science, environmental sciences, and spatial statistics.
One of my goals as Esri’s Chief Scientist is to try to understand and articulate what big data means for the academic community. GIScience 2012 proved to be the perfect venue for this.
Academics tend to think in terms of questions. Among the many questions they seem very interested in at the moment are:
- What new fundamental problems does big data pose to GIScience?
- How can we best foster and synergize research on big data across the pertinent research communities?
- What are the significant kinds of big data from a spatial perspective? Why do they matter?
- What are the challenging issues of modeling uncertainty in big data?
- How can we best prepare students for the use, development, and analysis of big data?
In response to these and other questions, a number of research areas are being pursued, including:
- Most GIScience algorithms need to be rewritten to fit the new infrastructure of big data.
- Various communities—such as the machine learning and complex process modeling communities—need to talk to each other and work together.
- We have succeeded in getting metadata to talk to data. We now need models talking to data, models talking to other models, and models talking to us via effective workflows.
Among the “V” tenets of big data, Variety may be the most interesting for the academic community: data now comes from more sources and types (pictures, video, audio, text, scientific observations, scientific models), from more perspectives (governments, military, NGOs, etc.), and from different cultures of data contribution (e.g., government vs. citizen scientist). Considering all of these together yields a more holistic view.
This community is also very keen on semantics, with the aim of semantically annotating big data so that it is understandable to both humans and machines and easier to publish, retrieve, and explore. Many challenges remain in improving the exploration of the data. For example, how can we come up with new paradigms to browse and navigate through data without pre-sized queries?
Big Data and Esri
The IT industry has launched a whole series of technologies to manage and process the deluge of big data, using techniques such as parallel processing and in-stream processing. In my brief remarks at GIScience 2012 as a member of a plenary panel on big data, I shared that Esri is still working through and researching various issues, approaches, and technologies related to big data, and developing prototypes.
We are researching the importance of Volume, Velocity, and Variety, along with two additional “V’s,” Veracity and Values (which resonate well with this community), and how these “V’s” will require significant departures from traditional SQL databases and processing workflows. We are also looking at the importance of developing or leveraging multi-node architectures, stream-processing engines for real-time analysis, and spatially enabled NoSQL databases, document stores, and search engines.
In terms of technologies, at Esri we are researching and experimenting with stream processing via Storm, MapReduce frameworks via Hadoop, NoSQL databases via MongoDB, and data-processing scripts written in Hive or Pig.
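For readers unfamiliar with the MapReduce pattern mentioned above, here is a minimal, single-machine sketch in plain Python of how point observations might be counted per grid cell. It is only an illustration of the general map, shuffle, and reduce steps, not Esri's implementation and not Hadoop's API; the function names, the `cell_size` parameter, and the sample coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def map_point(record, cell_size=1.0):
    """Map step: turn one (lon, lat) observation into a (grid cell, 1) pair."""
    lon, lat = record
    cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
    return cell, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (the grid cell)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(cell, values):
    """Reduce step: collapse the values for one cell into a single count."""
    return cell, sum(values)


def count_points_per_cell(records, cell_size=1.0):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (map_point(r, cell_size) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(cell, vals) for cell, vals in grouped.items())


# Hypothetical sample observations as (longitude, latitude) pairs.
sample_points = [(-117.2, 34.05), (-117.8, 34.9), (-116.5, 34.1), (2.3, 48.8)]
cell_counts = count_points_per_cell(sample_points, cell_size=1.0)
```

In a real multi-node framework, the map and reduce functions would run in parallel on different machines and the framework would handle the shuffle; the logic of each step stays the same, which is why the pattern scales to large point datasets.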
Esri is working with several customers interested in leveraging GIS with big data. We are currently investigating how our platform can integrate with big data technologies and provide analytic and visualization solutions. We are also working with several of the leading big data vendors, such as IBM, and actively building prototypes that help us understand how best to support big data.