What’s the Big Deal about Big Data?

The technology tides have shifted again and, as the notion of cloud computing is becoming mainstream across most industries, a new buzzword is emerging: Big Data.  Never heard of it? Simply put, the term refers to the ever-growing mountain of data, generated from myriad sources, that organizations must effectively address.

Big Data Caricature

Courtesy: Keith Mann, Esri

For instance, according to a recent MeriTalk survey, 96 percent of federal IT professionals expect their agency’s stored data to grow over the next two years, by an average of 64 percent.

Big Data is often described using the three “V”s: Velocity, Volume, and Variety. For example, let’s take a few of the real-world case studies gathered by IBM and provided by Mike Rhodin, Senior Vice President at IBM Software Solutions:

    Utility companies record 350 billion meter readings per year (= Volume); the financial service industry clocks 5,000,000 trade events per second (= Velocity); and, as we know, the types of data formats that can be generated easily range from structured traditional file formats to unstructured video, audio, imagery, email, web logs, and pretty much anything you can think of (= Variety).

As you might expect, the typical C-level individual is not overly concerned with the definition of Big Data. According to Gartner, what really interests her is finding ways to take advantage of Big Data in order to drive better business outcomes. So how does one extract value, at scale, from data that is terabytes large (or larger), on both a geographical and a practical level? How do you store all this data? And how do you achieve results that are meaningful to your organization and your customers? These are a few of the challenges of Big Data.

The good news is that a growing number of technologies allow individuals to store and analyze data across the three V’s of Big Data. For example, MapReduce is the original set of distributed computing ideas now embodied in Apache Hadoop. Other Big Data-related technologies include Apache Hive and NoSQL databases such as Apache Cassandra and MongoDB, just to name a few. Also emerging are applications and methodologies for performing data mining and analysis on Big Data through pleasing dashboards and an intuitive user experience (UX).

Marketing Workspace Dashboard, CloudTrigger.com
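
To make the MapReduce idea mentioned above a little more concrete, here is a minimal, single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; the sample meter readings and the function names (map_phase, shuffle_phase, reduce_phase, total_usage) are made up for illustration. It only shows the shape of the pattern: emit key/value pairs, group them by key, then combine each group.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample data: (meter_id, kilowatt_hours) readings.
READINGS = [
    ("meter-a", 1.5),
    ("meter-b", 2.0),
    ("meter-a", 0.5),
    ("meter-c", 4.0),
    ("meter-b", 1.0),
]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for meter_id, kwh in records:
        yield meter_id, kwh


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's list of values into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def total_usage(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(total_usage(READINGS))
```

In a real distributed system the map and reduce steps would run in parallel on many machines, each handling a slice of the data; here everything runs in one process so the flow is easy to follow.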

A residual effect of the growth of these offerings is increased demand for workers with blended skill sets to fill the role of Data Scientist: half research scientist, half data analyst.

Enter: Serious business analytics.

To put the potential for Big Data into perspective, in 2011, GigaOm shared a few interesting examples of real-world situations where Big Data problems were solved:

  • A New York University PhD student conducted a comprehensive analysis of several terabytes worth of Wikileaks data to determine key trends around U.S. and coalition troop activity in Afghanistan.
  • A global non-profit analyzed 80 million documents to confirm the validity of the Guatemalan genocide of the 1990s.
  • A California genomics company consumed over 100 million gene samples to predict markers for coronary artery disease.

It’s not surprising, and in fact quite a natural progression, that the discussion of Big Data arrives on the heels of cloud computing. The cloud gives organizations and agencies the ability to store a tremendous amount of data in a (hopefully) highly reliable system, in a distributed environment. It also provides the ability to scale dynamically, leverage existing algorithms for analyses, and take advantage of robust data center hardware cost-effectively, without building from the ground up.

Esri has been testing Amazon Web Services’ Elastic MapReduce product and deploying prototypes on the AWS cloud, as well as exploring and providing MongoDB examples for plugging NoSQL data sources into ArcGIS. More visible is Esri’s geospatial analysis of tweets generated on Twitter and collected through its big data partner, Gnip. You can see examples of social media monitoring in the public information maps, where tweets are captured and then displayed across relevant geographies. Other Esri partners in the big data space currently include Microsoft, IBM, TerraEchos, and CloudTrigger.
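
As a rough illustration of what displaying tweets "across relevant geographies" can involve, the sketch below counts geotagged points per one-degree grid cell. This is not how ArcGIS or the public information maps do it; the coordinates are invented, and the helper names (cell_for, count_by_cell) are hypothetical. It simply shows one common way to summarize many point locations by area.

```python
import math
from collections import Counter

# Hypothetical geotagged messages: (longitude, latitude) in decimal degrees.
POINTS = [
    (-117.16, 32.72),
    (-117.20, 32.75),
    (-87.63, 41.88),
    (-77.04, 38.91),
    (-77.01, 38.89),
    (-77.50, 38.20),
]


def cell_for(lon, lat, size_deg=1.0):
    """Return the lower-left corner of the grid cell that contains the point."""
    return (
        math.floor(lon / size_deg) * size_deg,
        math.floor(lat / size_deg) * size_deg,
    )


def count_by_cell(points, size_deg=1.0):
    """Count how many points fall into each grid cell."""
    return Counter(cell_for(lon, lat, size_deg) for lon, lat in points)


if __name__ == "__main__":
    for cell, count in count_by_cell(POINTS).most_common():
        print(cell, count)
```

A fixed grid is the simplest possible choice; a real mapping platform would more likely summarize by meaningful boundaries such as counties or postal codes.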

Every organization sees its data as a core asset that drives business and decision-making. Mining location data from these assets and making sense of it is perhaps one of the biggest challenges we face with Big Data. Typically this information is randomly collected and then locked away. Access is limited, or the data is archived and forgotten. A more democratic platform, such as ArcGIS Online, can greatly increase the speed of understanding and sharing location data assets. As a result, individual users are empowered with the information they need to make the most effective and innovative decisions, affecting the future of government and society, science and business.

That is the big deal about Big Data.

Victoria Kouyoumjian

About Victoria Kouyoumjian

Victoria Kouyoumjian has more than 15 years of experience working in the geospatial field at Esri and has witnessed many changes in high tech and GIS during this time. After several years as a Product Manager for a suite of developer-focused software solutions, she now works as a Senior Business and Technologies Strategist, focusing on emerging technology trends, as well as business and marketing strategies. She works with global independent research organizations, including Gartner and Forrester, and with other leaders in information technology. Victoria holds an MBA, as well as a BS in Geography from the University of Wyoming, and a BA in English from Mt. Holyoke College in western Massachusetts. She co-authored The Business Benefits of GIS: An ROI Approach; has written several articles on cloud computing; and presents frequently as burgeoning trends move into the mainstream.

8 Comments

  1. Dawn Wright deepseadawn says:

    This is a great post and very timely given what is going on in parts of the scientific community at the moment, especially the National Science Foundation’s EarthCube initiative, which seeks to develop “transformative concepts and approaches to create integrated [big data] management infrastructures across the Geosciences.” See EarthCube in action at http://earthcube.ning.com. Indeed, big data will change the way that science is done, leading to a new science paradigm. An excellent book is the 2009 Microsoft Research publication, “The Fourth Paradigm,” which posits a new paradigm for scientific discovery beyond those of empiricism, analysis, and simulation. The fourth paradigm is where insight is also discovered through the manipulation and exploration of big data.

  2. azolnai says:

    Thanks Vic for this. Big Data is part of my remit and my plan, and your comms are very helpful as usual. And here’s the link to “The Fourth Paradigm” in comment below: http://research.microsoft.com/en-us/collaboration/fourthparadigm/

  3. duanemarble says:

    It all depends upon your definition of “big” and when you are posing the question. For example, in the discussions of the feasibility of Earth orbital sensors that took place many decades ago, we were very concerned with drowning in the potential spatial data. Again, some years later, a group of us were brought together by the DoD to discuss a proposed planet-wide database with a 10-meter resolution. The discussion died before long when two of us pointed out that a single digital copy of the proposed database would require a physical storage facility four stories high and occupying an area roughly the size of Los Angeles. From today’s point of view, both of these now look like “little” data, and I am sure that tomorrow’s view of today’s “big data” will not be any different. We must also remember that even the best “big data” is of little utility unless we have a good notion about how to make effective use of it. And I would suggest that, especially in the case of spatiotemporal data, we are not in a very strong operational position at the present time.

  4. brya1680 says:

    Hi Vic,
    Thanks for posting this as it’s nice to see big data getting some attention from the GIS community. Speaking of… several mentions of the importance of spatial location in the context of big data uses for the City of Chicago.
    http://www.huffingtonpost.com/ben-hecht/chicago-big-data_b_1522259.html?view=print&comm_ref=false
    Hope to see you in San Diego!
    Bryant Ralston

  5. Pingback: Esri DC Development Center plans | GeoIQ Blog

  6. Pingback: “What’s the Big Deal about Big Data?” | Blog Esri Portugal

  7. Pingback: ESRI to Big Data: I'm watching you | GEO:TECH #bigdata