Sunday, 6 July 2014

Inside Big Data

Everyone is talking about big data, but the things you hear people say aren’t always strictly accurate. Adaptive Computing’s Al Nugent, who co-wrote “Big Data for Dummies” (Wiley, 2013), has written a blog post called “Big Data: Facts and Myths” at http://www.adaptivecomputing.com/blog-hpc/big-data-facts-myths/ – I thought it would be interesting to hear what he has to say.

He says: “there has been an explosion in the interest around big data (and big analytics and big workflow). While the interest, and concomitant marketing, has been exploding, big data implementations have proceeded at a relatively normal pace.” He goes on to say: “One fact substantiated by the current adoption rate is that big data is not a single technology but a combination of old and new technologies, and that the overarching purpose is to provide actionable insights. In practice, big data is the ability to manage huge volumes of disparate data, at the right speed and within the right time frame to allow real-time analysis and reaction. The original characterization of big data was built on the 3 Vs:

  • Volume: the sheer amount of data
  • Velocity: how fast data needs to be ingested or processed
  • Variety: how diverse is the data? Is it structured, unstructured, machine data, etc.?

“Another fact is the limitation of this list. Over the course of the past year or so others have chosen to expand the list of Vs. The two most common add-ons are Value and Visualization. Value, sometimes called Veracity, is a measure of how appropriate the data is in the analytical context and whether it is delivering on expectations. How accurate is that data in predicting business value? Do the results of a big data analysis actually make sense? Visualization is the ability to easily ‘see’ the value. One needs to be able to quickly represent and interpret the data, and this often requires sophisticated dashboards or other visual representations.

“A third fact is that big data, analytics and workflow are really hard. Since big data incorporates all data, including structured data and unstructured data from e-mail, social media, text streams, sensors, and more, basic practices around data management and governance need to adapt. Sometimes, these changes are more difficult than the technology changes.

“One of the most popular myths is the ‘newness’ of big data. For many in the technology community, big data is just a new name for what they have been doing for years. Certainly some of the fundamentals are different, but the requirement to make sense of large amounts of information and present it in a manner easily consumable by non-technology people has been with us since the beginning of the computer era.

“Another myth is a derivative of the newness myth: you need to dismiss the ‘old database’ people and hire a whole new group of people to derive value from the adoption of big data. Even on the surface this is foolhardy. Unless one has a greenfield technology/business environment, the approach to staffing will be hybridized. The ratio of new hires to existing staff will vary based on the size of the business, customer base, transaction levels, etc.

“Yet another myth concerns the implementation rate of big data projects. There are some who advocate dropping in a Hadoop cluster and going for it. ‘We have to move fast! Our competition is outpacing us!’ While intrepid, this is doomed to failure for reasons too numerous for this writing. Like any other IT initiative, the creation of big data solutions needs to be planned, prototyped, designed, tested, and deployed with care.”
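To make Al’s three Vs a bit more concrete, here’s a rough Python sketch of my own (it’s not from his post) showing where volume, velocity and variety surface in a simple ingest loop. The function names and the one-second time budget are entirely hypothetical:

import json
import time

# A hypothetical ingest loop illustrating the three Vs.
# The record format and the lag budget are made up for illustration.

def parse_record(raw):
    """Variety: a record may be structured (JSON) or unstructured text."""
    try:
        return json.loads(raw)            # structured data
    except ValueError:
        return {"text": raw}              # fall back to treating it as raw text

def ingest(stream, max_lag_seconds=1.0):
    """Velocity: warn when we cannot keep up with the arrival rate."""
    count = 0
    for arrived_at, raw in stream:        # (arrival timestamp, payload) pairs
        parse_record(raw)
        lag = time.time() - arrived_at
        if lag > max_lag_seconds:
            print("falling behind by %.2f seconds" % lag)
        count += 1                        # Volume: the sheer number of records
    return count

# A tiny in-memory "stream"; a real source would be log files, sensors,
# social media feeds, etc.
sample = [
    (time.time(), '{"sensor": 7, "temp": 21.5}'),   # structured
    (time.time(), "free-form text from an e-mail"),  # unstructured
]
print("ingested %d records" % ingest(sample))

A real deployment would replace the in-memory list with a message queue and the print statement with proper monitoring, but the shape of the problem is the same.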

I thought Al’s comments were very interesting and worth sharing. You can find out more at Adaptive Computing’s Web site.
