Big Data is a term being thrown around a lot these days. This post is my attempt to help readers make sense of what it means and what potential it holds.
One does tend to think at a very large scale when it comes to big data. The basic concepts behind Big Data are age-old. What has changed recently is not the techniques for looking at data but the data itself. People around the world are generating at least terabytes of data every minute, and most of it is shared online. Together with the growth in computing power, this has given humans a paradigm never experienced before: access to a practically unlimited stream of data, and the computational ability to make sense of it.
Some people argue that anything too big for Excel or Access is big data. Personally, I disagree. I believe big data means working not only with large data sets but also with varied ones, most of which are unstructured. Making sense of a combination of the structured and unstructured data available in the world is what makes Big Data really BIG and gives it so much potential.
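To make the idea of combining structured and unstructured data a little more concrete, here is a minimal Python sketch. It assumes a hypothetical table of sales records (structured) and a few hypothetical free-text customer comments (unstructured), counts how many comments mention each product, and attaches that count to the matching record. Real pipelines would use far more sophisticated text processing; this only illustrates the join.

```python
import re
from collections import Counter

# Structured data: hypothetical sales records (product, units sold).
sales = [
    {"product": "umbrella", "units": 120},
    {"product": "sunscreen", "units": 45},
]

# Unstructured data: hypothetical free-text customer comments.
comments = [
    "Bought an umbrella before the storm, great quality",
    "The umbrella broke in the wind",
    "Sunscreen smelled nice",
]

def mention_counts(texts, products):
    """Count how many texts mention each product (case-insensitive, whole word)."""
    counts = Counter()
    for text in texts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for product in products:
            if product in words:
                counts[product] += 1
    return counts

def combine(sales, comments):
    """Attach a comment-mention count to each structured sales record."""
    products = [row["product"] for row in sales]
    counts = mention_counts(comments, products)
    return [{**row, "mentions": counts[row["product"]]} for row in sales]

for row in combine(sales, comments):
    print(row)
```

Each printed record now carries both the structured sales figure and a signal extracted from free text.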
The world has known many great mathematicians and statisticians whose theories and algorithms were hard to verify without today's computational capacity. One example: it has been shown that the chain of acquaintances linking any two people in the world is surprisingly short, just a handful of steps (the popular "six degrees of separation"). This was a long-standing theory, but it could only be tested at scale with the advent of two things: social media, and the computing power to analyze social media.
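The "chain" between two people is simply the shortest path in a friendship graph, which breadth-first search finds. The sketch below is illustrative only: the tiny graph and the names in it are hypothetical, and analyzing a real social network means doing this kind of search at a vastly larger scale, which is where the computing power comes in.

```python
from collections import deque

def degrees_of_separation(graph, start, target):
    """Return the number of hops on the shortest acquaintance chain, or None if unreachable.

    Breadth-first search visits people in order of distance from `start`,
    so the first time `target` is reached the hop count is minimal.
    """
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, dist + 1))
    return None

# Hypothetical toy friendship graph (undirected: each link is listed both ways).
friends = {
    "asha": ["ben", "chen"],
    "ben": ["asha", "dev"],
    "chen": ["asha", "dev"],
    "dev": ["ben", "chen", "emma"],
    "emma": ["dev"],
    "farid": [],
}

print(degrees_of_separation(friends, "asha", "emma"))   # 3
print(degrees_of_separation(friends, "asha", "farid"))  # None
```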
Companies like UPS and FedEx ship millions of consignments per day. The data generated just by tracking each shipment is enormous. To optimize the flow of packages, these companies run algorithms over petabytes of such data, combined with factors like weather and traffic, to arrive at optimal routes. This approach has saved these companies millions, if not billions.
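Route optimization of this kind is, at its core, a shortest-path problem on a road network whose edge costs change with conditions. The sketch below uses Dijkstra's algorithm with a hypothetical cost model (base travel time multiplied by made-up traffic and weather factors). It is not how UPS or FedEx actually do it; it only illustrates how external factors can change the best route.

```python
import heapq

def adjusted_cost(base_minutes, traffic_factor, weather_factor):
    """Hypothetical cost model: scale a base travel time by traffic and weather multipliers."""
    return base_minutes * traffic_factor * weather_factor

def shortest_route(roads, source, dest):
    """Dijkstra's algorithm over a directed road graph with non-negative edge costs.

    `roads` maps a node to a list of (neighbour, cost) pairs.
    Returns (total_cost, path), or (inf, []) if dest is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dest:
            # Walk the predecessor links back to the source to rebuild the path.
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route to this node was already found
        for neighbour, edge_cost in roads.get(node, ()):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return float("inf"), []

# Hypothetical network; each edge is (neighbour, base minutes, traffic factor, weather factor).
raw = {
    "depot": [("A", 10, 1.0, 1.0), ("B", 8, 2.0, 1.0)],
    "A": [("customer", 8, 1.0, 1.25)],
    "B": [("customer", 6, 1.0, 1.0)],
}
roads = {
    node: [(nbr, adjusted_cost(m, t, w)) for nbr, m, t, w in edges]
    for node, edges in raw.items()
}

print(shortest_route(roads, "depot", "customer"))
```

With every factor set to 1.0, the route through B (8 + 6 = 14 minutes) would beat the route through A (10 + 8 = 18 minutes); the heavy traffic on the depot-to-B road flips the answer to the route through A.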
This Forbes article http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/ shows how powerful data can be. Enough data, combined with the capacity and willingness to process it, can produce remarkably powerful models. Applied to natural disasters, similar data-driven approaches can help give earlier warning of earthquakes and tsunamis. ISRO successfully predicted cyclonic storms in India and helped save thousands of lives; such disasters have often cost thousands of lives in India.
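The article does not spell out the exact model, so the following is only a toy illustration of the general idea that individually weak purchase signals can add up to a confident prediction. It combines signals with a logistic function; the item names, weights, and bias are all hypothetical, and a real model would learn them from historical data.

```python
import math

# Hypothetical signal weights; a real model would learn these from historical data.
WEIGHTS = {"item_a": 1.5, "item_b": 1.0, "item_c": 0.5}
BIAS = -2.0  # hypothetical baseline: the score stays low when no signals are present

def score(basket):
    """Combine purchase signals into a probability-like score between 0 and 1."""
    z = BIAS + sum(WEIGHTS.get(item, 0.0) for item in basket)
    return 1.0 / (1.0 + math.exp(-z))

for basket in [set(), {"item_a"}, {"item_a", "item_b", "item_c"}]:
    print(sorted(basket), round(score(basket), 3))
```

No single item pushes the score very high, but several together do, which is exactly why such models can feel intrusive.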
There is a saying in India, "Ati Sarvatra Varjayet". In Chinese it goes something like "Wu Ji Bi Fan" (refer to Jackie Chan's Karate Kid if I made a mistake here). Roughly translated into English, it says that too much of anything is bad. Similarly, going too far with big data can be counterproductive. Sometimes data provides an insight, but the organization is not flexible enough to act on it in time. It then implements a half-cooked, duct-taped solution and ends up in even deeper soup.
Privacy is a big issue when it comes to using data more effectively. Drilling down to more granular data makes sense, but there comes a point when the analysis may intrude on individuals' privacy. Traditionally, store owners and workers had a personal touch with customers and offered discounts based on personal interactions. In most places, this has been replaced by data-driven offers. The personal touch is lost, and the Target example above is a classic borderline case of privacy intrusion.
Big Data is a great boon for business if used properly, but it is territory to be trodden carefully, with the help of legal and technical departments.
This blog reminds me of the "3Vs": Volume, Variety, Velocity. Although first described 12 years ago, it is still an effective framework for "what is big data". I have been asking myself what has changed after 12 years. Now I have one answer: "wu ji bi fan". (Yes, this is the correct spelling.)
Nice one, Ambuj. Sharing it.
Very useful! Thanks for sharing, Mr. Ambuj Agarwal!