Wednesday, January 1, 2014

Why is Big Data big nowadays?

In my previous post I hinted that most of the techniques used in Big Data today have existed for a long time. So why has Big Data grown in prominence only recently? There are primarily two reasons: data capture and computing power.

In this post I will focus on the abundance of data in today's world. These are the changes that have brought Big Data into prominence:

1) Social Media: The growing culture of sharing personal opinions and content on Facebook, Twitter, Pinterest, LinkedIn, etc. means that far more data is available on social media than could ever have been imagined. In a recent disclosure, Facebook revealed that it has more than a billion monthly active users, while Twitter had more than 237 million active users. These users regularly share comments, events, photographs, experiences, etc. with the world. The number of users is therefore large enough to serve as a sample from which to draw conclusions about the population.
The reach of social networks is evident from the fact that people have taken to sharing their sneezes, rashes and abnormal physical conditions on these sites. Because these posts are mostly geotagged, it is possible to segregate them by location. Incidentally, many locations provide a sufficient volume of posts to make an analysis statistically justifiable.

2) RFID: Most companies keep track of their inventory using either RFID tags or regularly scanned barcodes. Each scan becomes a data point and adds to the volume of data. It increases not only the volume but also the quality of the data available, which allows for meaningful analysis. This technology records a lot of data unobtrusively and adds extensively to the volume. Later, it is possible to find trends and patterns in this data. Combining this data with other factors opens the floodgates for analysis.

3) Scanners: Most outlets and industries use their IT infrastructure to record almost everything. Scanners such as barcode readers, QR code readers, etc. have made it almost painless to record transactions. Totals, billing, transfers, etc. become a lot more accurate and fast. In the process, a lot of usable data is generated that we can work on. For bigger organizations, the sheer volume of this data might classify it as Big Data.

4) Mobile: This is the biggest change behind the upsurge of Big Data on the consumer side. People's use of and dependence on mobile phones allows them to capture a lot of pictures (a form of data), text messages, instant messages, etc. in real time, without having to wait to get home and write things down, develop a negative or post a letter. Mobile is ubiquitous, and its impact on the life of an average person has increased at a rate of knots. For example, people keep checking their social media (Facebook, Twitter) on their phones and reply then and there. This gives a near real-time feed about events at a place or about a global phenomenon. The information is dispersed across the globe at lightning speed using the viral model. Each piece of this data might not seem large, but taken together it is extremely voluminous, not to mention unstructured. First making sense of this data in isolation, and then combining it with external data sets, is what most big data scientists do.

5) Location: Building on the last point, most mobile phones have location settings switched on by default. This lets companies collect immense amounts of data about users' locations, movements, etc. This is a huge amount of data in itself, but when combined with the user's other behaviors, it provides an immense understanding of user behavior. For companies, it is like mining gold and making the right ornaments from it.

6) Internet: I am grouping a lot of things not covered so far under the very broad category of Internet. Most of what was explained earlier also falls under the Internet, but that label alone is not very specific. Users generate immense data just by visiting websites or searching on Google/Bing. Google handles approximately 6 billion searches per day. Each of these searches is logged. When one starts typing part of a search query, Google suggests the most probable queries. It can do that because a user is rarely unique, and the chances that someone has already searched for the same thing are extremely high. In short, a simple search entered by one user adds to the data, and there are always millions of users searching. The data is big.
People's lives are so intertwined with the virtual world that they are always on the web. On average, a person visits anywhere between 0 and 100 websites a day, depending on how heavily their life depends on the internet. Each of these hits is collected as a data point somewhere: by the name server, by the website itself, or by both. The combined browsing history of all users alone, as collected by a large company like Google, can run into terabytes per day.
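
To make the search suggestion idea from point 6 a little more concrete, here is a minimal sketch in Python. It is not how Google actually does it; it only illustrates the principle that logged queries from earlier users can be counted and matched against what the current user has typed so far. The function names and the tiny query log are made up for illustration.

```python
from collections import Counter


def build_query_counts(query_log):
    """Count how often each (lowercased, trimmed) query appears in the log."""
    return Counter(q.strip().lower() for q in query_log if q.strip())


def suggest(prefix, query_counts, limit=3):
    """Return the most frequently logged queries that start with the prefix."""
    prefix = prefix.lstrip().lower()
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical log of past searches (illustrative data, not real queries).
query_log = [
    "big data tutorial",
    "big data jobs",
    "Big Data tutorial",
    "big bang theory",
    "big data tutorial",
    "weather today",
]

query_counts = build_query_counts(query_log)
print(suggest("big d", query_counts))
```

With only six logged queries the suggestions are crude, but the same counting works better the more searches are logged, which is exactly why the volume of data matters.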

None of this data was available at this scale previously. The techniques existed, but neither the tools nor the resources (in terms of data to work on) were available, so the field lay dormant. With the rise of data rose Big Data Analytics.
 
The list and explanations are in no way exhaustive, only indicative. I will try to compile a more exhaustive list when time permits. I just wanted to make sense of why Big Data is a hit when the basic techniques were already there.
