Tuesday, October 9, 2012

Big Data! Big Challenge!! Big Win!!!

While reading an article on a blog, I found a very interesting statement: "Big data can beat a better algorithm". It was the total opposite of some of my college lectures, which said that an O(n) algorithm is always better than an O(n * n) one. That raised a few questions:
  • What is this big data?
  • How is it getting generated?
  • What challenges come with it?
  • How can this data be helpful for an organization?
Based on an IDC survey, the size of the "digital universe" was 0.18 zettabytes in 2006 and grew tenfold to 1.8 zettabytes by 2011 (which is roughly equivalent to every person on earth having one disk drive of data). The rate at which this digital data is generated is still increasing. We have already passed the time when a software programmer had to think about data in GB or even TB. Now a programmer has to think about a tsunami of digital data that grows every second, known as "Big Data".
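To make the "one disk drive per person" comparison concrete, here is a rough back-of-the-envelope check. It is only a sketch: the world population of about 7 billion (roughly the 2011 figure) and the use of decimal units (1 ZB = 10^21 bytes) are my assumptions, not numbers from the survey, and the function name is made up for illustration.

```python
# Rough per-person share of the "digital universe".
# Assumptions: decimal units and a world population of ~7 billion.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes


def per_person_gb(total_zb, population):
    """Return the average share of data per person, in gigabytes."""
    return total_zb * ZETTABYTE / population / GIGABYTE


share_2006 = per_person_gb(0.18, 7_000_000_000)
share_2011 = per_person_gb(1.8, 7_000_000_000)
print(f"2006: about {share_2006:.0f} GB per person")
print(f"2011: about {share_2011:.0f} GB per person")
```

Under these assumptions the 2011 figure works out to a few hundred GB per person, which is about the size of an ordinary disk drive of that time.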

How is this data getting generated? Every human being connected to the WWW, a telephone network, or even a pager is contributing to this set of big data. We add to it by sending an email, clicking Like on Facebook, calling a friend, or publishing a blog post. When we go to our favorite song site and listen to a few songs, the website keeps track of all these activities, and that adds to this big data too.

In my previous post I mentioned that there is a certain limit after which a single machine can't store or process the data! Then how will this large data be stored and processed? How will rich information be extracted from it? The answer: when one guy can't do it, distribute it between two guys. So to store these large data sets we need distributed storage, along with distributed computation units that operate on this distributed data. There are challenges again. What if I divide my data across 10 machines and one of them dies? How will I know which machine holds a particular part of my data? Will my computation program have to go to each and every machine to search for it? How will these multiple computation units coordinate? There are many questions to be answered, and I will try to uncover them one by one in my upcoming posts. But before that, let's jump to the really big question: is this big data really helpful for us?

Yes, it is!! Your favorite song website can give you nice suggestions only if it has all your previous visit records, suggestions from your friends, your geographic location, your recent likes, your age, blah blah... Without this big data, the song website can't give you any valuable song suggestions, even if it uses the best data analysis algorithm. So I guess the article was right: with big data we can get a Big Win.
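P.S. Going back to two of the distributed storage questions (which machine holds my data, and what if one of the 10 machines dies?), here is a toy sketch of one possible idea: pick a machine by hashing the key, and keep a second copy on the next machine in the list. This is only an illustration under my own assumptions, not how any particular system does it. The node names, the key, and both function names are hypothetical.

```python
import hashlib


def machines_for(key, machines, replicas=2):
    """Return the machines holding `key`: one chosen by hash, plus the next ones in the list."""
    # md5 is used only as a stable hash here, not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(replicas)]


def live_copies(key, machines, dead, replicas=2):
    """Return the machines that still hold `key` after the `dead` ones are gone."""
    return [m for m in machines_for(key, machines, replicas) if m not in dead]


machines = [f"node-{i}" for i in range(10)]   # hypothetical cluster of 10 machines
holders = machines_for("user-42/songs", machines)
survivors = live_copies("user-42/songs", machines, dead={holders[0]})
print("stored on:", holders)
print("after one machine dies:", survivors)
```

Because the location is computed from the key, the program does not have to search every machine, and because there are two copies, losing one machine does not lose the data. Real systems have to deal with much more (adding machines, keeping copies in sync), which is what the upcoming posts are for.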