May 22, 2012
Big Data. What the heck is it? I had heard the term so often, and knew so little about it, that I decided to learn “what people are talking about” and then try to simplify it for others who may be similarly confused. So I did some online research and attended a Big Data conference at Stanford earlier this month.
Wikipedia defines it as “a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.”
Wikipedia goes on to state that Big Data refers to “data sets that grow so large and complex that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualizing. Scientists regularly encounter this problem in meteorology, genomics, connectomics, complex physics simulations, biological and environmental research, finance, and the internet.”
So, Big Data means “a lot of data that is complex to manage.”
But why is Big Data important? Take Velti, for example. We collect 3 billion data facts each day from mobile devices that consume breathtaking amounts of information. With this much data, constant improvement in how we analyze and optimize it becomes critical to our customers’ success and, in turn, to our company’s bottom line.
…Okay, on to the Big Data Conference. It was held at Stanford and hosted by Accel Partners, and the speakers ranged from large companies like Facebook, LinkedIn, Twitter, Google and Yahoo! to innovative startups such as Cloudera, Couchbase, Splunk and Storm.
The discussions focused on how companies ingest, store, interpret and use data today, which was much different from what I had expected. I assumed the data side of the mobile industry was much more advanced, but listening to the panels, even the Internet side still seemed relatively nascent! In fact, it was mentioned that many companies today simply store the data they receive (it’s coming in at such a rapid clip) and plan to figure out how to use it more intelligently at a later date.
I’m sure it’s not news to typical data strategists, but to me (a layperson when it comes to understanding data beyond the basics) it was surprising that we are still in the “build crowd” stage: companies are still figuring out how to deal with all of this data. More importantly, listening to the challenges facing other companies, larger and smaller, reminded me that most mobile companies still have an enormous amount of work to do to become high-performing organizations by leveraging data. It also reminded me how great the opportunity is for mobile marketers and advertisers as data systems continue to make rapid advances in interpretation and optimization.
Below are a few key takeaways that shed light on Big Data, its challenges and its overall state in mobile:
- Structured Data vs. Unstructured Data. Structured Data is information that is easily stored in fielded forms in databases, while Unstructured Data is information such as images or text-heavy content that doesn’t fit into relational tables and is challenging for a computer to process.
- Web data scientists are overwhelmed and under-resourced. Mobile, which has even more incoming data points to ingest, store, process and interpret than the web, is exponentially more challenging to manage and to leverage in a way that lets companies intelligently improve and scale beyond standard performance measures. From listening to the panels, most companies will need more data scientists, a more open platform and/or data partners to deliver the type of data necessary to further improve ROI metrics. Mobile is a great place to collect and consume data, but a terrible place to aggregate it, so the cloud (scalable and able to process large data sets) is needed rather than internal servers. Innovation in the mobile domain is a huge opportunity: “somebody figure it out” was the plea of one panelist.
- Open-source was a big theme. It’s almost as if the task of turning Big Data into more usable data is too large for one company to take on, but open-source helps the industry tackle and solve specific data problems, which helps it keep pace and scale. For example, Facebook initially used web logs alone to track users; over the years, they found that by combining that machine data with data from open-source partners (Facebook opened its platform so that others could supply ad data tools), their targeting capabilities improved dramatically.
- There doesn’t seem to be a clear standard emerging. Panelists agreed there is no standard platform today; companies don’t want to manage dozens of different platforms, so a standard is needed. SQL was frequently described as “fine but not complex” because it is still a basic, manual workflow: end-users query, wait for the data to load and then interpret it. One panelist said, “The challenge is that SQL is still too manual.” On the other hand, Hadoop received a lot of support at the conference. Hadoop, the open-source project administered by the Apache Software Foundation, is a platform for consolidating, combining and understanding large-scale data in order to better cope with the data deluge. Several companies at the conference said they deploy Hadoop alongside their legacy IT systems, which allows them to combine old and new data sets.
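For readers who, like me, find the structured vs. unstructured distinction easier to grasp with a concrete case, here is a minimal Python sketch. It is purely illustrative: the `ad_events` table, the device IDs, timestamps and review text are all hypothetical, not real Velti data. Structured records drop straight into a relational table and can be queried with SQL, while free text has to be processed by code before anything can be counted.

```python
import re
import sqlite3

# Structured data (hypothetical ad events): every record has the same named
# fields, so it maps directly onto the columns of a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_events (device_id TEXT, event TEXT, ts INTEGER)")
rows = [
    ("dev-1", "impression", 1337650000),
    ("dev-1", "click", 1337650042),
    ("dev-2", "impression", 1337650100),
]
conn.executemany("INSERT INTO ad_events VALUES (?, ?, ?)", rows)

# Because the fields are known up front, "how many clicks?" is one SQL query.
clicks = conn.execute(
    "SELECT COUNT(*) FROM ad_events WHERE event = 'click'"
).fetchone()[0]

# Unstructured data (hypothetical user reviews): free text has no predefined
# fields, so any structure must be extracted by code before it can be counted.
reviews = [
    "Loved the app, clicked the banner twice!",
    "Ads are annoying. Never clicked one.",
]
# Naive pattern: counts every word form of "click", with no notion of meaning.
click_mentions = sum(
    len(re.findall(r"\bclick(?:ed|s)?\b", review.lower())) for review in reviews
)

print("clicks in table:", clicks)
print("click mentions in text:", click_mentions)
```

Note that the naive text pattern also counts “Never clicked one,” which is the opposite of a click. That kind of ambiguity is exactly why unstructured data is so much harder for a computer to interpret than fielded records.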
Overall, the biggest challenges in Big Data today are:
- Segregation and transparency of data (26 regulators!)
- Enabling legacy technologies/tools to be successful
- Understanding how companies will manage data over time
- Serving the new class of users (what’s the use case?)
- Delivering packaged applications, since most tools being developed today are not delivered that way
- Building high-scale, real-time application capabilities
Interesting data companies and sources mentioned included:
- Storm (ad targeting)
- Bit High
- …and any small teams that are building tools within large companies or even as startups
So now you get the gist of Big Data, and I hope you understand what it might mean for mobile. With massive amounts of data in all different forms being collected by the minute in mobile, companies that want to analyze growing trends and build strategy around them might have a bigger task at hand than they signed up for. Still, I’m confident that our collective efforts around data will soon put mobile in a position where the consumer experience and marketing results are consistently aligned.
I hope sharing these conference notes helped simplify – as it did for me – what the heck Big Data is!