Sunday, September 28, 2014

Size doesn't matter

Big data.

A couple of years ago, at a GOTO Aarhus conference, I took a break from the sessions, and walked around in the vendor area. Here I was lucky enough to be able to listen in on a conversation between Dave Thomas and Jim Webber, where Dave Thomas was explaining to Jim Webber why graph databases, like neo4j, were not suited for the type of stuff he was doing. Basically, what Dave Thomas did, was to take all global stock data several times a day, and run some analysis on it (I am obviously simplifying it, and probably explaining it wrong).

This is the sort of things I think of when I hear the words "big data".

Since that's the case, I have been somewhat skeptical when people start talking about big data in Denmark, because we have very few domains where there are anything remotely close to such data amounts (health care probably being the one exception).

It turns out that I've basically misunderstood the concept of big data, and that I underestimated the amount of data out there.

At GOTO Copenhagen, I went to a talk with Eva Andreasson, where she gave an overview of the big data landscape, mostly at the vendor level. During this session, she made a number of important points, which made me realize I have to change my view on big data and its usage in Denmark.

First of all, Eva Andreasson made clear that only about 10% of all data out there is what we traditionally would consider data (e.g. data about companies or people). The rest of it is all the trace data that people leave around when they navigate the internet, doing whatever shopping or browsing they want.

Such trace data, put together with traditional data, allows companies to analyze end-user behavior much better than traditional data alone. E.g. while traditional data will tell you what customers bought, trace data will tell you what products customers spent a long time looking at, without buying them at the end - allowing the company to do some further analysis on what it would take to get the customers to buy the product.

Another thing that Eva Andreasson made clear, is that big data isn't just about working on large data amounts. It is also about aggregating new data sources into existing use scenarios of existing data, and about making new use scenarios of the data that you work with, allowing you to look at things in new ways, hopefully gaining new insights.

Based on these two points, it is clear to me that I have to reevaluate my understanding of when big data is relevant. And judging from the conversations I've had with other people about big data, I am not alone in this.

No comments:

Post a Comment

If the post is more than 14 days old, your comments will go into moderation. Sorry, but otherwise it will be filled up with spam.