OPINION: When small data becomes big data

By John Stones | 21 August 2013

IBM recently estimated that up to 90% of the data in use today was created within the last two years. There is clearly a need to manage this exponentially growing data stream, and the term 'big data' has become synonymous with its volume. This is only part of the picture, though: as with many buzzwords, 'big data' has been misappropriated.

What truly makes data 'big' is defined by variables other than volume. It is the ability to consume streaming, often unstructured, data, then analyse it and identify patterns in an automated, real-time environment.

This methodology stands in contrast to the supposedly 'small' data of the past, where structured data was analysed in samples - a manual and time-consuming process.

'Big data' is in many ways a quantity-over-quality approach. Data sets are large and often unstructured. Take social monitoring: companies in this area continuously monitor multiple social networks, tracking changes in sentiment.

The data they analyse is streaming and unstructured. Reaching statistically significant sample sizes is a non-issue. The assumption is that, in these large data sets, errors cancel themselves out, whereas in the 'small data' world, minor problems become accentuated by the smaller sample size.
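
The sample-size point can be made concrete with a rough sketch. It assumes a simple random sample and uses the standard normal approximation for the margin of error of a proportion. The function name and the example sample sizes are illustrative only and do not come from any particular product. The sketch covers random sampling error only, not errors caused by asking the wrong questions.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p measured on a
    simple random sample of size n (z = 1.96 gives roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


# Hypothetical sample sizes: a small survey, a larger one, a behavioural stream.
for n in (50, 1_000, 1_000_000):
    points = margin_of_error(0.5, n) * 100
    print(f"n={n:>9,}: +/- {points:.2f} percentage points")
```

Under these assumptions, a 50-person sample carries a margin of roughly 14 percentage points, while a million observations bring it down to about a tenth of a point.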

In online advertising, behavioural data has always been a vital selling tool. Publishers would survey their visitors, collecting insights that could be used to convince an advertiser that their audience was a good match. This is a typical example of 'small' data. Ask the wrong questions or use too small a sample size and the insights can be startlingly inaccurate.

This methodology relies heavily on human analysis, which is open to attitudinal bias. Advertising 'big data' companies now track, in an anonymous fashion, every page visited and video watched, every email read and purchase made by a web user. This constant stream of behavioural data means that each user's profile evolves with them and remains accurate. It enables advertisers to decide, with a much higher level of certainty, which audience best fits their advertising requirements.

Today's behavioural data insights would not be possible using 'small data' practices.

John Stones
Head of product and innovation
