Let’s put Teradata, IBM and everyone else working at large scale aside for one minute. I’m not saying they don’t have a place in all of this; they do. But cluster farms and huge data silos are out of reach of the common man in most cases. For the rest of us there are the joys of the single node: one machine doing all the work.

Now, dumping petabytes of data on a single drive is not a common thing to do. And to be honest, I’m not overly interested in the outliers and the black swans of this world. Well, not yet.

The planning stage for this data project, or any other startup for that matter, starts with the basics: a back of a beer mat job (well, you normally have to split the beer mat in two, but that’s just a technicality).

Back of the envelope (BOTE) calculations have been around for years. They are sometimes known as Fermi questions, after the physicist Enrico Fermi, who made a rough calculation of the strength of a bomb blast. The key word here is “roughly”: you can filter out an awful lot of noise with a basic rough idea.
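To make that concrete, here is a minimal sketch of the sort of beer mat sum I mean: will a year of data fit on one machine? Every figure in it (records per day, bytes per record, disk size, the 50% headroom) is a made-up assumption for illustration, and the function names are my own, not from any library.

```python
# Back of the envelope sizing: will a year of data fit on one machine?
# Every number below is a made-up assumption for illustration only.

def estimate_storage_gb(records_per_day, bytes_per_record, days):
    """Rough raw storage needed, in gigabytes (1 GB = 10**9 bytes)."""
    return records_per_day * bytes_per_record * days / 1e9


def fits_on_single_node(required_gb, disk_gb, headroom=0.5):
    """True if the data fits while leaving `headroom` of the disk free."""
    return required_gb <= disk_gb * (1 - headroom)


records_per_day = 2_000_000   # assumed: transactions per day
bytes_per_record = 200        # assumed: average size of one row
days = 365                    # one year of history
disk_gb = 2_000               # assumed: a 2 TB drive

required = estimate_storage_gb(records_per_day, bytes_per_record, days)
print(f"Roughly {required:.0f} GB needed")
if fits_on_single_node(required, disk_gb):
    print("A single node will do")
else:
    print("Time to rethink the plan")
```

With these invented numbers the answer is roughly 146 GB, comfortably inside one drive. The point is not the precision; it’s that a rough figure is enough to tell you which conversation to have next.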

If you remember the app economy equation I posted last month, that is essentially a back of the envelope calculation. It gives me a rough idea; it’s not polished, but it’s a good starting point.

The belief that BigData starts with taking everything and seeing what transpires is wrong in my eyes. Unless you have the processing power, the time and the patience to wait for some mysterious answer to arrive, you usually already have an idea of what you’re looking for. So why go through the whole lot?

The problem with BigData is there’s no one story that fits all. Every investigation starts with a question and derives an answer from the data. It may not be the answer you want, or even the correct one, but it’s an answer. I’m becoming a firm believer in recency over history, especially for the likes of loyalty data.