Ride on Time

There’s much talk of BigData being like a black box: data goes in, it’s processed, and answers or knowledge come out of the other side. With all the Hadoop/MapReduce talk, we define a mapper with an algorithm or task to perform and wait for the answers to pour out.
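To make the black box a little less opaque, here is a minimal in-process sketch of the map, shuffle and reduce steps in plain Python. It is not Hadoop's API. The `mapper`/`reducer` names, the `product_id,rating` record format and the sample data are assumptions chosen for illustration; the job simply computes an average rating per product.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(record: str) -> Iterator[tuple[str, float]]:
    """Map step: parse a 'product_id,rating' line and emit a (key, value) pair."""
    product_id, rating = record.split(",")
    yield product_id.strip(), float(rating)


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle step: group every emitted value under its key."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: list[float]) -> tuple[str, float]:
    """Reduce step: collapse one key's values into a single answer (the mean rating)."""
    return key, sum(values) / len(values)


def run_job(records: Iterable[str]) -> dict[str, float]:
    """Data goes in, answers come out: the 'black box' as seen from outside."""
    pairs = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["p1,4.0", "p2,3.5", "p1,5.0", "p2,4.5"]
    print(run_job(sample))  # {'p1': 4.5, 'p2': 4.0}
```

From the outside only `run_job` is visible, which is the point: whoever calls it sees the answers, not the steps that produced them.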

A black box, yesterday.

This is all very well when we know the data has a constant variance. For example, a user’s rating of a product averages between 3.5 and 4.7 stars, and the algorithm (a recommendation or a segmentation, say) is applied to it. In batch processing this is no great deal, provided the results are monitored and confirmed.
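Here is a sketch of the kind of batch check that confirms the data still looks as expected before the results are trusted. The 3.5 to 4.7 star band comes from the example above; the baseline variance, the 50% tolerance and the `check_batch` helper are hypothetical, chosen purely for illustration.

```python
import statistics
from dataclasses import dataclass

# The 3.5-4.7 star band comes from the example in the text; the baseline
# variance and tolerance are hypothetical values chosen for illustration.
EXPECTED_MEAN_RANGE = (3.5, 4.7)
BASELINE_VARIANCE = 0.5
VARIANCE_TOLERANCE = 0.5  # allow the variance to drift by up to 50% of baseline


@dataclass
class BatchCheck:
    mean: float
    variance: float
    ok: bool


def check_batch(ratings: list[float]) -> BatchCheck:
    """Confirm a batch still looks like the data the algorithm was built for."""
    mean = statistics.fmean(ratings)
    variance = statistics.pvariance(ratings)  # population variance of the batch
    low, high = EXPECTED_MEAN_RANGE
    mean_ok = low <= mean <= high
    variance_ok = abs(variance - BASELINE_VARIANCE) <= VARIANCE_TOLERANCE * BASELINE_VARIANCE
    return BatchCheck(mean, variance, mean_ok and variance_ok)


if __name__ == "__main__":
    print(check_batch([3.0, 4.0, 5.0, 4.0]).ok)  # True
    print(check_batch([4.0, 4.0, 4.0, 4.0]).ok)  # False: mean fine, variance collapsed
```

The second batch has a perfectly healthy average yet still fails, because its spread has changed: that is the constant-variance assumption quietly breaking.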

What about real time? High Frequency Trading (HFT) essentially uses this black-box paradigm: there is no human interaction, and the system trades many times a second. If the flap of a butterfly’s wings causes the ripple, the effects can become cyclical, feeding back into the very system that caused them.

Algorithmic trading and HFT were shown to have contributed to volatility during the May 6, 2010 Flash Crash, when the Dow Jones Industrial Average plunged about 600 points only to recover those losses within minutes. At the time, measured on an intraday basis, it was the second-largest point swing (1,010.14 points) and the biggest one-day point decline (998.5 points) in Dow Jones Industrial Average history. (link to full article here)

Autonomic Managers

Thinking of algorithms as part of a nervous system gives us greater control over a spiralling outcome. Autonomic managers are based on a control loop that does the following:

Monitor -> Analyse -> Plan -> Execute

From those four elements we produce the fifth, knowledge, which gives the loop its name: MAPE-K. If this knowledge can be used to refine the working elements of an algorithm, then we’re potentially on to a good thing. I wrote about understanding the autonomic manager concept for IBM in 2004 (nine years ago!).
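A minimal sketch of the MAPE-K loop, assuming a hypothetical managed algorithm with a single tunable `baseline` (say, the mean rating a recommender expects). Knowledge here is nothing more than an exponential moving average plus a tolerance; the smoothing factor, the tolerance and all class names are assumptions, and a real autonomic manager would monitor far richer signals.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Knowledge:
    """Shared state (the K): what the loop currently believes is normal."""
    expected: float
    tolerance: float
    alpha: float = 0.2  # assumed smoothing factor for the moving average
    history: list[str] = field(default_factory=list)

    def learn(self, observed: float) -> None:
        # Exponential moving average: nudge the expectation toward each observation.
        self.expected += self.alpha * (observed - self.expected)


@dataclass
class ManagedAlgorithm:
    """Hypothetical managed element with one tunable parameter."""
    baseline: float


class AutonomicManager:
    def __init__(self, element: ManagedAlgorithm, knowledge: Knowledge) -> None:
        self.element = element
        self.k = knowledge

    def monitor(self, reading: float) -> float:
        # Monitor: a real system would pull metrics from sensors or logs.
        return reading

    def analyse(self, observed: float) -> bool:
        # Analyse: is the observation outside what knowledge says is normal?
        return abs(observed - self.k.expected) > self.k.tolerance

    def plan(self, symptom: bool) -> Optional[float]:
        # Plan: if something is wrong, propose re-tuning to the updated expectation.
        return self.k.expected if symptom else None

    def execute(self, new_baseline: Optional[float]) -> None:
        # Execute: apply the plan to the managed algorithm and record it.
        if new_baseline is not None:
            self.element.baseline = new_baseline
            self.k.history.append(f"re-tuned baseline to {new_baseline:.2f}")

    def step(self, reading: float) -> None:
        observed = self.monitor(reading)
        symptom = self.analyse(observed)  # compare against prior knowledge
        self.k.learn(observed)            # knowledge grows on every pass
        self.execute(self.plan(symptom))


if __name__ == "__main__":
    algo = ManagedAlgorithm(baseline=4.0)
    manager = AutonomicManager(algo, Knowledge(expected=4.0, tolerance=0.5))
    for reading in (4.1, 3.9, 2.0):
        manager.step(reading)
    print(manager.k.history)  # ['re-tuned baseline to 3.60']
```

Feeding in 4.1 and 3.9 leaves the algorithm alone; only the jump to 2.0 exceeds the tolerance, so the loop re-tunes the baseline using what it has learned so far.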

A single algorithm set in a Hadoop-like black box won’t solve every problem. As with all software development, there’s a constant process of refinement. Automated systems make us ever more reliant on the systems that work for us, because they can do the work more quickly and efficiently than we can.

The real question now is whether the black box can be trained to heal itself when it senses a problem with the incoming data. Will my algorithm be able to heal itself before the next iteration? That could be minutes, seconds or even milliseconds away.
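Here is one way "heal before the next iteration" might look in practice: screen each incoming batch, quarantine malformed or out-of-range records, and fall back to the last known-good answer if too much of the batch is bad or if healing overran its time budget. The valid range, the 20% rejection threshold and the function names are all assumptions for illustration.

```python
import statistics
import time
from typing import Callable, Optional

VALID_RANGE = (0.0, 5.0)   # assumed valid star-rating range
MAX_REJECT_FRACTION = 0.2  # assumed: beyond this, distrust the whole batch


def parse_rating(raw: str) -> Optional[float]:
    """Return the rating, or None if malformed or out of range (NaN fails the range test)."""
    try:
        value = float(raw)
    except ValueError:
        return None
    low, high = VALID_RANGE
    return value if low <= value <= high else None


def heal_batch(raw_batch: list[str]) -> tuple[list[float], int]:
    """Split a batch into clean ratings and a count of quarantined records."""
    clean: list[float] = []
    rejected = 0
    for raw in raw_batch:
        value = parse_rating(raw)
        if value is None:
            rejected += 1
        else:
            clean.append(value)
    return clean, rejected


def run_iteration(
    raw_batch: list[str],
    algorithm: Callable[[list[float]], float],
    last_good: float,
    budget_s: float,
) -> float:
    """Heal the batch, then run the algorithm, or fall back to the last good answer."""
    deadline = time.monotonic() + budget_s
    clean, rejected = heal_batch(raw_batch)
    # Too much bad data: better to hold the previous answer than act on noise.
    if not raw_batch or rejected / len(raw_batch) > MAX_REJECT_FRACTION:
        return last_good
    # Healing overran the budget: the next iteration can't wait, so hold as well.
    if time.monotonic() > deadline:
        return last_good
    return algorithm(clean)


if __name__ == "__main__":
    print(run_iteration(["4.0", "5.0", "3.0", "4.0", "4.0"], statistics.fmean, 4.2, 1.0))  # 4.0
    print(run_iteration(["4.0", "abc", "9.9", "3.0"], statistics.fmean, 4.2, 1.0))  # 4.2
```

Holding the last good answer is a deliberately conservative choice; a system that traded on every iteration regardless is exactly the kind of feedback loop described earlier.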

Can the MAPE-K principle be put into BigData? Personally, I think it can…