Sunday mornings are for tea, The Sunday Times and thinking. And with changes in my daily work routine, all for the better, that’s got me thinking about large-scale things with data again.

I’ve been thinking about Hadoop a lot over the last week, what with the “Is it dead?” posts on Quora, Spark 2.0 coming out and me working with Terraform, DCOS, Marathon and Mesos to create some quite remarkable things. Hadoop started it all for me a good few years ago, as I was reminded:

“Can I just say, you were the first, and only person speaking about Hadoop and BD in NI for years, ….”

Which was nice…. but over tea this morning I started thinking about the bigger picture. I love Northern Ireland, I love the startup scene though I’m not really involved anymore and I love data enabled stuff. And that got me thinking….


Do Connected Health Ideas Deal With The Big Problems?

There are some excellent companies coming out of the connected health side of things, where the personal collection of data can provide some feedback on performance. Even though I occasionally whine about the reliability of the data coming out of wearables, there are companies like AppAttic that are attempting to change user behaviour in this feedback loop. Bravo! Add to that the Invent2016 finalists such as Elemental, Take Ten and Kraydel, with additional heavy duty wallop from C-TRiC, and you can see that things are happening.

These products could be classed as external to the main health hub of the NHS, and while the notion of data being passed back and forth is enticing, the reality is much harder than that.

In my opinion connected health ideas deal with prevention, monitoring and catching things early, and there’s certainly use in that data. Without wanting to de-humanise the notion of a hospital, are we tapping even 1% of the data available within these institutions?

Probably not.

Data Platforms In Hospital?

Absolutely. While the emphasis on connected health seems to come from outside the health system, we can’t discount what goes on within the walls of it. Every hospital, GP surgery, health centre and clinic is a separate data platform, each holding different kinds of data from different backgrounds and population groups.

On a compliance level it’s very difficult, nigh on impossible, for a startup to launch a product within a health location, hospitals especially. Not without years of testing, accreditation and reported findings. Being a university spin out becomes very appealing at this point; I’d wager that getting inside the NHS with findings is easier with university backing.


The large BigData vendors love the healthcare card, selling vast infrastructure to authorities and working closely with them through consultants and so on. Infrastructure on this scale is expensive, and with the NHS watching the purse strings it is a hard sell.

If Not The Cloud, Then Where?

So, the output of every piece of machinery, everything that can emit data, can more than likely be stored. And I’m not saying “to the cloud”; privacy dictates otherwise (with the exception of DeepMind doing analysis on historical data). I for one, even as a big/open data advocate, would be very concerned if time critical raw data was pushed to the cloud for analysis.

So if not the Cloud, then where? In house….

For first line analysis the data shouldn’t be leaving the building. Now this sounds like an infrastructure nightmare waiting to happen. I think there’s a potentially simple solution.

The plus point of Hadoop infrastructure was that it was built for “commodity hardware”: every node in the cluster could be a fairly low grade machine chugging away in the background. And guess what, the hospital is full of them and they’re hardly being used to capacity.

The machines are already networked. It’s just a theoretical case of having a small number of machines to handle the incoming data (a Kafka queue or two) and somewhere to store the data (HDFS). A Hadoop master can then handle the nodes. A node in this instance is a machine under the desk in each ward or office. Take an entire hospital and you have one large data processing cluster without any capital spend.
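To make the ingest-then-store flow a little more concrete, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: an in-memory `queue.Queue` stands in for the Kafka queue, a local directory partitioned by date and ward stands in for HDFS, and the reading fields (`timestamp`, `ward`, `device`, `temp_c`) and the `drain_to_store` helper are made up. It shows the shape of the idea, not how a real Kafka or HDFS client would be used.

```python
import json
import queue
import tempfile
from pathlib import Path


def drain_to_store(incoming: queue.Queue, store_root: Path) -> int:
    """Move every queued reading into a date/ward partitioned JSON-lines file.

    The queue is a stand-in for a Kafka topic and store_root is a stand-in
    for an HDFS directory; both are assumptions made for this sketch.
    """
    written = 0
    while True:
        try:
            reading = incoming.get_nowait()
        except queue.Empty:
            break  # nothing left to drain in this batch
        day = reading["timestamp"][:10]  # ISO timestamp -> "YYYY-MM-DD"
        partition = store_root / f"date={day}" / f"ward={reading['ward']}"
        partition.mkdir(parents=True, exist_ok=True)
        with open(partition / "readings.jsonl", "a", encoding="utf-8") as out:
            out.write(json.dumps(reading) + "\n")
        written += 1
    return written


# Hypothetical readings emitted by machines on two wards.
incoming = queue.Queue()
for reading in [
    {"timestamp": "2016-05-29T08:21:36", "ward": "4B", "device": "thermo-01", "temp_c": 36.8},
    {"timestamp": "2016-05-29T08:22:10", "ward": "4B", "device": "thermo-02", "temp_c": 37.1},
    {"timestamp": "2016-05-29T08:22:45", "ward": "2A", "device": "thermo-07", "temp_c": 36.6},
]:
    incoming.put(reading)

store_root = Path(tempfile.mkdtemp())
count = drain_to_store(incoming, store_root)
print(count, "readings stored under", store_root)
```

Partitioning by date and ward is a design choice made here so that a later batch job only has to read the day and wards it cares about.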

It’s not a difficult system to put together and it’s mainly geared for batch work. I wouldn’t expect it to do anything in real time, nor would I unleash a Spark job on it (that wants loads of cores and loads of memory, and the hospital isn’t geared up for that kind of work). Using the existing infrastructure does mean trade offs, and that’s okay. If you’re sensible about the use case and think batch processing, then there’s something of use here.


There’s a string of questions to answer, I know, but as a pragmatic utopian vision for cheap, scalable data processing that keeps user data private, this is a starting point. Consultants and analysts working together with their questions could potentially see a holistic view of the hospital for that day. “Why did all the baseline temperatures rise by 2 degrees in that ward alone, is that something we need to look at?”
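The temperature question is the kind of thing a small daily batch job could answer. Below is a rough Python sketch in a map-then-reduce style: group readings by ward and day, average them, then flag any ward whose daily mean rose by the threshold or more. The field names (`ward`, `date`, `temp_c`), the function names and the sample numbers are all hypothetical, and on a real cluster the grouping step would be spread across the nodes rather than run in one process.

```python
from collections import defaultdict
from statistics import mean


def ward_daily_means(readings):
    """Group readings by (ward, date) and reduce each group to its mean."""
    grouped = defaultdict(list)
    for r in readings:
        grouped[(r["ward"], r["date"])].append(r["temp_c"])
    return {key: mean(values) for key, values in grouped.items()}


def wards_with_rise(readings, previous_day, current_day, threshold=2.0):
    """Return {ward: rise} for wards whose daily mean rose by >= threshold."""
    means = ward_daily_means(readings)
    flagged = {}
    for (ward, date), today in means.items():
        if date != current_day:
            continue
        yesterday = means.get((ward, previous_day))
        if yesterday is not None and today - yesterday >= threshold:
            flagged[ward] = round(today - yesterday, 2)
    return flagged


# Made-up sample data: ward 4B jumps overnight, ward 2A barely moves.
readings = [
    {"ward": "4B", "date": "2016-05-28", "temp_c": 36.5},
    {"ward": "4B", "date": "2016-05-28", "temp_c": 36.7},
    {"ward": "4B", "date": "2016-05-29", "temp_c": 38.6},
    {"ward": "4B", "date": "2016-05-29", "temp_c": 38.8},
    {"ward": "2A", "date": "2016-05-28", "temp_c": 36.6},
    {"ward": "2A", "date": "2016-05-28", "temp_c": 36.8},
    {"ward": "2A", "date": "2016-05-29", "temp_c": 36.7},
    {"ward": "2A", "date": "2016-05-29", "temp_c": 36.9},
]

flagged = wards_with_rise(readings, "2016-05-28", "2016-05-29")
print(flagged)
```

Only wards that cross the threshold come back, which is the “that ward alone” part of the question; deciding whether it matters is still a job for the people asking.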

Merely ideas, I know, but you never know, someone on the inside might just run with it. If it helps lives then I’m all for it.