Would you place a bet on your startup like @nilerodgers did?









So what does Nile Rodgers (2nd from left in the picture, but you know I’m a bassist and Stick player so I have to have Bernard in there too) have to do with your startup?  To be honest, little, apart from being a measure of your self belief and the belief in your own venture. That, though, counts for an awful lot.

The assumption that every startup is going to go through the same autopilot cycle of idea > some money (public or otherwise) > accelerator > pivot > rinse and repeat is well documented and overly adopted.  You can Lean Startup it, Business Model Canvas it, Personal MBA it or use a mixture of them. Bootstrappers of the world, you’re not forgotten either.

That Difficult Second Album

Madonna’s first album was, and is, a classic. Make no bones about it, Borderline and Lucky Star are pretty much perfect pop songs. Even so, sales after the first year were around the 750,000 mark and Nile Rodgers was brought on to produce the next one, you might have heard of it….



The self belief that Nile had in Madonna as an artist made him so sure of the way he wanted to do things. He was happy to forgo his advance (in startup land, let’s call it funding). Now the advance covers you while the product is being made and is then essentially paid back against revenues of the product until a certain amount is reached.

He was so sure that Madonna could easily sell five to six million copies of Like A Virgin (this wasn’t risk analysis, I believe it was sheer self belief, “I placed a bet on myself” were his words) that he was willing to take no advance and instead be paid a higher royalty from the sale of copy number 1 onwards.

It paid off, 21 million copies sold and a higher royalty from copy 1.

Bet On Your Own Startup?

I believe anyone who can stand in front of a pitch panel, investor or accelerator and say, “We’re gonna be the next {x} of {y}” or “We’re gonna disrupt {X} industry” is usually a great storyteller, able to convince a set of folk who hold money (their own or otherwise) to hand it over. It has little to do with belief at the time.

Having the belief to get actual customers and make revenue is a different matter. The story telling will only go so far.


The Challenge

So here’s the question: could you look in the mirror and say, “I’m going to bet £10/£50/£100* that my venture is going to make £100,000/£750,000/£1,000,000,000* profit in 1/3/5/10 years*”? Even better, find a way to make that bet concrete (I know there are all sorts of reasons not to bet; if you’re uncomfortable with it I understand, you’ll need to find another way). What could you forgo in the short term to increase the upside when the venture is successful?

Even better, announce your intention. There’s nothing like exposure to public ridicule to galvanise the attention. If you fail then that’s fine, it happens, I’ve been there plenty of times myself.

For me personally I don’t take funding, simple as that, build it then find customers. I choose my mentors based on the industry, not the locality.  And I’m in for the long play, not short term gains. There’s timing in everything, you might think you’re ready but the rest of the world may not.

* – Delete as applicable

#Hadoop #HDFS in Safemode, here’s how to get out of it.

It happens once in a while. When you are performing an operation with HDFS, like adding new data, you may see this message:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create directory /users/txt6. Name node is in safe mode.

Safemode in HDFS can be switched off with one command:

hdfs dfsadmin -safemode leave

Once you’ve run that command you will be able to put data into HDFS again.
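
If you want to check before forcing anything, the state can be queried first. Here is a minimal sketch, assuming the hdfs command is on your PATH and that "hdfs dfsadmin -safemode get" prints a line containing "Safe mode is ON" or "Safe mode is OFF". The function name leave_safemode_if_on is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: only leave safe mode when the NameNode reports it is on.
leave_safemode_if_on() {
    # "hdfs dfsadmin -safemode get" prints the current safe mode state.
    status="$(hdfs dfsadmin -safemode get 2>/dev/null)"
    case "$status" in
        *"Safe mode is ON"*)
            hdfs dfsadmin -safemode leave
            ;;
        *"Safe mode is OFF"*)
            echo "Safe mode is already off"
            ;;
        *)
            echo "Could not determine safe mode state" >&2
            return 1
            ;;
    esac
}
```

Call leave_safemode_if_on once the function is defined. If the NameNode has only just started, "hdfs dfsadmin -safemode wait" is a gentler option, as it blocks until HDFS leaves safe mode by itself.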

|LIVE NOW| #meerkat How using Twitter’s user base may end badly.

Over the years (I can say that now) I’ve had the opinion that solely using Twitter as a user generation tool usually ends badly. I also have the sneaking suspicion that history will repeat itself with AppMeerkat.


Live Video – The Natural Progression

On the surface it makes complete sense, live video. From messages, to short messages, to photos, to snippets of video to live video, yes makes sense to me. Just call it technical evolution. If you also look at the history of these developments a number of companies were thoroughly dependent on a third party platform to boost their user base.

This too made sense as a large volume of users were hanging out in either Facebook, Twitter or both.

Cherry Picking The API

The common growth strategy to get from one to a million users is to go via a third party like Twitter and use their user API to log on to your application. You get the user details and some quick traction. Keep auto posting back to Twitter when a user does something and all’s sweet. Except it’s not.

Remember this Twitter slide from 2012?


It came from their “we’re changing the API” post, which basically said that if you’re a Twitter client then you’re stuffed. In developer communities it sent mini shockwaves through ideas of how startups would get traction. In the early days Twitter wanted rapid user growth too, so opening up the API and getting developers to create client applications without it costing Twitter a penny, well, it was the perfect plan.

The cherry pick happened later, Twitter could look at all the apps using their API and acquire the better ones and attempt to kill off the useless ones. A form of digital natural selection. There were plenty of casualties.

Looking at this image again three years later only reinforces to me one thing, everything in those four quadrants is a moveable feast.

Not Just Apps, Look At Watches

It’s not just apps, look at watches. The cherry pick can happen anywhere. Apple have waited a long time to announce a watch, why? Well, to see what everyone else was doing first. In the meantime you stock the competitors so there’s upsell revenue coming in, and then when you’re ready you cut the competitors out of your ecosystem. Simple. Don’t believe me? Look at the fitness things being removed from Apple Stores in waiting for the watch…..

Predicting AppMeerkat’s Rapid…..

Twitter own Vine, six seconds of video looping. I wager that every Vine in house developer is working on live video right now. The cherry pick is already happening with Twitter restricting automatic tweets for Meerkat. Even Meerkat’s CEO Ben Rubin isn’t convinced his startup will outlast the hype.

“‘People get excited by the novelty of live streaming, but it wears off,’ Meerkat CEO Ben Rubin cautioned me on Skype from Israel.” (from the Gigaom piece 5th May 2015)

At least this time around there’s a CEO under no illusions.

The timing of all this is the perfect storm for startups, pundits and grumpy ones like me. SXSW is going through the motions and the tech press is looking for that startup showpiece that stole the show in Austin. Attendees are live streaming now and on the surface it’s stealing the show.

Cast your minds back to the Foursquare and Gowalla launch battle at SXSW then think, what happened since?


Concentrix and the Case For Machine Learning and Tacit Domain Knowledge (#MachineLearning #Spark #Hadoop)

The initial story ran on 20th February in the Independent newspaper, “People in need at risk of losing tax credits after being wrongly accused of cheating”(1). Now a story like this is going to be an emotive issue at the best of times as it involves low incomes and the potential removal of money.

What it also illustrates are two common problems in process: a lack of tacit domain knowledge and a lack of refined process.


The Case For Tacit Domain Knowledge

First of all let me be clear, I’m not aware of how Concentrix currently do this decision making. So I’ll guess my way around it from what the headlines are saying.

One of the arguments from staff at Concentrix (allegedly I may add) is:

“Staff at Concentrix’s office in Belfast, where the contract is based, have told The Independent that they haven’t been given enough training to differentiate between genuine claims for tax credits and fraudulent ones.”

Tacit domain knowledge is, according to Blandford & Rugg (2002), ‘knowledge which is not accessible to introspection via any elicitation technique.’ This mirrors what the staff are saying in not so many words; it boils down to one single issue, “experience”.

“The reason for not being able to gain easy access to this deep level of knowledge is because it is what we humans call ‘experience’, something which we gain through time and exposure to different environments and situations. It is precisely the experience factor that creates experts in certain fields around specific subjects or subject matter.”(2)

Training doesn’t lead to experience, it only acts as a baseline for what should and should not happen. With something as complex as tax credit claims, tacit domain knowledge is going to be key to the decision making process. So at the start of the project, before staff get to use any system, the domain experts should be able to define the process and the expected decisions any system should be making. With something like the tax credit system there are multiple sources of data on which a decision is made.

Even with tacit domain knowledge, not every case handled is going to pass through a system without error. Some will need to be routed to a domain expert for analysis and a final yes/no on the claimant. The key to issues like this is how the knowledge is put back into the system to enable the team to make better decisions in the future.

A Case For Machine Learning?

Well, I believe there is one here; what I don’t believe, though, is that it’s a black and white solution. Here machine learning should aid the process and make a recommendation on the final decision; ultimately, though, it’s up to a qualified employee to make the final call based on the data presented.

Since the introduction of tax credits in April 2003 there will be a trail of data and decisions, therefore there is historical data to train a system to make decisions. Like I say, this sort of system shouldn’t be making the final decision but merely aiding those who do based on previous cases.

With this sort of scenario it’s going to be based on claimant documents, so scanning and text mining are going to play a key part. Couple this with decision trees and you have the basis for a decision making process.

The most important part of the mix, though, is not to have the system guess when it can’t decide but rather to have the domain expert decide, thus enabling the system to learn from the new experience. This is especially important in borderline cases where the final decision is not clear.

The advantage of using machine learning in the training stage, along with a domain expert, is that at a guess around two thirds of claims would fall within the average (i.e. within 1 standard deviation of the mean, which is about 68% if the claims are normally distributed). It’s the outlier cases that take time to analyse.

Human Factors

Let’s not beat around the bush here. This is call centre staff being put under pressure to perform.

“They also say they are being encouraged to hit a target of making 20 decisions a day, or about three an hour, on whether to stop, amend or leave a tax claim unchanged.”

Automation can help in such cases and a trained system can certainly lead to better decision making once the machine learning training has been performed and evaluated.

Balancing the Cost Of Machine Learning

Machine Learning systems don’t come cheap and they also take time to develop, train and refine before being let out in the real world. Implementing the process also takes time and money. The costs should only strengthen the benefits of the final process. I say all this but there is a “but”…

“The company is not paid on the number of letters issued, but on the basis of savings to public finances arising from correcting tax credits claims that are incorrect.”

It seems to me, though, that designing a system that would decide based on prior evidence and training may not work to the advantage of Concentrix (or any other company doing this work under these contractual measures), as the payment is based on money saved to the public purse. It would make sense for such a system NOT to exist in this case, which is a real shame.


There is a strong case here for a process driven algorithmic system to aid the final decision making process. Removing part of the analytical side from staff means they can spend time with the customer. Final decisions that require more analysis are then sent to a domain expert, and the knowledge gained from that case is fed back into the system for learning. The more cases, the better the learning patterns.

Issues do arise, though, when the bottom line is based on money saved and not the number of customers processed. This can lead to rules being relaxed in favour of the service company (though there is no evidence of that here, may I add).


1 – People in need at risk of losing tax credits after being wrongly accused of cheating – The Independent 20th February 2015 – http://www.independent.co.uk/news/uk/home-news/people-in-need-at-risk-of-losing-tax-credits-after-being-wrongly-accused-of-cheating-10060745.html

2 – “Towards a Methodology to Elicit Tacit Domain Knowledge From Users” – http://www.ijikm.org/Volume2/IJIKMv2p179-193Friedrich328.pdf – Wernher R. Friedrich and John A. van der Poll


Jason Bell is a Data/Hadoop consultant based in Northern Ireland but helps companies globally with various BigData, Hadoop and Spark projects. He also offers training on Hadoop, the Hadoop Ecosystem and Spark to developers and anyone interested in what these technologies can do. He’s also the author of “Machine Learning – Hands On For Developers and Technical Professionals”.

Cutting down on the verbosity of #Spark messages

Spark shell is great but one of the major issues is the amount of logging it dishes out, it can get frustrating when you are trying to debug things.

Easily solved though.

In your SPARK_HOME/conf directory you’ll find a log4j.properties.template. Make a copy of it.

cp log4j.properties.template log4j.properties

Edit log4j.properties with your favourite text editor and change:

log4j.rootCategory=INFO, console

to:

log4j.rootCategory=WARN, console

When you restart the Spark shell you’ll have a fighting chance of seeing the output.
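
The copy-and-edit steps above can also be scripted. This is a rough sketch, assuming the template still contains the line log4j.rootCategory=INFO, console exactly as shown above. The function name quiet_spark_logging is made up, and it takes the conf directory as its only argument.

```shell
#!/bin/sh
# Hypothetical helper: copy the log4j template and drop the root log level to WARN.
quiet_spark_logging() {
    conf_dir="$1"
    # Start from a fresh copy of the template shipped with Spark.
    cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties" || return 1
    # -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed.
    sed -i.bak 's/^log4j\.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' \
        "$conf_dir/log4j.properties"
}
```

With the function defined, running quiet_spark_logging "$SPARK_HOME/conf" does the same job as the manual edit. If the template line differs in your Spark version the sed command simply changes nothing, so check the file afterwards.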


NI Software Skills – Reality Check Time (@PathXL)

Seems to me that things are hitting not-quite-crisis-point. So what I’m about to say is opinion and not a criticism of any of the fine companies involved. My word to NI tech companies is simple: Wanting to be a programmer is a choice, not an expectation because of demand.

“NI students ‘training for wrong careers’ says PathXL head.”

The first headline I read this morning……


The irony is that the photo probably illustrates the reason why so many people don’t want to be programmers. It comes across as a grey and boring profession. And to be fair, those views are at times justified.

I’ve said during my 27 year career as an engineer, programmer, technologist and big data/machine learning nerd that it takes a certain type of person to do this job. And while Mr Speed has a very good point, those mid tier jobs will be automated over the next 5-10 years, his words sound like a cry of a chief exec at the point of outsourcing.

It’s All About the Percentages

In December 2013 I was invited by Momentum to speak at one of their BringItOn sessions, as it was local to me I duly obliged as a civic duty to inform. Two things struck me that day.

1. No Belfast company bothered their arse to attend. If you want programmers so bad then you are going to have to find them, they may not come to you.

2. Out of a room of 400+ students about 10% stuck their hands up at the end about wanting to explore this career further. This didn’t come as a complete shock to me but I was talking to the students about it and asked why some of them didn’t put their hands up, “well, no offence but it looks boring”.

The Skills Shortage

Mr Speed is right though, jobs will go unfilled. Not just at PathXL but at AllState, Citi, Kainos, Liberty IT and all the other companies that have made announcements. All great companies, great stories and great results. Lovely people to boot.

Focusing on education to sort it all out, well, the UK has been here before. In the late 90s the universities were pumping out computer science graduates like lemmings. All fine until the dot com bubble finally burst and the supply crashed to the floor.

What you cannot do is streamline a production line of government forced education to create programmers to satisfy the needs of companies. It’s not the done thing.

Education should be the rich fabric of disciplines, from science to art and everything in-between. Just because students want to study law or become a teacher does not mean they are wasting their time. Their prize could be the flight departing from Belfast International to a new life in another country. Who said they’d stay in the first place?

The Conundrum

The second one in six months, I’m doing well.

1. You can’t tell people what to do. Telling an individual they’re doing the wrong course or degree just to satisfy your company’s skills demands, well, it’s damaging in the long term. Workers will get bored if their hearts are not in the profession and then leave. That leaves the company with the same problem further down the line.

2. Experienced programmers are very difficult to find. Always have been and always will be.

3. Even on the mainland the question is popped, “Are you willing to relocate?”. So ask that here, would a good programmer be willing to relocate to Belfast? Let me put it this way, I’m classed as the rank outsider being in Limavady and knowing what I know.


(I’m the one on the far left of the graph).


Your money, your great working environment, your blow football table, your team get togethers…. they are not selling points but additional extras. Real programmers are only bothered about the challenge of the task in hand. Everything else is a little extra.

In terms of the skills gap I don’t think you can quickly educate your way out of it. Certainly not in the short term, perhaps in the 5-10 year bracket. So coding in schools is a good start but you’ll hit the same issue again and again….

….ultimately, programming and coding is not for everyone.

(And Mr Speed, I’m happy to talk this over any time. Here’s my phone number, 07900 316333).




#Oscars – How did I do?

Yesterday evening I posted a bunch of predictions without resorting to data mining, Twitter analysis or reading anything by Nate Silver. Just good old guessing. In terms of a result, guessing didn’t do too badly. Result 14/24 (58.3%)

[Update: Turns out that FiveThirtyEight’s predictions in the “top six” were the same, we both got 5/6 (83%) and missed on best director. I would have liked to have seen how Nate and Co. managed on the other categories, which are much harder to predict.]


Winner: Birdman Prediction: Birdman


Winner: Eddie Redmayne Prediction: Eddie Redmayne


Winner: Julianne Moore Prediction: Julianne Moore


Winner: J.K. Simmons Prediction: J.K. Simmons


Winner: Patricia Arquette Prediction: Patricia Arquette


Winner: Big Hero 6 Prediction: How To Train Your Dragon 2


Winner: Birdman Prediction: Birdman


Winner: The Grand Budapest Hotel Prediction: The Grand Budapest Hotel


Winner: Alejandro Gonzalez Inarritu (Birdman) Prediction: Richard Linklater


Winner: CitizenFour Prediction: CitizenFour


Winner: Crisis Line: Veterans Press 1 Prediction: Crisis Line: Veterans Press 1


Winner: Whiplash Prediction: The Imitation Game


Winner: Ida Prediction: Ida


Winner: The Grand Budapest Hotel Prediction: The Grand Budapest Hotel


Winner: Alexandre Desplat Prediction: Hans Zimmer


Winner: Glory from Selma Prediction: “I’m Not Gonna Miss You”


Winner: The Grand Budapest Hotel Prediction: Into The Woods


Winner: Feast Prediction: Feast


Winner: The Phone Call Prediction: Boogaloo and Graham


Winner: American Sniper Prediction: The Hobbit: Battle of the Five Armies


Winner: Whiplash Prediction: American Sniper


Winner: Interstellar Prediction: Guardians Of The Galaxy


Winner: The Imitation Game Prediction: The Imitation Game


Winner: Birdman Prediction: Birdman