Interesting Posts w/e 12th April

A few posts that I faved on Twitter as I thought they were interesting reads.

What Etsy’s S1 Filing Taught Me About Market Places (link)

Everything We Wish We’d Known About Building Data Products (link) via @ckevinliu

The Best Loyalty Scheme Was Replaced With the Worst (link) via @thesidsmith

Why Startups Fail and How To Build Interest Before Launch (link) via @mattermark

Apache Spark’s Success: Overhyped or Preordained? (link) via @andrewbrust



Kids Coding, the Long Term View.

I’m all for teaching problem solving, logic, programming concepts and so on in schools. I’ll start with that. Getting everyone hyped up about how great a career in IT is, well I have my opinions on that but I really should keep those to myself.

There is potential for a real problem in the long term and the pictures painted aren’t all as rosy as people make out.

If It’s Repetitive It Will Be Automated

Get used to this, it’s happening at scale already. Over the long term companies will only ever be precious about the bottom line and shareholder value, that’s what companies are designed for. To improve the bottom line you want to automate whatever you can.

In the ever present search for the unicorn* company, automation is key to reducing overheads.

Everyone Must Code!

Now I’ve said this before, I don’t agree with the above statement. Not everyone must code. Teaching them the process, logic and other bits is fine with me. Being a programmer though is a choice.

What seems to be forgotten is that in some respects coding is a repetitive task, therefore can be (and is) automated. And with machine learning and deep learning what we are presented with is a case for self healing algorithms and code. I mean anyone with the time, determination and patience can put together a website in either PHP, Ruby or Python.

Clojure, that might take a little more time.

Back to the children though, what we’re saying is that “tech is really cool and you need to know all this to get a career later in life”. The reality could be far different. For example a child looking at Scratch now in P7 is a good 6, 7 or 8 years away from being available to the job market. So while the number of available skills goes up over that period and a wave of programming talent arrives on the market, no one is talking about the potential downside.




The solid line shows the skills, the dotted line shows the potential requirement over time (in years). Now this is a simplistic view, but the rapid development of deep learning means that the requirement landscape could change dramatically. Legacy costs money to maintain, let alone the new and the funky.

Moore’s law tells us about the halving of cost and the doubling of power. What we’re not looking at is the long tail of a career that has already had one severe dip, after the dot com bust, when a lot of great talent was jobless for a long time.

All Hail The New Startups!

Telling kids they could be the next Uber/Facebook/Twitter etc is all well and good, it’s not impossible and the creative process is great to do. More and more though we’re reading that doing a startup is an aside to the day job, fake it ’til you make it.

Let’s keep the 98% failure rate thing under the carpet though. And I know founders who will tell you the startup life is great and then spend a sleepless night wondering where the runway of cash is coming from so they can do payroll in two months’ time. Not a life for every child in school, let’s be honest.

Everyone Can Be a Potential Unicorn

Very true, do a Unicorn Mask.


#DataScience for the confused. NI Software Jobs – Part 2

[The post bag is full of single unit count postcards, okay one, with comments from my last post].

Anon from Northern Ireland writes:


As it’s a bank holiday it’s double time mate but I’ll look all the same.

No Pondering Required

No pondering required as we’ve done the majority of the pondering we need to do and we have some scripts that will help us get the numbers. So a small ponder is required….


A Wee Bit of Prep

IT Sales and Business Development Manager are essentially sales roles so I’ll base my searches there. The category number is 24, compared to 3 which is for IT.

A quick switch of the keywords and we can release the script to an unsuspecting world.

for i in it+sales business+development+manager; do echo -n $i >> jobsoutput.txt ; curl$i\&Location=\&Category=24\&Recruiter=Company | grep "Total Jobs Found:" >> jobsoutput.txt ; done

And this time no messing around with awk, there’s only two numbers I’m after.

it+sales <label style="margin-right:30px;">Total Jobs Found: 31</label>
business+development+manager <label style="margin-right:30px;">Total Jobs Found: 19</label>

There we are, 31 and 19. Fifty jobs in grey, so to speak.

Back to the Hypothesis

Let’s have a look at the “anonymous” comment again.


So the question is, do those numbers match? For 50 sales jobs there should be 300 developers. A quick look at a basic search of all the IT positions shows 118 jobs. So that’s a factor of 2.36 (118 ÷ 50), a bit off the mark.
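As a rough sketch, the same sum can be done at the command line. The function name `job_ratio` is made up for illustration; the two figures are the ones quoted above (118 IT jobs, and 31 + 19 = 50 sales jobs).

```shell
#!/bin/sh
# Ratio of advertised IT jobs to advertised sales/bizdev jobs.
# job_ratio is a hypothetical helper, not part of the original script.
job_ratio() {
    awk -v it="$1" -v sales="$2" 'BEGIN { printf "%.2f\n", it / sales }'
}

# 118 IT roles against 31 + 19 = 50 sales roles.
job_ratio 118 50
```

Against the expected factor of six, that leaves the market well short on the developer side, at least on this one site.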

That does raise the question: for every sales position, does an organisation already have the required number of developers and just lack sales people? The conclusion here is that there are 2.36 available IT jobs for every sales/bizdev position.

For Further Discussion?

Is Northern Ireland a “normally functioning market”? Does the number of employed sales people match up with the required number of IT related people (a factor of six)?

Can I have my tea now????




#DataScience for the confused. NI software jobs. #unix #excel

Hypothesis: Java is a dead language in Northern Ireland. 

Let’s consider that for a second.


I’ve heard it for far too long. The “why did you do it in that!” or “no one ever does a startup in Java, it’s a dead language”. So let’s, while teaching some basic data science, put this sordid tale to bed once and for all.

Get Thy Data!

I need job numbers and as most NI job postings don’t end up on JobServe there’s no point me looking there. On NIJobs, on the other hand, they do.





Now I know the search brings up an HTML page of results. So by feeding NIJobs a programming language I’ll get some jobs in return. I’m not interested in who’s looking, I am interested in a number, the total number of jobs.

If I want to look for Java jobs directly with employers in the IT category then I’d use the following URL:

You can see the key/value pairs the NIJobs site is looking for. The one I’m interested in is Keywords.


Thou Shalt Not Copy/Paste HTML

I need to automate this in some way, I don’t want to repeatedly type in the URL and copy/paste the data I’m looking for.

Now I’ve established the URL I can manipulate it a little, as it’s not just Java I’m looking for, I want to compare a range of skills.

  • Java
  • PHP
  • Perl
  • Python
  • Ruby
  • iOS
  • C++
  • Scala

Unix will help me here. Using curl and a for loop I can craft something quick and dirty to pull the information I need.

for i in java php perl python ruby ios c++ scala; do echo -n $i >> jobsoutput.txt ; curl$i\&Location=\&Category=3\&Recruiter=Company | grep "Total Jobs Found:" >> jobsoutput.txt ; done

In one line I’m doing the following:

  • Creating a for loop, each time it loops the value of $i will be that of one of the programming languages I listed.
  • I echo the name of the language but not a newline.
  • Then I use curl to retrieve the HTML page and pipe it through grep. The only thing I want is the line “Total Jobs Found: “, which is appended to the file. Notice how I’m changing the Keywords value to $i, so each time the loop runs the language name will be inserted.
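The steps above can be sketched in a longer, more readable form. This is only a sketch under assumptions: `BASE_URL` is a made-up placeholder (the real search URL isn’t reproduced here), the function names are hypothetical, and I’m assuming the site treats a bare `+` in the query string as a space, which is why `c++` gets percent-encoded.

```shell
#!/bin/sh
# Hypothetical placeholder: substitute the real search URL before use.
BASE_URL="https://jobs.example.invalid/search?Keywords="

# Percent-encode "+" so that "c++" is not read as "c" plus two spaces.
# (Assumption: the site decodes "+" in a query string as a space.)
encode_keyword() {
    printf '%s' "$1" | sed 's/+/%2B/g'
}

# Build the query URL for one keyword in the IT category (Category=3).
build_url() {
    printf '%s%s&Location=&Category=3&Recruiter=Company' \
        "$BASE_URL" "$(encode_keyword "$1")"
}

# Append "language<matching line>" to jobsoutput.txt for each keyword.
fetch_counts() {
    for lang in "$@"; do
        # Write the language name without a trailing newline.
        printf '%s' "$lang" >> jobsoutput.txt
        # If grep finds no total line, still end the line so that
        # entries do not run together in the file.
        curl -s "$(build_url "$lang")" | grep "Total Jobs Found:" >> jobsoutput.txt \
            || printf '\n' >> jobsoutput.txt
    done
}

# Usage (needs network access and the real URL):
# fetch_counts java php perl python ruby ios c++ scala
```

Quoting the URL means the ampersands no longer need escaping, and multi-word keywords that rely on `+` as a space would need `encode_keyword` skipped.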

Cleaning the File Up Further

Currently what I have is:

java <label style="margin-right:30px;">Total Jobs Found: 34</label>
php <label style="margin-right:30px;">Total Jobs Found: 7</label>
perl <label style="margin-right:30px;">Total Jobs Found: 10</label>
python <label style="margin-right:30px;">Total Jobs Found: 14</label>
ruby <label style="margin-right:30px;">Total Jobs Found: 5</label>
ios <label style="margin-right:30px;">Total Jobs Found: 4</label>
c++ <label style="margin-right:30px;">Total Jobs Found: 9</label>

There are tabs or spaces in there and then <label> tags which need to come out. What I want is a comma delimited file with the language and the number of jobs, nothing else.

I’m not an awk expert by any stretch of the imagination but I know enough to get me by and that counts for a lot. We’re not looking for scripting perfection but just a way to get the results I want.

awk '{sub(/[ \t]+<label style="margin-right:30px;">Total Jobs Found: /,",")};1' jobsoutput.txt > jobstemp.txt
awk '{sub(/<\/label>/,"")};1' jobstemp.txt > jobstemp2.txt
awk '{sub(/[ \t\r]+$/,"")};1' jobstemp2.txt > nijobsoutput.txt

The three awk commands will remove the tags and clean up the white space. Being a complete noob on awk (I’m still an old Perl hack at heart), I’m outputting to new files each time, then I’ll clean up afterwards.
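For what it’s worth, the same cleanup could probably be done in one pass with no temporary files. A minimal sketch, assuming every line looks like the grep output shown earlier; `clean_counts` is a name I’ve made up for illustration.

```shell
#!/bin/sh
# Turn 'java <label ...>Total Jobs Found: 34</label>' into 'java,34'.
# Reads the grep'd lines on stdin and writes CSV lines to stdout.
# Lines that do not match the pattern are passed through unchanged.
clean_counts() {
    sed -E 's/^([^[:space:]<]+)[[:space:]]*<label[^>]*>Total Jobs Found: ([0-9]+)<\/label>[[:space:]]*$/\1,\2/'
}

# Usage:
# clean_counts < jobsoutput.txt > nijobsoutput.txt
```

The first capture group is the language name and the second is the job count, so anything else on the line is thrown away.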

So, now I’ve figured out what I’m doing UNIX command wise I can wrap all this up in a shell script and run it every time I want to update the data.


# Iterate through the languages and pull the info from NIJobs.
# The echo -n means the loop won't output a newline.
for i in java php perl python ruby ios c++ scala; do echo -n $i >> jobsoutput.txt ; curl$i\&Location=\&Category=3\&Recruiter=Company | grep "Total Jobs Found:" >> jobsoutput.txt ; done

# Clean up the grep'd html to remove the tags and replace with a comma.
awk '{sub(/[ \t]+<label style="margin-right:30px;">Total Jobs Found: /,",")};1' jobsoutput.txt > jobstemp.txt
awk '{sub(/<\/label>/,"")};1' jobstemp.txt > jobstemp2.txt
awk '{sub(/[ \t\r]+$/,"")};1' jobstemp2.txt > nijobsoutput.txt

# Remove the files we don't need anymore.
rm jobsoutput.txt jobstemp.txt jobstemp2.txt

I’m saving this file and calling it ‘’, then a quick modification to the file’s permissions will mean I can run it from the command line.

chmod 755

Now I can test it.

Running the Script

A quick run of the script…


…..and it whirls into action. Once it’s finished I can look at the output and see what I’ve got.


It’s worked nicely. Scala doesn’t have any job listings so there’s no output. I do want to leave it in the script just in case any jobs do pop up in the future.

Going All Florence Nightingale

Right, something that will put a pie chart together quickly. I’m not proud, Excel will do me fine. Yes I could use R, D3, Google Charts or Tableau but all is well with the world and Excel will do what I want it to do. Path of least resistance.


Excel will import the text file easily. Select the whole data set, click on charts, then pie chart, and I’ve got an instant visualisation. No messing about.


Back To That Hypothesis…..

We started with a statement of “fact” fused into the psyche of developers in the land.

Java is a dead language in Northern Ireland. 

Well I think we’ve shown that the data from NIJobs would suggest otherwise. 41% of the jobs found across these languages (34 of 83) are looking for Java developers. The interesting part is the bottom ones, Ruby and iOS with 6% and 5% respectively.
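If you’d rather not trust the pie chart, the percentages can be checked with a few lines of awk. This is a sketch that assumes the cleaned file holds one “language,count” pair per line; `language_shares` is a hypothetical name, and the counts fed in are the ones from the grep output earlier.

```shell
#!/bin/sh
# Print each language's share of the total as a whole percentage.
# Expects "language,count" lines on stdin.
language_shares() {
    awk -F, '
        # Remember every row and keep a running total of the counts.
        { name[NR] = $1; count[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s,%.0f%%\n", name[i], 100 * count[i] / total
        }'
}

# The counts from the grep output earlier in the post.
printf '%s\n' java,34 php,7 perl,10 python,14 ruby,5 ios,4 c++,9 | language_shares
```

Bear in mind that a single advert mentioning two languages would be counted under both, so these are shares of search hits rather than of distinct jobs.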

To be fair this is one jobs website, you can’t measure word of mouth and I’ve not looked at the likes of Twitter or any other social media outlet. As a hypothesis it needs more refinement and investigation.

This though is only the start of the conversation, we know there’s a healthy freelance market for Ruby and iOS but it seems that the more established enterprise companies are on the lookout for more “traditional” languages.

One thing I do know, we just covered some simple data science. So even if you don’t agree you have the tools to do some yourself.



Would you place a bet on your startup like @nilerodgers did?









So what does Nile Rodgers (2nd from left in the picture, but you know I’m a bassist and Stick player so I have to have Bernard in there too) have to do with your startup? To be honest, little apart from being a measure of your self belief and the belief in your own venture. That though counts for an awful lot.

The assumption that every startup is going to go through the same auto pilot cycle of idea > some money (public or otherwise) > accelerator > pivot > rinse and repeat is well documented and overly adopted. You can Lean Startup it, Business Model Canvas it or Personal MBA it, or a mixture of them. Bootstrappers of the world, you’re not forgotten either.

That Difficult Second Album

Madonna’s first album was, and is, a classic. Make no bones about it, Borderline and Lucky Star are pretty much perfect pop songs. Even so, sales after the first year were around the 750,000 mark and Nile Rodgers was brought on to produce the next one, you might have heard of it….



The self belief that Nile had in Madonna as an artist made him so sure of the way he wanted to do things that he was happy to forgo his advance (in startup land, let’s call it funding). Now the advance covers you while the product is being made and is then essentially paid back against revenues of the product until a certain amount is reached.

He was so sure that Madonna could sell five to six million copies of Like A Virgin easily (this wasn’t risk analysis, I believe it was sheer self belief, “I placed a bet on myself” were his words) that he was willing to not take an advance but be paid a higher royalty from the sale of copy number 1 onwards.

It paid off, 21 million copies sold and a higher royalty from copy 1.

Bet On Your Own Startup?

I believe the people who can stand in front of a pitch panel, investor or accelerator and say, “We’re gonna be the next {x} of {y}” or “We’re gonna disrupt {X} industry” are usually great story tellers, able to convince a set of folk who hold money (their own or otherwise) to hand it over to the story teller. It’s little to do with belief at the time.

Having the belief to get actual customers and make revenue is a different matter. The story telling will only go so far.


The Challenge

So here’s the question, could you look in the mirror and say, “I’m going to bet £10/£50/£100* that my venture is going to make £100,000/£750,000/£1,000,000,000* profit in 1/3/5/10 years*”? Even better find a way to make that bet concrete (I know there are all sorts of reasons not to bet, if you’re uncomfortable with it I understand, you’ll need to find another way). What could you forgo in the short term to increase the upside when the venture is successful?

Even better, announce your intention. There’s nothing like exposure to public ridicule to galvanise the attention. If you fail then that’s fine, it happens, been there plenty of times myself.

For me personally I don’t take funding, simple as that, build it then find customers. I choose my mentors based on the industry, not the locality.  And I’m in for the long play, not short term gains. There’s timing in everything, you might think you’re ready but the rest of the world may not.

* – Delete as applicable

#Hadoop #HDFS in Safemode, here’s how to get out of it.

It happens once in a while. When you are performing an operation with HDFS, like adding new data, you may see this message:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: Cannot create directory /users/txt6. Name node is in safe mode.

Safemode in HDFS can be switched off with one command:

hdfs dfsadmin -safemode leave

Once you’ve run that command you will be able to put data into HDFS again.
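It can be worth checking the current state before forcing anything. A minimal sketch, assuming `hdfs dfsadmin -safemode get` reports a line containing “Safe mode is ON” or “Safe mode is OFF”; the helper name `safemode_is_on` is made up for illustration.

```shell
#!/bin/sh
# Succeeds when the given status text reports that safe mode is on.
# safemode_is_on is a hypothetical helper, not an hdfs command.
safemode_is_on() {
    printf '%s\n' "$1" | grep -q "Safe mode is ON"
}

# Only attempt this where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    status=$(hdfs dfsadmin -safemode get)
    echo "$status"
    # Leave safe mode only if the name node says it is still in it.
    if safemode_is_on "$status"; then
        hdfs dfsadmin -safemode leave
    fi
fi
```

The name node normally leaves safe mode by itself once enough blocks have been reported, so if it keeps coming back it may be worth looking at the name node logs rather than just switching it off.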

|LIVE NOW| #meerkat How using Twitter’s user base may end badly.

Over the years (I can say that now) I’ve had the opinion that using Twitter as your sole user generation tool usually ends badly. I also have the sneaking suspicion that history will repeat itself with AppMeerkat.


Live Video – The Natural Progression

On the surface it makes complete sense, live video. From messages, to short messages, to photos, to snippets of video to live video, yes makes sense to me. Just call it technical evolution. If you also look at the history of these developments a number of companies were thoroughly dependent on a third party platform to boost their user base.

This too made sense as a large volume of users were hanging out in either Facebook, Twitter or both.

Cherry Picking The API

The common growth strategy to get from one to a million users is to go via a third party like Twitter and use their user API to log on to your application. You get the user details and some quick traction. Keep auto posting back to Twitter when a user does something and all’s sweet. It’s not.

Remember this Twitter slide from 2012?


It’s from their “we’re changing the API” post, basically saying if you’re a Twitter client then you’re stuffed. In developer communities it sent mini shockwaves about how startups would get traction. In the early days Twitter wanted rapid user growth too, so opening up the API and getting developers to create client applications without costing Twitter a penny, well it was the perfect plan.

The cherry pick happened later, Twitter could look at all the apps using their API and acquire the better ones and attempt to kill off the useless ones. A form of digital natural selection. There were plenty of casualties.

Looking at this image again three years later only reinforces to me one thing, everything in those four quadrants is a moveable feast.

Not Just Apps, Look At Watches

It’s not just apps, look at watches. The cherry pick can happen anywhere. Apple have waited a long time to announce a watch, why? Well, to see what everyone else was doing first. In the meantime stock the competitors so there’s upsell revenue coming in and then when you’re ready you cut the competitors out of your ecosystem. Simple. Don’t believe me? Look at the fitness things being removed from Apple Stores in waiting for the watch…..

Predicting AppMeerkat’s Rapid…..

Twitter own Vine, six seconds of video looping. I wager that every Vine in house developer is working on live video right now. The cherry pick is already happening with Twitter restricting automatic tweets for Meerkat. Even Meerkat’s CEO Ben Rubin isn’t convinced his startup will outlast the hype.

“‘People get excited by the novelty of live streaming, but it wears off,’ Meerkat CEO Ben Rubin cautioned me on Skype from Israel.” (from the Gigaom piece 5th May 2015)

At least this time around there’s a CEO under no illusions.

The timing of all this is the perfect storm for startups, pundits and grumpy ones like me. SXSW is going through the motions and the tech press is looking for that startup showpiece that stole the show in Austin. Attendees are live streaming now and on the surface it’s stealing the show.

Cast your minds back to the Foursquare and Gowalla launch battle at SXSW then think, what has happened since?