Talking #Kafka and #DeepLearning at #BigDataBelfast

A very quick heads up: I’ll be talking at Big Data Belfast on Thursday 18th October. The talk is a walkthrough of a complete automated learning system built with Kafka and DeepLearning4J.

You don’t need any programming knowledge; I’ll be explaining everything in plain English. The idea is to show what a full system looks like, from data acquisition through to predictions.

Not everything in the AI and Machine Learning space is a basic TensorFlow program written in Python with a very limited set of data….. 😉



Belfast we need to talk about Norwegian Airlines – #travel #aviation #airports #northernireland

The Northern Ireland press lost its plop again….

The last 24 hours have been along the lines of “OMG! Belfast to New York flight ends at the end of October!”…… but hey.

And the bet I made with myself at Routes Conference came to pass. Simply because when you have a lot of route experts in the room, well you go talk to them in the breaks, that’s when the conversations happen. I really should have put money on it.

It was going to happen…..

…..deep down we all knew it was going to happen because….. well, simple supply and demand. There was little demand, so there was no point Norwegian stepping in to save a Belfast to New York route. The writing was on the wall when United ceased operations. When a £9m bailout is hastily approved (although illegal), you have to ask why it was needed when “Go to Iceland, see the sights for a day and then go to New York” was a better option, especially if you like rancid shark.

With a capacity of under 20,000 passengers a year (two flights a week and 189 seats on a 737 MAX) and a wafer-thin profit margin, I’m surprised it flew at all. Load factors need to be in the mid-90s to make a profit, and with Norwegian’s load factor in the region of 88% there’s a good probability it was loss-making before it started.
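A quick sanity check on the capacity figure (the class and method names are mine). It assumes a 52-week year and counts seats in one direction only, which is how I read the “under 20,000” figure; 95% stands in for “mid-90s”.

```java
import java.util.Locale;

/**
 * Back-of-envelope route capacity using the figures above.
 * Assumptions: 52 weeks a year, seats counted in one direction only.
 */
public class RouteCapacity {

    // Seats offered per year in one direction.
    static int annualSeats(int flightsPerWeek, int seatsPerFlight, int weeksPerYear) {
        return flightsPerWeek * seatsPerFlight * weeksPerYear;
    }

    // Expected passengers for a load factor between 0.0 and 1.0.
    static double passengers(int seats, double loadFactor) {
        return seats * loadFactor;
    }

    public static void main(String[] args) {
        int seats = annualSeats(2, 189, 52);
        System.out.println("Seats per year: " + seats); // prints 19656
        System.out.printf(Locale.ROOT, "At 88%%: %.0f passengers%n", passengers(seats, 0.88));
        System.out.printf(Locale.ROOT, "At 95%%: %.0f passengers%n", passengers(seats, 0.95));
    }
}
```

The gap between those two lines, roughly 1,400 passengers a year, is the difference between the load factor Norwegian achieves and the one the route needed.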

Sadly I don’t have the actual numbers in my hand….. and any argument about airline passenger duty, well I’ve covered that before. A partially pointless exercise.

As far as the actual aviation press goes (not BBC Newsline or the BelTel), this was a complete non-story; as far as I’ve seen, no one has covered it yet.

Dear Belfast, Dublin Airport is your hub

And while this is going to be an unpopular opinion, your hub to the rest of the world is basically Dublin. When it comes to US flights it’s Dublin hands down, as US immigration pre-clearance just makes life easier. And that’s what customers want. Oh, and shopping, we all need the shopping, it’s emotionally programmed into us.

Belfast is not a well-connected airport in the grand scheme of things, and the same holds if you reverse the route: coming in from New York, Dublin is the better hub.

Cheap fares are not always cheap either. Once you add luggage, seat options and food how much are you actually saving compared to a “traditional” airline?

A number of airlines have wanted to do Atlantic hops for years. When Ryanair were eyeing up Aer Lingus in times gone by, the aviation press thought it was for the transatlantic routes; it was never on their radar. This was happening around the same time that all-business-class airlines were doing Atlantic routes, and they didn’t last long: lack of demand. Sooooo……

The sad fact for Belfast types is this, you can drive to Dublin Airport in 100 minutes or so, or you could get the Aircoach or that other thing that Translink do to get you there.

Feed into Dublin

So we’ve established there’s a feed into Dublin Airport from Belfast (your car, the train or a private jet if you’ve got the money). Derry is another issue, and this is where a route from City of Derry Airport into Dublin Airport is actually needed.

Now, the tattered tale of Citywing’s flight into Dublin, whether it was planned or not, is something I’ve spoken about before. It is, though, something that does need to happen.

Now there’s nothing to stop that flight nipping down via Belfast, doing a pick-up and then heading on to Dublin for pre-clearance into New York. The airports have to connect in a sensible way. Belfast International offering flights to the US is fine if they are self-sustaining, which, as it stands, they are not.

To Conclude….

Loss making airlines don’t hang about once the subsidy has been spent. And that goes for any airport, not just Belfast.

I won’t leave it all bad, here’s a list of airports that do want to land at BFS: Aberdeen International, Billund, Brussels, Cambridge, Cologne, Faro (again), London Oxford, Munich, Murcia and Shannon (LAND IN SHANNON! Hint!).


Startups: Dare yourself to be scrappy again. #startups #hustle #product #b2b #b2c

I’ve noticed that people get tetchy and nervous when I speak my mind.


And I, for one, well, I make no apologies. It took binge-watching the entire run of “Halt and Catch Fire” (you can watch it now on Amazon Prime and Netflix US) to remind me how much I love being in a scrappy startup. A friend introduced it to me: “I cannot believe you don’t know about this!” He suggested I watch it immediately, and he was 100% right.

Scrappy startups are wonderful, exciting and should have you on the edge of your seat. How quick, how “get it out there” can it be done? Keep it in stealth, no one need know, not just yet anyway. Too many times startups are just announcements. I’ve been there; I’m guilty of it too.

Startups that can’t be built until someone passes the money across are essentially dead on arrival in my opinion. “We’re waiting on POC funding before we build” basically says, we have no one who can, or is willing to, code. Be scrappy, buy a book and get on with it.

I love the scrappy, hurried and out there as quick as you can ideas. I miss the “WTF let’s try this!”. I miss the white board sessions in Santa Clara back in 1999, I miss the 11pm curry that would get us through the milestone…. though I’m not sure what my doctor would say now.

There are 24 hours in a day, use them (and reserve eight of them for sleep). Too many cold-pizza-and-warm-beer networking sessions keep you away from building; there’s no point attending until you’ve built.

HaCF left me in tears at the end for various reasons. The one thing it impressed on me most: I love being in a scrappy startup that just doesn’t care what the outcome is; we just tried. And at the centre of it all are relationships. It’s not about the tech, it’s about how the tech brings people together, whether that’s users, investors or the team itself. “We’re building the thing to get to the thing….”

The current state of the startup scene is all too safe and all too samey. It’s about incubators and accelerators, the questionable storytellers and the ideas folk and their blue-sky funding. It’s about accounting firms suddenly with startup areas. But it’s not about what they have; your startup is about what you have and what’s in your heart. My thoughts here are hardly new; I wrote about it four years ago in “Startups: The Passion and the Paradox”.

And it all starts off with an idea and being scrappy.

Here’s to being scrappy, it’s nice to be back.

I can tell more about your company by how you offer #bacon – #startups #business #meetings

Within a minute I can tell how a meeting and a long-term business relationship are going to go, especially where bacon is concerned.

Can I get you a drink?

Tea, coffee and water are a given. Everyone needs at least one of these things to function. So as a device for meeting and business success prediction it’s fairly weak.

Bacon however changes all that, there’s currency involved and money has been spent.

The Meeting

Proposals were put forward by management, followed by multiple phone calls on whether the company could perform such an operation. All minds were put at rest and a date for a pitch was set. As an employee I get the call: can I fly over to England to help out on the pitch from a technical viewpoint? Flights booked, info gathered, meet at the hotel for lunch, then an afternoon planning for the pitch the following morning.

It’s not a small client; this is a household name. And on the morning, driving up to the offices, you get a sense of scale. Five of us arrive in reception, are told to wait, and then assistance arrives. Walking through oak-panelled corridors you get a sense of the money sloshing about in the industry they’re the leader in.

The first thing another tech colleague and I spot is a large platter of bacon baps (the word sandwich gives the sense of white bread slices; it is not this at all, they are baps). “We will do well here” was the general feeling on seeing at least fifty fresh bacon breakfast treats, which was just as well, as breakfast had been skipped for a final pitch run-through.

The team were assigned to the far side of the table, maximum distance from the bacon platter. “It’ll be okay, they’ll be offered around”, as more staff filed in and sat down nearer the bacon. As people sat down they all passed the platter and picked up a bap and tucked in, the unwritten rules were in play. And while the tea and coffee was poured out the meeting started and at that point I knew the meeting was going to lead to problems.

Three Hours Later

The pitch started and finished, and a long, drawn-out question and answer session continued. We’d had our one cup of tea and the bacon mountain had hardly moved. If a client can’t offer you a bacon bap and extend an arm of confidence, trust or bacon, then I will question the long-term plan of the client.

As the meeting concluded there were a lot of handshakes (15 client representatives; you can work out the combinations for a team of five), nods of heads, small talk and a large platter of untouched, cold bacon baps destined for the bin.

My colleague and I gave the platter a final look, and as we walked out of the reception area on to the street I said to them, “This is not going to go how we want it to go.”

I was 100% on the money. Ropey specifications, holding on to information, internal politics like I’d never witnessed before, manipulation of third parties: it very nearly killed the supplier I worked for and other suppliers too (some of them household names as well).

It doesn’t have to be bacon

For me those first meetings tell me everything, and I know it’s been documented a thousand times over. I’ve never seen a bacon bap platter since, so my focus will go on something else. I think Cloudera were right though: “Data is the new bacon”. Bacon taught me an awful lot about decision processes, meeting psychology and staff placement in a meeting room. It’s like wedding planning but with more bacon…..

Influencers and Hotels: A system to measure the effectiveness…. #influencers @funforlouis #youtube #marketing #instagram

Remember the old data science T-shirt?

“In God we trust. All others must bring data.”

It’s all about measurement

Measurement is always important, especially for brands; without it you can’t calculate return on investment. A lot of marketing campaigns are like that: just throw it out there and hope that someone clicks on the link or at least looks at the landing page. If you’re using an email system like MailChimp then you can see a lot of metrics on opens, bounces and so on. From what I’ve witnessed and thought about this year, influencers don’t seem to be able to measure their impact, and this got me thinking.

There’s an increasing distrust between hotels and influencers. To be honest I find the whole approach baffling to a degree which goes along the lines of “If you give me a free room for seven nights I’ll post this stuff on my social media accounts as exposure, I’ve got 10,000 followers”, for some it works so in some respects I’ve got a lot to learn about sticking my neck out and asking for stuff……

Basically…… Jack Bedwani from The Projects put it perfectly in a recent piece in The Atlantic: “They get five to 20 direct inquiries a day from self-titled influencers,” he said. “The net is so wide, and the term ‘influencer’ is so loose.”

Different Platforms Bring Different Challenges

The path to measurement is not as straightforward as one might hope. Thinking of the main platforms in use, Instagram, YouTube, Snapchat and Facebook each have their own way of doing things. What they mainly measure is views: okay as a starting point but ultimately a pointless measure. “So my picture of your hotel was seen by 2,576 people.” So what?

With Instagram you can’t embed links in the posts. You can add a link but the reader will have to copy/paste or type it in themselves. So any form of tracking at that point is out of the window. I know, I tried. You can though add the username of the sponsoring account but how on earth do you measure that, well you don’t.

A number of companies have tried embedding links in YouTube videos; one from Belfast had a good shot at it and gained some traction, but sadly it seems to be no more. With YouTube there is at least a decent description block to put all your links in….. though watch YouTube on a PS4 console, for example, and all the decent info is lost or just plain hard to get to.

Measurement is a challenge.

A mini case study: Let’s look at Louis

Louis Cole, better known as FunForLouis, is a traveller and vlogger and seems to be a nice guy (never met him but I do watch his stuff). He does a lot of sponsored work for various brands like Nokia, Google and tourism departments. The videos are nice, and with a small group of friends/influencers he has his own boutique influencer consultancy, Live The Adventure Club.

Looking at his last video I’ve made a couple of observations, keeping in mind this is just a first pass. I’m not going to knock Louis, that’s not what I’m doing; I’m just using the video as an example.

Brands will always want to know what the reach is. Louis has 2 million subscribers who get some sort of notification when a new video goes up. The real metric, though, is the actual views, or an average of the views over the last twenty videos, which I estimate to be around 45k, or 2.25% of Louis’ subscriber base. Is that something to be worried about? I have no idea; these are merely observations. My biased opinion would be that it’s a red flag, but brands go with Louis because he has a reputation, and that counts for a lot online. So I totally get why a brand would go with him; he’s a safe pair of hands.
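The reach ratio is simple enough to sketch. This is a hypothetical helper (the names are mine) that averages the views on recent videos and divides by the subscriber count; the input below is the 45k estimate repeated twenty times, not real view data.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical helper: average views on recent videos as a share of subscribers.
 */
public class ReachRatio {

    // Mean of the recent view counts (e.g. the last twenty videos).
    static double averageViews(long[] recentViews) {
        return Arrays.stream(recentViews).average().orElse(0.0);
    }

    // Average views divided by subscribers, as a fraction (0.0225 = 2.25%).
    static double reachRatio(long[] recentViews, long subscribers) {
        if (subscribers <= 0) {
            throw new IllegalArgumentException("subscribers must be positive");
        }
        return averageViews(recentViews) / subscribers;
    }

    public static void main(String[] args) {
        // Illustrative input only: twenty videos at the estimated 45k average.
        long[] views = new long[20];
        Arrays.fill(views, 45_000L);
        System.out.printf(Locale.ROOT, "Reach: %.2f%%%n",
                reachRatio(views, 2_000_000L) * 100); // prints Reach: 2.25%
    }
}
```

With real per-video numbers you’d also want the spread, not just the mean; one viral video can drag the average a long way.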

The place for measurement……

Here it is, the YouTube description. Let’s look at Louis’ video again.

There’s a lot of real estate here and it’s being used. There’s 18 lines of text here (and a lot more underneath). One is used for the sponsor and, for me the ultimate marketing sin, there’s no link back to the sponsor. Is there a reason why? Is there something I’m not seeing?

It’s not like there can’t be affiliate links as Louis has links to all the computer and camera gear he’s using with Amazon partner links to purchase which give a kickback in the form of cold hard cash. So it’s not like it can’t be done, it just hasn’t been done.

As a sponsor that’s exactly what I’d want to be measured. So the question is, how?

We need a measurement system

Oh yes we do. And I’m going to go back to my influencer/hotel model here because it’s been the one I’ve been thinking about the most, especially when influencers promise all sorts without any form of reference or reputation. Remember brands, anyone can buy 10,000 users for very little money. And to some folk perception is everything, “I’ve got 100k fans!”.

So influencer approaches hotel:

“If you give me seven nights at your hotel in return I’ll do two five minute YouTube videos.”

That’s fine but it shouldn’t happen for free. Right now this is just bartering with no negotiation. And the job comes in the haggling. It’s all about ceilings and floors.

How about the hotel does a 25% discount and along with it a custom link. For every successful booking that comes through that link we’ll deduct 5% of the booked amount from your account. If you get a lot of bookings via your videos then you’ll start making money.
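The custom link is the piece that makes this measurable. A minimal sketch, assuming the hotel’s booking engine or analytics records standard UTM query parameters; the base URL, influencer ID and campaign name are hypothetical, carrying the influencer ID in utm_content is my choice rather than any standard, and the base URL is assumed to have no query string of its own.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of a per-influencer booking link built from UTM query parameters.
 * Base URL, influencer ID and campaign name are hypothetical placeholders.
 */
public class TrackedLink {

    static String build(String baseUrl, String influencerId, String platform, String campaign) {
        return baseUrl
                + "?utm_source=" + enc(platform)
                + "&utm_medium=influencer"
                + "&utm_campaign=" + enc(campaign)
                + "&utm_content=" + enc(influencerId); // identifies the influencer
    }

    // URL-encode a single query-string value (Charset overload needs Java 10+).
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints https://hotel.example.com/book?utm_source=youtube&utm_medium=influencer&utm_campaign=summer+suite&utm_content=influencer-042
        System.out.println(build("https://hotel.example.com/book",
                "influencer-042", "youtube", "summer suite"));
    }
}
```

Each influencer gets their own link, so every booking that arrives with their ID can be credited against their balance.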

For example: The Eden Andalou Aquapark and Spa is currently £1852 for the Executive Suite per person.  If the 25% influencer discount was applied it would be £1389. Every successful linked booking (it has to come from the video otherwise you can’t track it) would deduct £92.60 from the influencer’s balance. Fifteen bookings via the video would clear the balance and then after that the influencer would be in profit.

If your video is getting 15k views, then fifteen bookings is a 0.1% view-to-purchase rate, which is not actually bad going. If 5% of the 15k viewers clicked on the link (750 people) and 5% of those booked (37.5, but let’s call it 38), then £3,518.80 in revenue to the influencer makes for a nice week at the very plush office. That’s assuming everyone books the same suite you were in. The hotel will have done well out of the deal too: 38 bookings at £1,852 is £70,376, a return of £70,376 - £3,518.80 = £66,857.20.
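To check the sums, here’s a small sketch (class and method names are mine) that reproduces the figures above in whole pence, so nothing drifts with floating-point rounding. It assumes, as the text does, that every tracked booking is the £1,852 Executive Suite and that the influencer’s 5% is taken on the full booked price. Like the text, the “hotel keeps” line ignores the discount on the influencer’s own stay.

```java
import java.util.Locale;

/**
 * Reproduces the worked influencer/hotel figures in whole pence.
 * Assumes every tracked booking is the same £1,852 suite.
 */
public class InfluencerDeal {

    static final long SUITE_PRICE_PENCE = 185_200;  // £1,852.00
    static final int DISCOUNT_PERCENT = 25;         // influencer's stay discount
    static final int COMMISSION_PERCENT = 5;        // credit per tracked booking

    // What the influencer owes for their own stay after the discount.
    static long discountedStay() {
        return SUITE_PRICE_PENCE * (100 - DISCOUNT_PERCENT) / 100;
    }

    // Credit earned per tracked booking.
    static long commissionPerBooking() {
        return SUITE_PRICE_PENCE * COMMISSION_PERCENT / 100;
    }

    // Bookings needed to clear the stay balance (rounded up).
    static long bookingsToBreakEven() {
        long perBooking = commissionPerBooking();
        return (discountedStay() + perBooking - 1) / perBooking;
    }

    // Format non-negative pence as pounds, e.g. 351880 -> "£3,518.80".
    static String pounds(long pence) {
        return String.format(Locale.UK, "£%,d.%02d", pence / 100, pence % 100);
    }

    public static void main(String[] args) {
        int bookings = 38; // 5% of 750 clicks, rounded up from 37.5
        long commission = bookings * commissionPerBooking();
        long hotelRevenue = bookings * SUITE_PRICE_PENCE;
        System.out.println("Discounted stay:  " + pounds(discountedStay()));       // £1,389.00
        System.out.println("Per booking:      " + pounds(commissionPerBooking())); // £92.60
        System.out.println("Break-even after: " + bookingsToBreakEven() + " bookings"); // 15
        System.out.println("Influencer earns: " + pounds(commission));             // £3,518.80
        System.out.println("Hotel keeps:      " + pounds(hotelRevenue - commission)); // £66,857.20
    }
}
```

Working in pence is a deliberate choice: money in doubles tends to pick up stray fractions of a penny.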

Personally I think it’s important that influencers can think this way. This is what brands will want to see going forward, this kind of thinking. The issue will always be traceability.

It’s a numbers game, it always has been and it always will be.


Brands want to de-risk, it’s as simple as that. Anyone can be an influencer, and with so many influencers in the ecosystem it’s really difficult for any brand to figure out who’s going to be best for the brand.

So, there has to be sacrifice on both sides. A hotel doesn’t want to lose money and an influencer doesn’t want to lose out on a decent place to stay. (I’ve yet to see an influencer do a sponsored video for Premier Inn or Travelodge).

I think fashion is different: the cost of producing dresses is vastly different, they’re easier to ship and it’s just eyeballs at that point. Like most fashion advertising, it’s just a sunk cost. If this concept could be used, I’d be on the phone to Zara by the end of the day.

What I’ve outlined is hardly new, it’s affiliate marketing mechanics with measurement on both sides. I’ve just added it to my list of things to build…..

The real challenge is the platforms themselves, I’ve proved out that YouTube is workable, I’m assuming that Facebook posts would be the same. It’s when you get to the mobile platforms that measurement gets tricky. And that’s where the work needs to happen.

Any thoughts? Pop a comment below.

Strata London Talk Slides and Code. #stratadata #kafka #dl4j


Proof if proof were needed…. so Strata London was an absolute pleasure to talk at. Those who know me know I’m a big fan of the conference, so to talk at it was an additional boost. Who knew that Strictly data could cross boundaries like that.

The slides are available on the Strata website and the proof of concept source code is on github.

Many thanks to those who were supportive, gave advice and generally gave me lots to think about during the two days I was there. Also the photo was taken by Ellen Friedman and borrowed via Twitter…..


Simple Linear Regression in 2 minutes. #machinelearning #linearregression #java

With certain data Simple Linear Regression wins, and while the rest of the ML/AI world pushes tools that are far larger in scope than most people need, sometimes our best tools are hidden in plain sight.

Apache Commons Math: old, kinda forgotten but kinda cool. Simple Linear Regression is hiding in there and is easy to put together.

1. Add the dependency

Put this in your pom.xml file…..

<!-- -->

2. Import the class

In your Java class add these import statements (List and Random are used by the methods below).

import java.util.List;
import java.util.Random;
import org.apache.commons.math3.stat.regression.SimpleRegression;

3. Add your two data points

I’m reading in a list of comma delimited strings so I’m parsing and converting them. The basic premise of building the model is simple though….

public SimpleRegression getLinearRegressionModel(List<String> lines) {
  SimpleRegression sr = new SimpleRegression();
  for (String s : lines) {
    // Each line is "x,y"
    String[] ssplit = s.split(",");
    double x = Double.parseDouble(ssplit[0].trim());
    double y = Double.parseDouble(ssplit[1].trim());
    // Add the observation to the model
    sr.addData(x, y);
  }
  return sr;
}

4. Make some predictions

The SimpleRegression class will give you back the slope and intercept; from there it’s plain sailing to make a prediction.

private String runPredictions(SimpleRegression sr, int runs) {
  StringBuilder sb = new StringBuilder();
  // Display the intercept of the regression
  sb.append("Intercept: ").append(sr.getIntercept()).append('\n');
  // Display the slope of the regression
  sb.append("Slope: ").append(sr.getSlope()).append('\n');
  // Display the standard error of the slope
  sb.append("Slope standard error: ").append(sr.getSlopeStdErr()).append('\n');
  // Display the R2 value (getRSquare() is plain R2, not adjusted R2)
  sb.append("R2 value: ").append(sr.getRSquare()).append('\n');
  sb.append("Running random predictions......\n");
  Random r = new Random();
  for (int i = 0; i < runs; i++) {
    int rn = r.nextInt(10);
    sb.append("Input score: ").append(rn)
      .append(" prediction: ").append(Math.round(sr.predict(rn))).append('\n');
  }
  return sb.toString();
}

Job done.

Now remember the key metric is the R2 score, sr.getRSquare() from your model. It’s a number between 0 and 1 that tells you how much of the variation in your y values the line explains. 0 means the model explains nothing and shouldn’t be used; 1 means every point sits exactly on the line. Below 0.5 the line explains less than half of the variation, so treat its predictions with real caution. Aim for a minimum of 0.8 (80%) and you’re well on your way to bragging about your predictions at the pub, or on Twitter, or Facebook, or at the pub on Twitter and Facebook……
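If you want to see what’s going on under the hood, here’s a plain-Java sketch (no Commons Math; the class name and sample data are made up for illustration) of ordinary least squares with one predictor, computing the slope, intercept and R2 that SimpleRegression reports. On this toy data the R2 of 0.6 means the line accounts for 60% of the variation in y.

```java
import java.util.Locale;

/**
 * Ordinary least squares with one predictor: slope, intercept and R2.
 * Illustrative only; SimpleRegression does this for you.
 */
public class LeastSquares {

    final double slope;
    final double intercept;
    final double rSquare;

    LeastSquares(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired points");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of squares and cross-products about the means.
        double sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) {
            throw new IllegalArgumentException("x and y must both vary");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
        // For simple regression, R2 = 1 - SSE/SST = sxy^2 / (sxx * syy).
        rSquare = (sxy * sxy) / (sxx * syy);
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        LeastSquares ls = new LeastSquares(
                new double[] {1, 2, 3, 4, 5},
                new double[] {2, 4, 5, 4, 5});
        // prints slope=0.60 intercept=2.20 r2=0.60
        System.out.printf(Locale.ROOT, "slope=%.2f intercept=%.2f r2=%.2f%n",
                ls.slope, ls.intercept, ls.rSquare);
    }
}
```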



Basic calculation for hidden nodes in a Neural Network #hackthehub18 #ai #machinelearning

I have this written down in a number of notebooks but I’m leaving it here for two reasons:

  1. It just took me twenty minutes to find the right notebook.
  2. It’s Hack The Hub in Belfast and it’s all about Machine Learning and AI.

How Many Nodes In A Hidden Layer?

I want to get a rough idea of the number of nodes to use in a hidden layer in my neural network. Too few or too many and this can have an impact on the accuracy of your training model. You’ll see the outputs of your training accuracy during evaluation (accuracy and F1 scores).

There is a common equation available to give us a rough number.

The number of samples in the training data, divided by a scaling factor multiplied by the total number of input and output nodes.
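Written out as an equation, matching the Clojure function below (N_s training samples, N_i input nodes, N_o output nodes and α the scaling factor; the symbol names are mine), with the 1-input, 10-output, 474-sample case from later in this post at α = 2:

```latex
N_h = \frac{N_s}{\alpha\,(N_i + N_o)}, \qquad 2 \le \alpha \le 10

N_h = \frac{474}{2\,(1 + 10)} = \frac{474}{22} \approx 21.5
```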

In Clojure it looks like this:

user> (defn node-calc [inputs outputs sample-size scaling]
 (double (/ sample-size (* scaling (+ inputs outputs)))))

The scaling factor is just an arbitrary number between 2 and 10. It’s worth mapping through the range to get a feel for the scores.

Let’s Build a Case

My neural network has 1 input node and ten output nodes (ten possible prediction results), and I’ve got 474 instances of input data to train on. I’m going to map the scaling factor from 2 to 10 so I can see the range of node results.

user> (clojure.pprint/pprint (map (fn [s] (node-calc 1 10 474 s)) (range 2 11)))

Guess how many times I’d run the model training and evaluation? I’d test all the rounded up/down results and see how the F1 score looks.
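For anyone without a Clojure REPL to hand, here’s a rough Java port of node-calc (the class name HiddenNodes is mine) that sweeps the scaling factor from 2 to 10 using the figures from the text (1 input, 10 outputs, 474 samples) and prints the rounded-down and rounded-up candidates worth testing. It’s a rule-of-thumb sweep, not a substitute for checking the F1 scores.

```java
import java.util.Locale;

/**
 * Java port of node-calc: samples / (scaling * (inputs + outputs)),
 * swept over scaling factors 2..10.
 */
public class HiddenNodes {

    static double nodeCalc(int inputs, int outputs, int sampleSize, double scaling) {
        return sampleSize / (scaling * (inputs + outputs));
    }

    public static void main(String[] args) {
        int inputs = 1, outputs = 10, samples = 474;
        for (int s = 2; s <= 10; s++) {
            double nodes = nodeCalc(inputs, outputs, samples, s);
            // Raw value plus the floor/ceiling candidates to train and evaluate.
            System.out.printf(Locale.ROOT, "scaling %2d -> %6.2f (try %d or %d)%n",
                    s, nodes, (int) Math.floor(nodes), (int) Math.ceil(nodes));
        }
    }
}
```

At a scaling factor of 2 that works out at roughly 21.5 nodes and at 10 roughly 4.3, so the candidates run from 4 up to 22.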


The Fact Your #Data is Being Used Should Be a Surprise to No-one

It’s been an interesting weekend for my field of work. Especially in an industry where I do stuff with data….

Ellie Mae O’Hagen wrote a piece called “No one can pretend Facebook is harmless fun anymore” and it’s not a bad overview of where things are. The last line says it all:

“…because people with Facebook profiles aren’t the company’s customers: they are the product it sells to advertisers.”

Which is basically the worst kept secret in technology companies, entrepreneurs and tech “thought leaders”. The value is in the data and once you figure out how to monetise that then a free product to customers is no bad thing.

Anyone who knows me knows my love of customer loyalty data. I’ve worked with it since 2002, mined Nectar card data and come up with recommendations, via vouchers and offers, on how to get customers to change behavior. The Cambridge Analytica approach is far from new; what’s new is the domain it was applied to.

Once you know you can change another person’s behavior, there comes a sense of responsibility with it. As the custodians of the data, you now have the power to change the course of another person’s future without their knowledge. That thought alone is scary, as I know some who would exploit it for profit, like squeezing a grape until no more juice would come out.

So think about it all: every card, whether it’s loyalty cards, bank cards or your medical records on the GP’s system. Do the likes of Tesco/dunnhumby have a public list of where their Clubcard data is sold? Probably not. I asked a question during a Big Data Week panel in 2015: “Who has a Clubcard?” Pretty much all the room. “Who wouldn’t mind if your shopping habits were passed on to the insurance company?” All hands, with the exception of two, went down very quickly.

Telephone call logs are another, with the classic line, “we may record your call for training purposes”. Training what exactly? Another customer representative, or a machine learning or AI tool that decides whether to keep your custom? How do we know? Well, we never do, because we never find out.

Will the events of the weekend turn the tide against Facebook? It’s 50/50. I mean, 50m users is only about 2.5% of Facebook’s user base, and most hardened cat/dog/baby-picture-posting users won’t care. If I were to bet, I’d say probably nothing much will happen.

The only people who need to change are you and me: about where our data goes, how it will be used and how to have it deleted when we’re done with that service.


Setting up Org Mode and Babel for the Nervous #emacs #vi #babel

I’m claiming a moral victory for my sanity here…..

As a die hard vi user Emacs occasionally confuses me, I’m happy to admit that. Many a time I’ve pressed :wq instead of C-x C-s when it comes to saving files.

Thing is, Emacs has loads of goodies that I never quite get round to trying out. Org mode being able to run scripts is one of them. Curiosity has now given way to requirement, so it was a wrestling match to get it working (and reading the documentation did help…. I admit). There are probably better ways to do this, but it worked for me.

Installing Org mode

Open your init.el file and add the following line:

(add-to-list 'package-archives '("org" . "") t)

I’m assuming that you’ve already got (require 'package) in your init.el file, if you don’t then add that above the add-to-list line.

Open your Emacs, you now need to install the org package and org-contrib package.

M-x list-packages

You will see org and org-contrib listed (mine are at the top). Install them both, click on the title then click “Install”. Emacs will output a load of stuff but all is normally well. With that done we can now make sure that we can run bash from within Org mode.

Enabling Bash within Emacs

Open your init.el file again. You will now add the org-babel command to load the shell so it can be called from your org file. I usually add this stuff at the bottom of my init file.

(org-babel-do-load-languages 'org-babel-load-languages
 '((shell . t)))

Save the file and restart Emacs.

Testing within Org mode.

So far so good, now to test.

Either open an org mode file or create one. Now add the following:

#+BEGIN_SRC bash
echo 'this is a test'
#+END_SRC

Then evaluate the block with C-c C-c and you will be prompted “Evaluate this bash code block on your system?”. Respond with yes.

Look at your org file again and you should see the output.

#+BEGIN_SRC bash
echo 'this is a test'
#+END_SRC

#+RESULTS:
: this is a test

That’s good enough for me and now I have notes on how to get it working on my work machine tomorrow morning (I’ll forget if I don’t write it down).