I can tell more about your company by how you offer #bacon – #startups #business #meetings

Within a minute I can tell how a meeting and a long-term business relationship are going to go, especially where bacon is concerned.

Can I get you a drink?

Tea, coffee and water are a given. Everyone needs at least one of these things to function. So as a device for meeting and business success prediction it’s fairly weak.

Bacon however changes all that, there’s currency involved and money has been spent.

The Meeting

Proposals were put forward by management, multiple phone calls on whether the company could perform such an operation. All minds put at rest and a date for a pitch set. As an employee I get the call, can I fly over to England to help out on the pitch from a technical viewpoint. Flights booked, info got and meet at the hotel for lunch, then an afternoon planning for the pitch the following morning.

It’s not a small client, this is a household name. And on the morning driving up to the offices you get a sense of scale. Five of us arrive in reception, told to wait and then assistance arrives. Walking through oak paneled corridors you get a sense of the money sloshing about in the industry they’re the leader in.

First thing I and another tech colleague spot is a large platter of bacon baps (the word sandwich gives the sense of white bread slices, it is not this at all, they are baps). “We will do well here” was the general feeling on seeing at least fifty fresh bacon breakfast treats, just as well as breakfast was skipped for a final pitch run through.

The team were assigned to the far side of the table, maximum distance from the bacon platter. “It’ll be okay, they’ll be offered around”, as more staff filed in and sat down nearer the bacon. As people sat down they all passed the platter and picked up a bap and tucked in, the unwritten rules were in play. And while the tea and coffee was poured out the meeting started and at that point I knew the meeting was going to lead to problems.

Three Hours Later

The pitch started, finished and a long drawn out question and answer session continued. We’d had our one cup of tea and the bacon mountain had hardly moved. If a client can’t offer you a bacon bap and extend an arm of confidence, trust or bacon then I will question the long term plan of the client.

As the meeting concluded there were a lot of handshakes (15 client representatives, you can work out the combinations for a team of five) and nods of heads, small talk and a large platter of untouched, cold and destined for the bin, bacon baps.

My colleague and I gave the platter a final look and as we walked out of the reception area on to the street I said to them, “This is not going to go how we want it to go.”

I was 100% on the money. Ropey specifications, holding on to information, internal politics like I’d never witnessed before, manipulation of third parties – it very nearly killed the supplier I worked for and other suppliers too (some of them household names as well).

It doesn’t have to be bacon

For me those first meetings tell me everything and I know it’s been documented a thousand times over. I’ve never seen a bacon bap platter since so my focus will go on something else. I think Cloudera were right though, “Data is the new bacon”, bacon taught me an awful lot of decision process, meeting psychology and staff placement in a meeting room. It’s like wedding planning but with more bacon…..


Influencers and Hotels: A system to measure the effectiveness…. #influencers @funforlouis #youtube #marketing #instagram

Remember the old data science T-shirt?

“In God we trust. All others must bring data.”

It’s all about measurement

Measurement is always important, especially for brands, without it you can’t measure return on investment. A lot of marketing campaigns are like that, just throw it out there and hope that someone clicks on the link or at least looks at the landing page. If you’re using an email system like MailChimp then you can see a lot of metrics on opens, bounces and so on. From what I’ve witnessed and thought about this year, influencers don’t seem to be able to measure and this got me thinking.

There’s an increasing distrust between hotels and influencers. To be honest I find the whole approach baffling to a degree which goes along the lines of “If you give me a free room for seven nights I’ll post this stuff on my social media accounts as exposure, I’ve got 10,000 followers”, for some it works so in some respects I’ve got a lot to learn about sticking my neck out and asking for stuff……

Basically…… Jack Bedwani from The Projects put it perfectly in a recent piece in The Atlantic: “They get five to 20 direct inquiries a day from self-titled influencers,” he said. “The net is so wide, and the term ‘influencer’ is so loose.”

Different Platforms Bring Different Challenges

The path to measurement is not as straightforward as one might hope. Thinking of the main platforms in use: Instagram, YouTube, SnapChat and Facebook – each has its own way of doing things. What these mainly measure are views, okay as a starting point but ultimately a pointless measure. “So my picture of your hotel was seen by 2,576 people.” So what?

With Instagram you can’t embed links in the posts. You can add a link but the reader will have to copy/paste or type it in themselves. So any form of tracking at that point is out of the window. I know, I tried. You can though add the username of the sponsoring account but how on earth do you measure that, well you don’t.

A number of companies have tried embedding links in YouTube videos; Taggled.tv from Belfast had a good shot at it and gained some traction, but sadly it seems to be no more. With YouTube there is at least a decent description block to put all your links in….. but watch YouTube on a PS4 console, for example, and all the decent info is lost or just plain hard to get to.

Measurement is a challenge.

A mini case study: Let’s look at Louis

Louis Cole, better known as FunForLouis, is a traveller, vlogger and seems to be a nice guy (never met him but I do watch his stuff). He does a lot of sponsored stuff for various brands like Nokia, Google and tourism departments. The videos are nice and, with a small group of friends/influencers, he has his own boutique influencer consultancy, Live The Adventure Club.

Looking at his last video I’ve made a couple of observations, keeping in mind this is just a first pass. I’m not going to knock Louis, that’s not what I’m doing, I’m just using the video as an example.

Brands will always want to know what the reach is. Louis has 2 million subscribers who get some sort of notification when a new video goes up. The real metric though is the actual views or an average of the views over the last twenty videos. Which I’m estimating to be about in the 45k area which is 2.25% of Louis’ subscriber base. Is that something to be worried about? I have no idea, these are merely observations, my biased opinion would think it’s a red flag but brands go with Louis as he has a reputation and that counts for a lot online so I totally get why a brand would go with him, he’s a safe pair of hands.
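The reach sum is easy enough to script so a brand can redo it for any channel. This is just a rough sketch: the class name, method names and the three view counts are made up for illustration (they were picked so the average comes out at the 45k estimated above), only the 2 million subscriber figure comes from the text.

```java
import java.util.List;

class ReachEstimate {

    // Average views across a list of recent videos.
    static double averageViews(List<Integer> recentViews) {
        return recentViews.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    // Average views expressed as a percentage of the subscriber count.
    static double reachPercent(double averageViews, long subscribers) {
        return averageViews / subscribers * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical view counts, not real figures from the channel.
        List<Integer> recentViews = List.of(40_000, 45_000, 50_000);
        double avg = averageViews(recentViews);
        System.out.printf("Average views: %.0f%n", avg);
        System.out.printf("Reach: %.2f%% of subscribers%n", reachPercent(avg, 2_000_000L));
    }
}
```

Swap in the real view counts for the last twenty videos and you have a number to put in front of a sponsor.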

The place for measurement……

Here it is, the YouTube description. Let’s look at Louis’ video again.

There’s a lot of real estate here and it’s being used. There’s 18 lines of text here (and a lot more underneath). One is used for the sponsor and, for me the ultimate marketing sin, there’s no link back to the sponsor. Is there a reason why? Is there something I’m not seeing?

It’s not like there can’t be affiliate links as Louis has links to all the computer and camera gear he’s using with Amazon partner links to purchase which give a kickback in the form of cold hard cash. So it’s not like it can’t be done, it just hasn’t been done.

As a sponsor that’s exactly what I’d want to be measured. So the question is, how?

We need a measurement system

Oh yes we do. And I’m going to go back to my influencer/hotel model here because it’s been the one I’ve been thinking about the most, especially when influencers promise all sorts without any form of reference or reputation. Remember brands, anyone can buy 10,000 users for very little money. And to some folk perception is everything, “I’ve got 100k fans!”.

So influencer approaches hotel:

“If you give me seven nights at your hotel in return I’ll do two five minute YouTube videos.”

That’s fine but it shouldn’t happen for free. Right now this is just bartering with no negotiation. And the job comes in the haggling. It’s all about ceilings and floors.

How about the hotel does a 25% discount and along with it a custom link. For every successful booking that comes through that link we’ll deduct 5% of the booked amount from your account. If you get a lot of bookings via your videos then you’ll start making money.

For example: The Eden Andalou Aquapark and Spa is currently £1852 for the Executive Suite per person.  If the 25% influencer discount was applied it would be £1389. Every successful linked booking (it has to come from the video otherwise you can’t track it) would deduct £92.60 from the influencer’s balance. Fifteen bookings via the video would clear the balance and then after that the influencer would be in profit.

If your video is getting 15k views then that’s 0.1% click through to purchase, that’s not actually bad going. If 5% of the 15k clicked on the link (750 people) and 5% booked (37.5 but let’s call it 38) then £3,518.80 revenue to the influencer is a nice week at the very plush office. That’s assuming everyone’s booking the same suite as you were in. The hotel will have done well out of the deal too, a return on investment: £70,376 – £3,518.80 = £66,857.20
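The sums above are simple enough to put in code so the percentages can be haggled over. A minimal sketch, assuming the deal works exactly as described (25% discount, 5% of the full list price credited per tracked booking). The class and method names are my own invention, and money is held in pence to dodge floating point rounding.

```java
class BookingEconomics {

    // Price the influencer owes after the discount, in pence.
    static long discountedPrice(long listPricePence, int discountPercent) {
        return listPricePence * (100 - discountPercent) / 100;
    }

    // Amount taken off the influencer's balance for each tracked booking, in pence.
    static long commissionPerBooking(long listPricePence, int commissionPercent) {
        return listPricePence * commissionPercent / 100;
    }

    // Tracked bookings needed to clear the balance (rounded up).
    static long breakEvenBookings(long balancePence, long commissionPence) {
        return (balancePence + commissionPence - 1) / commissionPence;
    }

    // Views -> clicks -> bookings, rounded up as in the text (37.5 becomes 38).
    static long expectedBookings(long views, int clickPercent, int bookPercent) {
        return (long) Math.ceil(views * clickPercent * bookPercent / 10_000.0);
    }

    static String pounds(long pence) {
        return String.format("GBP %d.%02d", pence / 100, pence % 100);
    }

    public static void main(String[] args) {
        long listPrice = 185_200; // GBP 1852 Executive Suite, from the example
        long balance = discountedPrice(listPrice, 25);
        long commission = commissionPerBooking(listPrice, 5);
        long bookings = expectedBookings(15_000, 5, 5);
        long influencerRevenue = bookings * commission;
        long hotelGross = bookings * listPrice;

        System.out.println("Influencer balance:      " + pounds(balance));
        System.out.println("Credit per booking:      " + pounds(commission));
        System.out.println("Bookings to break even:  " + breakEvenBookings(balance, commission));
        System.out.println("Expected bookings:       " + bookings);
        System.out.println("Influencer revenue:      " + pounds(influencerRevenue));
        // Not a figure from the text: what is left once the balance is cleared.
        System.out.println("After clearing balance:  " + pounds(influencerRevenue - balance));
        System.out.println("Hotel net of commission: " + pounds(hotelGross - influencerRevenue));
    }
}
```

Change the discount, the commission or the click and booking rates and you can see straight away where the floor and ceiling sit for both sides.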

Personally I think it’s important that influencers can think this way. This is what brands will want to see going forward, this kind of thinking. The issue will always be traceability.

It’s a numbers game, it always has been and it always will be.


Brands want to de-risk, it’s as simple as that. Anyone can be an influencer and with so many influencers in the eco-system it’s really difficult for any brand to figure out who’s going to be best for the brand.

So, there has to be sacrifice on both sides. A hotel doesn’t want to lose money and an influencer doesn’t want to lose out on a decent place to stay. (I’ve yet to see an influencer do a sponsored video for Premier Inn or Travelodge).

I think fashion is different, cost of production of dresses is vastly different, easier to ship and it’s just eyeballs at that point, like most fashion advertising, it’s just a sunk cost. If this concept could be used, I’d be on the phone to Zara by the end of the day.

What I’ve outlined is hardly new, it’s affiliate marketing mechanics with measurement on both sides. I’ve just added it to my list of things to build…..

The real challenge is the platforms themselves, I’ve proved out that YouTube is workable, I’m assuming that Facebook posts would be the same. It’s when you get to the mobile platforms that measurement gets tricky. And that’s where the work needs to happen.

Any thoughts? Pop a comment below.

Strata London Talk Slides and Code. #stratadata #kafka #dl4j

[Photo, 24-05-2018]

Proof if proof were needed…. so Strata London was an absolute pleasure to talk at. Those who know me know I’m a big fan of the conference, so to talk at it was an additional boost. Who knew that Strictly data could cross boundaries like that.

The slides are available on the Strata website and the proof of concept source code is on github.

Many thanks to those who were supportive, gave advice and generally gave me lots to think about during the two days I was there. Also the photo was taken by Ellen Friedman and borrowed via Twitter…..


Simple Linear Regression in 2 minutes. #machinelearning #linearregression #java

With certain data Simple Linear Regression wins, and while the rest of the ML/AI world pushes tools that are far larger in scope than most people need, sometimes our best tools are hidden in plain sight.

Apache Commons Math, old, kinda forgotten but kinda cool, well Simple Linear Regression is hiding in there and is easy to put together.

1. Add the dependency

Put this in your pom.xml file…..

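<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-math3 -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-math3</artifactId>
  <version>3.6.1</version>
</dependency>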

2. Import the class

In your Java class add this import statement.

import org.apache.commons.math3.stat.regression.SimpleRegression;

3. Add your two data points

I’m reading in a list of comma delimited strings so I’m parsing and converting them. The basic premise of building the model is simple though….

public SimpleRegression getLinearRegressionModel(List<String> lines) {
  SimpleRegression sr = new SimpleRegression();
  for(String s : lines) {
    // Each line is "x,y": parse both values and add the pair to the model
    String[] ssplit = s.split(",");
    double x = Double.parseDouble(ssplit[0]);
    double y = Double.parseDouble(ssplit[1]);
    sr.addData(x, y);
  }
  return sr;
}

4. Make some predictions

The SimpleRegression class will give you back the slope and intercept, from there it’s plain sailing to make a prediction.

private String runPredictions(SimpleRegression sr, int runs) {
  StringBuilder sb = new StringBuilder();
  // Display the intercept of the regression
  sb.append("Intercept: " + sr.getIntercept() + "\n");
  // Display the slope of the regression.
  sb.append("Slope: " + sr.getSlope() + "\n");
  // Display the slope standard error
  sb.append("Standard Error: " + sr.getSlopeStdErr() + "\n");
  // Display the R2 value (getRSquare() is the plain, not adjusted, coefficient of determination)
  sb.append("R2 value: " + sr.getRSquare() + "\n");
  sb.append("Running random predictions......\n");
  // Needs: import java.util.Random;
  Random r = new Random();
  for (int i = 0 ; i < runs ; i++) {
    int rn = r.nextInt(10);
    sb.append("Input score: " + rn + " prediction: " + Math.round(sr.predict(rn)) + "\n");
  }
  return sb.toString();
}

Job done.

Now remember the key metric is the R2 score, sr.getRSquare() from your model. It’s a number between 0 and 1. 0 is pointless and the model shouldn’t be used, 1 is basically the most accurate model you can get. Anything less than 50% is basically less reliable than a coin flip. Aim for a minimum of 0.8 (80%) and you’re well on your way to bragging about your predictions at the pub, or on Twitter, or Facebook or at the pub on Twitter and Facebook……
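If you’re wondering what the slope, intercept and R2 numbers actually are, here’s a rough sketch of the textbook least squares sums in plain Java with no dependencies. This is not the Commons Math source, just the same kind of calculation written out by hand. The class name and the sample data in main are made up for illustration.

```java
class LeastSquaresSketch {
    final double slope;
    final double intercept;
    final double rSquare;

    LeastSquaresSketch(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need at least two (x, y) pairs");
        }
        int n = x.length;

        // Means of x and y.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of squares and cross products around the means.
        double sxx = 0.0;
        double syy = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }

        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
        // For a single predictor, R2 is the squared correlation of x and y.
        rSquare = (sxy * sxy) / (sxx * syy);
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Hypothetical data, roughly linear with a little noise.
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {2.9, 5.2, 6.8, 9.1, 11.2, 12.8};
        LeastSquaresSketch model = new LeastSquaresSketch(x, y);
        System.out.println("Slope: " + model.slope);
        System.out.println("Intercept: " + model.intercept);
        System.out.println("R2: " + model.rSquare);
        System.out.println("Prediction for x=7: " + model.predict(7));
    }
}
```

In real use I’d stick with SimpleRegression, the sketch is only there to show there’s no magic behind the numbers.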



Basic calculation for hidden nodes in a Neural Network #hackthehub18 #ai #machinelearning

I have this written down in a number of notebooks but I’m leaving it here for two reasons:

  1. It just took me twenty minutes to find the right notebook.
  2. It’s Hack The Hub in Belfast and it’s all about Machine Learning and AI.

How Many Nodes In A Hidden Layer?

I want to get a rough idea of the number of nodes to use in a hidden layer in my neural network. Too few or too many and this can have an impact on the accuracy of your training model. You’ll see the outputs of your training accuracy during evaluation (accuracy and F1 scores).

There is a common equation available to give us a rough number.

The number of samples in the training set, divided by a scaling factor multiplied by the total number of input and output nodes.

In Clojure it looks like this:

user> (defn node-calc [inputs outputs sample-size scaling]
 (double (/ sample-size (* scaling (+ inputs outputs)))))

The scaling factor is just an arbitrary number between 2 and 10. It’s worth mapping through the range to get a feel for the scores.

Let’s Build a Case

My neural network has one input node and ten output nodes (ten possible prediction results), and I’ve got 474 instances of input data to train with. I’m going to map the scaling factor from 2 to 10 so I can see the range of node results.

user> (clojure.pprint/pprint (map (fn [s] (node-calc 1 10 474 s)) (range 2 11)))

Guess how many times I’d run the model training and evaluation? I’d test all the rounded up/down results and see how the F1 score looks.
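For anyone not on a Clojure REPL, here’s the same calculation as a small Java sketch. It uses the case described above (one input, ten outputs, 474 samples) and prints the rounded down and rounded up candidates for each scaling factor. The class and method names are just mine for illustration.

```java
class HiddenNodeEstimate {

    // samples / (scaling * (inputs + outputs)), the same sum as node-calc.
    static double nodeCalc(int inputs, int outputs, int sampleSize, int scaling) {
        return (double) sampleSize / (scaling * (inputs + outputs));
    }

    public static void main(String[] args) {
        // Map the scaling factor from 2 to 10 inclusive.
        for (int scaling = 2; scaling <= 10; scaling++) {
            double nodes = nodeCalc(1, 10, 474, scaling);
            System.out.printf("scaling=%d nodes=%.2f (try %d and %d)%n",
                    scaling, nodes, (long) Math.floor(nodes), (long) Math.ceil(nodes));
        }
    }
}
```

Each of those rounded values is a candidate hidden layer size to train and evaluate against.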


The Fact Your #Data is Being Used Should Be a Surprise to No-one

It’s been an interesting weekend for my field of work. Especially in an industry where I do stuff with data….

Ellie Mae O’Hagan wrote a piece called “No one can pretend Facebook is harmless fun anymore” and it’s not a bad overview of where things are. The last line says it all:

“…because people with Facebook profiles aren’t the company’s customers: they are the product it sells to advertisers.”

Which is basically the worst kept secret among technology companies, entrepreneurs and tech “thought leaders”. The value is in the data and once you figure out how to monetise that then a free product to customers is no bad thing.

Anyone who knows me knows my love of customer loyalty data, I’ve worked with it since 2002, mined Nectar card data and came up with recommendations via vouchers and offers on how to get customers to change behavior. The Cambridge Analytica approach is far from new, the only new thing is the domain it was applied to.

Once you know you can change another person’s behavior there comes a sense of responsibility with it. As the custodians of the data you now have the power to change the course of another person’s future without their knowledge. That thought alone is scary as I know some that would exploit it for profit like squeezing a grape until no more juice would come out.

So think about it all, every card, whether it’s loyalty cards, bank cards, your medical records on the GP’s system. Do the likes of Tesco/Dunnhumby have a public list of where their Clubcard data is sold? Probably not. I asked a question during a Big Data Week panel in 2015, “Who has a Clubcard?”, pretty much all the room, “Who wouldn’t mind if your shopping habits were passed on to the insurance company?”, all hands with the exception of two went down very quickly.

Telephone call logs are another and the classic line, “we may record your call for training purposes”, training what exactly? Another customer representative or a machine learning or AI tool to decide whether to keep your custom? How do we know? Well we never do because we never find out.

Will the events of the weekend turn the tide against Facebook? It’s 50/50. I mean 50m users of Facebook is only about 2.5% of the user base and most hardened cat/dog/baby picture posting users won’t care. If I were to bet, I’d say probably nothing much will happen.

The only people who need to change are you and me, about what data goes where, how it will be used and how to have it deleted when we’re done with that service.


Setting up Org Mode and Babel for the Nervous #emacs #vi #babel

I’m claiming a moral victory for my sanity here…..

As a die hard vi user Emacs occasionally confuses me, I’m happy to admit that. Many a time I’ve pressed :wq instead of C-x C-s when it comes to saving files.

Thing is Emacs has loads of goodies that I never get to quite try out. Org mode being able to run scripts and stuff is one of them. Curiosity has now given way to requirement so it was a wrestling match to get it working (and reading the documentation did help…. I admit). There’s probably better ways to do this but it worked for me.

Installing Org mode

Open your init.el file and add the following line:

(add-to-list 'package-archives '("org" . "http://orgmode.org/elpa/") t)

I’m assuming that you’ve already got (require 'package) in your init.el file, if you don’t then add that above the add-to-list line.

Open your Emacs, you now need to install the org package and org-contrib package.

M-x list-packages

You will see org and org-contrib listed (mine are at the top). Install them both, click on the title then click “Install”. Emacs will output a load of stuff but all is normally well. With that done we can now make sure that we can run bash from within Org mode.

Enabling Bash within Emacs

Open your init.el file again. You will now add the org-babel command to load the shell so it can be called from your org file. I usually add this stuff at the bottom of my init file.

(org-babel-do-load-languages 'org-babel-load-languages
 '((shell . t)))

Save the file and restart Emacs.

Testing within Org mode.

So far so good, now to test.

Either open an org mode file or create one. Now add the following:

#+BEGIN_SRC bash
echo 'this is a test'
#+END_SRC

Then evaluate the block by using C-c C-c and you will be prompted “Evaluate this bash code block on your system?” Respond with yes.

Look at your org file again and you should see the output.

#+BEGIN_SRC bash
echo 'this is a test'
#+END_SRC

#+RESULTS:
: this is a test

That’s good enough for me and now I have notes on how to get it working on my work machine tomorrow morning (I’ll forget if I don’t write it down).


Walking as a debugging technique. #programming #debugging #code #learning

Kris is totally on the money, this tweet is 100% true. One story I tell to developers is from personal experience.

While working on the Sporting Life* website for the Press Association I was working on a Perl script, quite a beefy one, to populate pools coupons so people could play online.

All morning I was fixated on a bug and I couldn’t see the wood for the trees. My boss sat opposite but didn’t say a word, nor did I realise he was teaching me. After a while he decided the time was right, “Jase, go for a walk.”, I was blunt, “No, not until I’ve fixed this bug….”, “Jase, go for a WALK!”. I got the hint…..

The Press Association car park is a fair size so I did a lap, just the one. All the while during that lap I was talking under my breath about such an absurd command from my boss. My first proper programming job and I was less than impressed…..

That all changed in an instant. I opened the door to the office, walked to my desk and before I even sat down pointed at the screen and said, “Oh look, there’s a comma missing….”, made the correction and it worked first time.

Stuck with a problem? Go for a walk.


* Two milestones of my programming career: first, being one of the first involved in the very first online betting platform and second, the first online pools coupon….. this coming from the man who has no interest in sport at all.


I’m talking about streaming apps at ClojureX 4-5th December, London at @Skillsmatter #clojure #onyx #streaming #kafka #kinesis

Who Let Him Up There Again??

Last year at ClojureX I did an introduction to Onyx, this year it’s about what I really learned at the coal face. I’ll be talking about how I bled all over Onyx with a really big project.

This time though, no naff jokes, no Strictly Come Dancing and Linear Regression*, no temptation to use that Japanese War Tuba picture. It will be about designing streaming applications, task life cycles, heartbeats, docker deployment considerations and the calculating log volume sizes for when you’re on holiday.

I’m looking forward to it. If you are interested in the current schedule you can read that here, if you want more information on the conference then that’s on the SkillsMatter website.

* If you’re interested the Darcey Coefficient is (as a Clojure function):

(ns darceycoefficient.core)
(defn predict-score [x-score]
 (+ 3.031 (* 0.6769 x-score)))

Like BigData tools you won’t need AI 99% of the time. #bigdata #data #machinelearning #ai #hadoop #spark #kafka

The Prologue.

Recently I’ve been very curious, I know that alone makes people in tech really nervous. I was curious to find out the first mentions of BigData and Hadoop in this blog, April 2012 and the previous year I’d been doing a lot of reading on cloud technologies and moreover data, my thirty year focus is data and right now in 2017 I’m halfway through.

The edge as I saw it would be to go macro on data and insight, that had been my thought ten years earlier. The whole play with customer data was clear in my mind then. In 2002 though we didn’t have the tooling, we made it ourselves. Crude, yes. Worked, it did.

When I moved to Northern Ireland I kept talking about the data plays to mainly deaf ears, some got it. Most didn’t. “Hadoop, never heard of it”. Five years later everyone has heard of Hadoop… too late.

It’s usually about now we have a word cloud with lots of big data related words on it.

Small Data, Big Data Tools

Most of the stories I hear about Big Data adoption are just this, using Big Data tools to solve small data problems. On the face of it the amount of data an organisation has rarely amounts to the need for huge tooling like Hadoop or Spark. My guess is (and I’ve seen partially confirmed) that the larger platforms like Cloudera, MapR and Hortonworks compete on a very narrow field of real big customers.

Let’s be honest with ourselves, Netflix and Amazon sized data are more deviations from the mean than the mean itself and the probability of it being given to you is very small unless it’s made public.

I personally found out in 2012 when I put together Cloudatics, using big data tools is a very hard sell. Many companies just don’t care, not all understand the benefits and those who cared still didn’t see how it would apply to them. Your pipeline is slim, at a guess 100:1 ratio would apply, that was optimistic then let alone five years on.

Most of us aren’t near “Averaged Sized Data” let alone Big Data.

When I first met Bruce Durling back in late 2013 (he probably regretted that coffee) we talked about all the tools, how there’s no need to write all this Java stuff when a few lines of Pig will do and how solving a specific problem with existing big data tools was far better than trying to launch a platform (yup, know that, already tried).

What Bruce and I also know is that we work with average sized data…. it’s not big data but it’s not small data. Do we need Hadoop or Spark? Probably not, can we code and scale it on our own, yes we can. Do we have the skills to do huge data processing, you betcha.

I sat in a room a few weeks ago where mining 40,000 tweets was classed as a monumental achievement, I don’t want to burst anyone’s bubble, it’s not. Even 80 million tweets is not a big data problem, nor an average sized data one. On my laptop doing sentiment analysis took under a minute.

Now enter all life saving AI!

And guess what, it looks like the same mistake is going to be repeated. This time with artificial intelligence. It’ll save lives! It’ll replace jobs! It’ll replace humans! It can’t tell the difference between a turtle and a gun! All that stuff is coming back.

If you firmly believe that a black box is going to revolutionise your business then please be my guest. Just be ready with the legals and customer service department, AI is rarely 100% accurate.

Like big data you’ll need tons of data to train your “I have no idea how it works it’s all voodoo” black box algorithm. The less you train the more error prone your predictions will be. Ultimately the only thing it will harm is the organisation who ran the AI in the first place. Take it as fact that customers will point the finger straight back at you, very publicly, if you get a prediction wildly wrong.

I’ve seen Google video and Amazon Alexa voice classification neural networks do amazing things, the usual startup on the street may have access to the tools but rarely the data to train. And my key takeaway of learning since doing all that Nectar card stuff: without quality data and lots of it, your fight will be a hard one.

I think there is still a good few years at the R&D coalface trying to figure out where AI could fit properly. Yes jobs will be replaced by AI, new jobs will be created. Humans will sit alongside robotic machines that take the heavy lifting away (that was going on for a long time before the marketers got hold of AI and started scaring the s**t out of people with it).

It’s not impossible to start something in the AI space and put it on the cloud, though, the costs can add up if you take your eye off the ball. The real question is, “do you really have to do it that way? Is there an easier method?”. Most crunching could be done on a database (not blockchain may I add), hell even an Excel spreadsheet is capable for some without the programming knowledge or money to spend on services.

Popular learning methods are still based on the tried and true methods: decision trees, logistic regression and k-means clustering, not black boxes. The numbers can be worked out away from code as confirmation, though who does that is a different matter entirely. The most well known algorithms can be reverse engineered: decision trees, Bayes networks, Support Vector Machines, Logistic Regression; there’s maths laid bare showing how they work. The rule of thumb is simple: if traditional machine learning methods are not showing good results then try a neural network (the backbone of AI) but only as a last resort, not the first go to.

If you want my advice try the traditional, well tested, algorithms first with the small data you have. I even wrote a book to help you…..

Like BigData, you more than likely don’t need AI.