Calculating The Darcey Coefficient – Part 4 – Live Testing #strictlycomedancing #clojure #linearregression


I promise this is the last part of The Darcey Coefficient, having gone through linear regression, neural networks and refining the accuracy of the prediction, it was only fair I ran the linear regressions against some live scores to see how it performed.

If you want to read the first four parts (yes four, I’m sorry) then they are all here.

Week 5 Scores

As ever the Ultimate Strictly website is on the ball, the scores are all in.

Judges scores
Couple Craig Darcey Len Bruno Total
Robert & Oksana 6 8 8 7 29
Lesley & Anton 5 6 7 6 24
Greg & Natalie 4 6 7 7 24
Anastacia & Gorka* 7 7 8 8 30
Louise & Kevin 8 8 8 9 33
Ed & Katya 2 6 6 4 18
Ore & Joanne 9 9 9 9 36
Daisy & Aljaž 8 8 8 8 32
Danny & Oti 8 9 9 9 35
Claudia & AJ 8 7 8 9 32

So we have data and the expected result. The real question is how well the regressions perform. Time to rig up some code.

Coding the Test

As the spreadsheet did the work I don't need to reinvent the wheel. All I need to do is take the numbers from it and put them in a function.

First there’s the Craig -> Darcey regression.

(defn predict-score-from-craig [x-score]
 (+ 3.031 (* 0.6769 x-score)))

And then there’s the All Judges -> Darcey regression.

(defn predict-darcey-from-all [x-score]
 (- (* 0.2855 x-score) 1.2991))

As the predictions will not come out as integers I need a function to round up or down as required. So I nabbed this one from a StackOverflow comment as it works nicely.

(defn round2 [precision d]
 (let [factor (Math/pow 10 precision)]
   (/ (Math/round (* d factor)) factor)))
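For example, with a precision of 0 it rounds to the nearest whole number:

(round2 0 7.0924) ;; => 7.0
(round2 2 7.0924) ;; => 7.09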

Finally I need the data: a vector of vectors with Craig's, Len's, Bruno's and Darcey's scores. I leave Darcey's actual score in so I have something to test the prediction against, the predicted score versus the actual score.

;; vectors of scores, [craig, len, bruno, darcey's actual score]
(def wk14-scores [[6 8 7 8]
 [5 7 6 6]
 [4 7 7 6]
 [7 8 8 7]
 [8 8 9 8]
 [2 6 4 6]
 [9 9 9 9]
 [8 8 8 8]
 [8 9 9 9]
 [8 8 9 7]])

Predicting Against Craig’s Scores

The difference between Craig's and Darcey's scores can fluctuate depending on the judges' comments. The dance with Ed and Katya is a good example: Craig scored 2 and Darcey scored 6. So I'm not expecting great things from this regression, but as it was our starting point let's test it.

(defn predict-from-craig [scores]
 (map (fn [score]
        (let [craig (first score)
              expected (last score)
              predicted (round2 0 (predict-score-from-craig craig))]
          (println "Craig: " craig
                   "Predicted: " predicted
                   "Actual: " expected
                   "Correct: " (= (int predicted) expected))))
      scores))

When run it gives us the following predictions:

strictlylinearregression.core> (predict-from-craig wk14-scores)
Craig: 6 Predicted: 7.0 Actual: 8 Correct: false
Craig: 5 Predicted: 6.0 Actual: 6 Correct: true
Craig: 4 Predicted: 6.0 Actual: 6 Correct: true
Craig: 7 Predicted: 8.0 Actual: 7 Correct: false
Craig: 8 Predicted: 8.0 Actual: 8 Correct: true
Craig: 2 Predicted: 4.0 Actual: 6 Correct: false
Craig: 9 Predicted: 9.0 Actual: 9 Correct: true
Craig: 8 Predicted: 8.0 Actual: 8 Correct: true
Craig: 8 Predicted: 8.0 Actual: 9 Correct: false
Craig: 8 Predicted: 8.0 Actual: 7 Correct: false

Okay, 50/50, but this doesn't come as a surprise. With more scores (i.e. adding Len and Bruno) we might hit the mark better.

Do The Full Judges' Scores Improve the Prediction?

Let’s find out, here’s the code:

(defn predict-from-judges [scores]
 (map (fn [score]
        (let [judges (reduce + (take 3 score))
              expected (last score)
              predicted (round2 0 (predict-darcey-from-all judges))]
          (println "Judges: " judges
                   "Predicted: " predicted
                   "Actual: " expected
                   "Correct: " (= (int predicted) expected))))
      scores))

By taking the sum of the first three scores (Craig, Len and Bruno), that total is then run against the predict-darcey-from-all function. How it performs is anyone's guess right now.

strictlylinearregression.core> (predict-from-judges wk14-scores)
Judges: 21 Predicted: 5.0 Actual: 8 Correct: false
Judges: 18 Predicted: 4.0 Actual: 6 Correct: false
Judges: 18 Predicted: 4.0 Actual: 6 Correct: false
Judges: 23 Predicted: 5.0 Actual: 7 Correct: false
Judges: 25 Predicted: 6.0 Actual: 8 Correct: false
Judges: 12 Predicted: 2.0 Actual: 6 Correct: false
Judges: 27 Predicted: 6.0 Actual: 9 Correct: false
Judges: 24 Predicted: 6.0 Actual: 8 Correct: false
Judges: 26 Predicted: 6.0 Actual: 9 Correct: false
Judges: 25 Predicted: 6.0 Actual: 7 Correct: false

Well that's interesting: we get much lower predicted scores from the combined judges' total. Every prediction was wrong; that would hurt if you were betting on it.

All of this leads us to a conclusion: if you want to predict what Darcey's score is going to be, then look at what Craig does first.

That's that, the case is now closed.


Refining the Coefficient. Iterative Improvements In Learning. #data #machinelearning #linearregression

Refinement is an iterative process, sometimes quick and sometimes slow. If you've followed the last few blog posts on score prediction (if not, you can catch up here) you'll know I've run the data once and rolled with the prediction; basically, "that's good enough for this".

The kettle is on, tea = thinking time

This morning I was left wondering, as Strictly is on tonight, whether there is any way to improve the reliability of the linear regression from the spreadsheet. The neural network was fine, but for good machine learning you need an awful lot of data to get a good prediction fit. The neural net was level pegging with the small linear model, at about 72%.

I’ve got two choices, create more data to tighten up the neural net or have a closer look at the original data and find a way of changing my thinking.

Change your thinking for better insights?

Let’s remind ourselves of the raw data again.


Four numbers, the scores from Craig, Len, Bruno and Darcey in that order. The original linear regression only looked at Craig’s score to see the impact on Darcey’s score.


That gave us the prediction:

y = 0.6769x + 3.031

And an R squared value of 0.792, not bad going. The neural network took into account all three scores from Craig, Len and Bruno to classify Darcey's score; it was okay, but the lack of raw data actually let it down.

Refining the linear regression with new learning

If I go back to the spreadsheet, let's tinker with it. What happens if I combine the three scores using the SUM() function to add them together?


Very interesting, the slope is steeper for a start. The regression now gives us:

y = 0.2855x - 1.2991

And the R squared has gone up from 0.792 to 0.8742, an improvement. And as it stands this algorithm is now more accurate than the neural network I created.
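If you'd rather check the fit without a spreadsheet, the slope, intercept and R squared can be worked out with ordinary least squares in a few lines of Clojure. A rough sketch; the pairs you'd feed it are the (Craig + Len + Bruno, Darcey) rows from the data:

(defn linear-fit [pairs]
 (let [n (count pairs)
       xs (map first pairs)
       ys (map second pairs)
       mean-x (/ (reduce + xs) (double n))
       mean-y (/ (reduce + ys) (double n))
       sxy (reduce + (map (fn [[x y]] (* (- x mean-x) (- y mean-y))) pairs))
       sxx (reduce + (map #(Math/pow (- % mean-x) 2) xs))
       syy (reduce + (map #(Math/pow (- % mean-y) 2) ys))
       slope (/ sxy sxx)]
   {:slope slope
    :intercept (- mean-y (* slope mean-x))
    :r-squared (/ (* sxy sxy) (* sxx syy))}))

;; e.g. (linear-fit [[21 8] [18 6] [23 7]]) => {:slope ... :intercept ... :r-squared ...}

Running the full dataset through that should land close to the spreadsheet's 0.2855, -1.2991 and 0.8742.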


It's a simple change, quite an obvious one, and we've taken the original hypothesis forward since the original post. How accurate is the linear regression? Well, I'll find that out tonight I'm sure.



Calculating The Darcey Coefficient – Part 3 #strictlycomedancing #machinelearning #clojure #weka

The Story So Far…

This started off as a quick look at Linear Regression in spreadsheets and using the findings in Clojure code, that’s all in Part 1. Muggins here decided that wasn’t good enough and rigged up a Neural Network to keep the AI/ML kids happy, that’s all in Part 2.

Darcey, Len, Craig or Bruno haven’t contacted me with a cease and desist so I’ll carry on where I left off….. making this model better. In fact they seem rather supportive of the whole thing.


Weka Has Options.

When you create a classifier in Weka there are options available to you to tweak and refine the model. The Multilayer Perceptron that was put together in the previous post ran with the defaults. As Weka can automatically build the neural network I don't have to worry about how many hidden layers to define; that will be handled for me.

I do however want to alter the number of iterations the model runs (epochs) and I want to have a little more control over the learning rate.

The clj-ml library handles the options as a map.

darceyneuralnetwork.core> (def opts {:learning-rate 0.4 :epochs 10000})
darceyneuralnetwork.core> (classifier/make-classifier-options :neural-network :multilayer-perceptron opts)

The code on Github is modified to take those options into account.

(defn train-neural-net [training-data-filename class-index opts]
 (let [instances (load-instance-data training-data-filename)
       neuralnet (classifier/make-classifier :neural-network :multilayer-perceptron opts)]
   (data/dataset-set-class instances class-index)
   (classifier/classifier-train neuralnet instances)))

(defn build-classifier [training-data-filename output-filename]
 (let [opts (classifier/make-classifier-options :neural-network :multilayer-perceptron
                                                {:learning-rate 0.4
                                                 :epochs 10000})
       nnet (train-neural-net training-data-filename 3 opts)]
   (utils/serialize-to-file nnet output-filename)))
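To check whether those options actually improve things, clj-ml can run a cross-validation over the same instances. A rough sketch of how I'd wire that up, with the caveat that I'm assuming classifier-evaluate's :cross-validation mode and argument order here:

(defn evaluate-neural-net [training-data-filename class-index opts]
 (let [instances (load-instance-data training-data-filename)
       neuralnet (classifier/make-classifier :neural-network :multilayer-perceptron opts)]
   (data/dataset-set-class instances class-index)
   ;; 10-fold cross-validation over the training instances
   (classifier/classifier-evaluate neuralnet :cross-validation instances 10)))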


There's not much further I can take this as it stands. The data is actually robust enough that using Linear Regression gives the kind of answers we were looking for. Another argument would say that you could use a basic decision tree to read Craig's score and classify Darcey's score.

If the data were all over the place in terms of scoring then using something along the lines of an artificial neural network would be worth doing, and using Weka with Clojure makes the whole thing a lot easier. It's also easy enough to do in Java, which I did in my book Machine Learning: Hands on for Developers and Technical Professionals.


Rest assured this is not the last you’ll see of machine learning in this blog, there’s more to come.




Calculating The Darcey Coefficient – Part 2 #strictlycomedancing #machinelearning #clojure #weka

Previously on…..

In part 1 we looked at using linear regression, with the aid of a spreadsheet, to see if we could predict, within a reasonable tolerance, what Darcey Bussell's scoring would be based on Craig Revel Horwood's score.

No big deal, it worked quite well, it took less than five minutes and didn't interfere with me making a cup of tea. As we concluded from a bit of data application:

y = 0.6769x + 3.031

And all was well.

Time To Up The Ante

Linear Regression is all well and good but this is 2016, this is the year where every Northern Ireland company decides it’s going to do artificial intelligence and machine learning with hardly any data…. So, we’re going to upgrade the Darcey Coefficient and go all Techcrunch/Google/DeepMind on it, Darcey’s predictions are now going to be an Artificial Neural Network!



My sentiments exactly. For the readers of previous posts, both of you, my excitement for neural networks isn't exactly up there. They're good, but I hold them with a small amount of skepticism. My reasons? Well, like I've said before….

One of the keys to understanding the artificial neural network is knowing that the application of the model implies you’re not exactly sure of the relationship of the input and output nodes. You might have a hunch, but you don’t know for sure. The simple fact of the matter is, if you did know this, then you’d be using another machine learning algorithm.

We’ve already got a rough idea how this is going to pan out, the linear regression gave us a good clue. The amount of data we have isn’t huge either, the data set has 476 rows in it. So the error rate of a neural network might actually be larger than what I’d like.

The fun though is in the trying. And in aid of reputation, ego and book sales, well hey, it's worth a look. So I'm going to use the Weka machine learning framework as it's good, solid and it just works. The neural network can be used for predicting the score of any judge, and as Len's leaving this is the perfect opportunity to give it a whirl. For the means of demonstration though I'll use Darcey's scores as it follows on from the previous post.

Preparing the Data

We have a CSV file but I've pared this down so it's just the scores of Craig, Darcey, Len and Bruno. Weka can import CSV files but I prefer to craft the format that Weka likes, which is the ARFF file. It spells out the attribute format, the output class we're expecting to predict on and so on.

@relation strictlycd

@attribute craig numeric
@attribute len numeric
@attribute bruno numeric
@attribute darcey numeric

@data
7,7,7,7..... and so on

Preparing the Project

Let's have a crack at this with Clojure. There is a reference to the Weka framework in Clojars so this in theory should be fairly easy to sort out. Using Leiningen to create a new project, let's go:

$ lein new darceyneuralnetwork
Warning: refactor-nrepl requires org.clojure/clojure 1.7.0 or greater.
Warning: refactor-nrepl middleware won't be activated due to missing dependencies.
Generating a project called darceyneuralnetwork based on the 'default' template.
The default template is intended for library projects, not applications.
To see other templates (app, plugin, etc), try `lein help new`.

Copy the arff file into the resources folder, or somewhere on the file system where you can find it easily, then I think we’re ready to rock and rhumba.

I’m going to open the project.clj file and add the Weka dependency in, I’m also going to add the clj-ml project too, this is a handy Clojure wrapper for Weka. It doesn’t cover everything but it takes the pain out of some things like loading instances and so on.

(defproject darceyneuralnetwork "0.1.0-SNAPSHOT"
 :description "FIXME: write description"
 :url ""
 :license {:name "Eclipse Public License"
           :url ""}
 :dependencies [[org.clojure/clojure "1.8.0"]
                [clj-ml "0.0.3-SNAPSHOT"]
                [weka "3.6.2"]])

Training the Neural Network

In the core.clj file I’m going to start putting together the actual code for the neural network (no Web API’s here!).

Now a quick think about what we actually need to do. Actually it's pretty simple with Weka in control of things, but a checklist is helpful all the same.

  • Open the arff training file.
  • Create instances from the training file.
  • Set the class index of the training file, ie what we are looking to predict, in this case it’s Darcey’s score.
  • Define a Multilayer Perceptron and set its options.
  • Build the classifier with training data.

The nice thing with using Weka from Clojure is that we can do REPL-driven design and do things one line at a time.

The wrapper library has a load-instances function which takes the file location as a URL.

darceyneuralnetwork.core> (wio/load-instances :arff "file:///Users/jasonbell/work/dataissexy/darceyneuralnetwork/resources/strictlydata.arff")
#object[weka.core.Instances 0x2a5c3f7 "@relation strictlycd\n\n@attribute craig numeric\n@attribute len numeric\n@attribute bruno numeric\n@attribute darcey numeric\n\n@data\n2,5,5,5\n5,6,4,5\n3,5,4,4\n4,6,6,7\n6,6,7,6\n7,7,7,7\n6,7,7,6\n3,5,4,5\n5,6,5,5\n8,7,7,8\n5,7,5,5\n3,5,5,5\n6,6,7,8\n4,4,5,5\n6,6,6,6\n7,7,7,6\n6,6,6,6\n3,5,5,6\n6,7,7,6\n2,4,4,5\n7,8,8,7\n8,8,8,8\n5,5,5,5\n6,5,5,6\n7,6,7,6\n3,5,5,5\n7,6,6,6\n5,6,6,6\n4,7,6,5\n3,6,4,6\n3,5,4,6\n4,5,4,4\n7,7,8,7\n8,8,8,8\n7,6,6,7\n6,6,6,7\n7

Okay, with that working I’m going to add it to my code.

(defn load-instance-data [file-url]
 (wio/load-instances :arff file-url))

Note the load-instances function expects a URL, so make sure your filename begins with “file:///” otherwise it will throw an exception.
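If you don't want to type the full path out by hand, the URL can be built from a file on disk; a small helper along these lines (the path in the comment is just an example):

(require '[clojure.java.io :as io])

(defn file->url [path]
 ;; io/file resolves the path, .getAbsolutePath gives the /Users/... form
 (str "file://" (.getAbsolutePath (io/file path))))

;; (file->url "resources/strictlydata.arff")
;; => "file:///Users/..../resources/strictlydata.arff"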

With training instances dealt with in one line of code (gotta love Clojure, it would take three in Java) we can now look at the classifier itself. So the decision is to use a neural network, in this instance a Multilayer Perceptron. In Java it's a doddle, in Clojure even more so:

darceyneuralnetwork.core> (classifier/make-classifier :neural-network :multilayer-perceptron)
#object[weka.classifiers.functions.MultilayerPerceptron 0x77bf80a0 ""]

It doesn't actually do anything yet but there's a classifier ready and waiting. We have to define which class (Craig, Len, Bruno or Darcey) we wish to classify; the attribute indexes start at 0, so Darcey is number 3. Weka needs to know what you are trying to classify otherwise it will throw an exception.

(data/dataset-set-class instances 3)

Now we can train the model.

darceyneuralnetwork.core> (classifier/classifier-train ann ds)
#object[weka.classifiers.functions.MultilayerPerceptron 0x1b800b32 "Linear Node 0\n Inputs Weights\n Threshold -0.005224665277369991\n Node 1 1.161165780729305\n Node 2 -1.0681086084010063\nSigmoid Node 1\n Inputs Weights\n Threshold -2.5314445242321613\n Attrib craig 1.3343684436571155\n Attrib len 1.290973083908637\n Attrib bruno 1.1941270206738404\nSigmoid Node 2\n Inputs Weights\n Threshold -1.508477761092395\n Attrib craig -0.73817374973773\n Attrib len -0.7490868020959697\n Attrib bruno -1.3714589018840246\nClass \n Input\n Node 0\n"]

The output shows the trained network's node weights. All looks good. We have a neural network that can predict Darcey's score based on the other three judges' scores.

Remember this is all within the REPL, back to my code now and I can craft a function to train a neural network.

(defn train-neural-net [training-data-filename]
 (let [instances (load-instance-data training-data-filename)
       neuralnet (classifier/make-classifier :neural-network :multilayer-perceptron)]
   (data/dataset-set-class instances 3)
   (classifier/classifier-train neuralnet instances)))

All it does is replicate the steps I did in the REPL: load the instances, create a classifier, select the class to classify and then train the neural network.

Giving it a dry run we run it as so from the REPL.

darceyneuralnetwork.core> (def training-data "file:///Users/jasonbell/work/dataissexy/darceyneuralnetwork/resources/strictlydata.arff")
darceyneuralnetwork.core> (def nnet (train-neural-net training-data))
darceyneuralnetwork.core> nnet
#object[weka.classifiers.functions.MultilayerPerceptron 0x100e60b7 "Linear Node 0\n Inputs Weights\n Threshold -0.005224665277369991\n Node 1 1.161165780729305\n Node 2 -1.0681086084010063\nSigmoid Node 1\n Inputs Weights\n Threshold -2.5314445242321613\n Attrib craig 1.3343684436571155\n Attrib len 1.290973083908637\n Attrib bruno 1.1941270206738404\nSigmoid Node 2\n Inputs Weights\n Threshold -1.508477761092395\n Attrib craig -0.73817374973773\n Attrib len -0.7490868020959697\n Attrib bruno -1.3714589018840246\nClass \n Input\n Node 0\n"]

All good then. So we've crafted a piece of code pretty quickly to train a neural network. I'd like to save the model so I don't have to go through the pain of training it every time I want to use it. The utils namespace has a function to serialize the model to a file.

(utils/serialize-to-file nnet output-filename)

The nice thing with Weka is the process is the same for most of the different machine learning types.

  • Load the instances
  • Create a classifier
  • Set the output class
  • Train the model
  • Save the model

So let's park that there; we have built a neural network. Time to move on to predicting some scores. If you want to have a look at the code I've put it up on Github.

Predicting with the Neural Network

With our model (rough, ready and in need of refinement) we can do some predicting. It’s just a case of creating a new instance based on the training instance and running it against the neural network to get a score.

The make-instance function will take a defined instance type and apply data from a vector to create a new instance. Then it’s a case of running that against the model with the classifier-classify function.

darceyneuralnetwork.core> (def to-classify (data/make-instance instances [8 8 8 0]))
darceyneuralnetwork.core> (classifier/classifier-classify nnet to-classify)

So we have a score; if we rounded it up we'd get an 8, which is about right. If Craig were to throw a humdinger of a score in, the model performs well under the circumstances.

darceyneuralnetwork.core> (def to-classify (data/make-instance instances [3 8 8 0]))
darceyneuralnetwork.core> (classifier/classifier-classify nnet to-classify)

Let's remember this model is created with defaults; it's far from perfect, but with the amount of data we have it's not a bad effort. There's more we can do but I can hear the kettle.

In Part 3…..

Yes, there’s a part 3. We’ll take this model and have a go at some tweaking and refining to make the predictions even better.








Calculating The Darcey Coefficient – Part 1 #strictlycomedancing @bbcstrictly @DarceyOfficial #numbers #clojure

I Need a Hypothesis

Question, can we safely predict Darcey Bussell’s score based on Craig’s initial scoring?

Okay, first off I’m not really the dancing type but there’s a strange thing with this programme that just kinda keeps you watching. Over time though my focus comes back to the numbers. And the nice thing with Strictly is that we get scores, so someone somewhere is going to be wise/daft/sad enough to record all this data.

Before anyone jumps to conclusions, it’s not me.

In this post I’m going to be using linear regression to see if we can get some hint of a number we can safely use to predict Darcey’s score based on Craig’s score.


Data, data, data…..

Time to applaud the website Ultimate Strictly, which has all the judging data since the first series. Applaud even louder that someone had the foresight to publish comma separated value data for the types who go looking for correlation……

Your link to data nirvana is here.

No Programming Required

You could, if you wanted, code up a whole framework to work out the linear regression and so on. It’s Sunday so you have to be joking, I’m not going to that trouble right now.

Nor am I going to teach you how linear regression works…. there are tons of things on the internet that can teach you that. I just want numbers and quickly, I need more tea. There’s Google…..

Bring On The Spreadsheet

As we have two variables, Craig's score (the independent variable) and Darcey's score (the dependent one), we can work all this out. To take away the pain of wasting time when I could be making another cup of tea, I'm going to use Numbers (you could use Excel or OpenOffice) to get the numbers.


There’s a nice slope there so there’s definitely a relationship between the scores. As the R squared value is 0.792 there’s a good fit here, not perfect but enough to be getting on with. The R squared range is from 0 to 1, 0 being useless and 1 being prediction perfect.

So with 0.792 we have something workable to make predictions with.

Calculating y…..

If you look in the very top left of the graph you will see the calculation required for finding the value of y.

y = 0.6769x + 3.031

So if Craig scored a 5 we’d get a calculator (yes I do use one) and punch in the numbers in bold:

y = 5 * 0.6769 + 3.031 = 6.4155

I can wrap this up in a Clojure function for quick repeated calculations:

(ns darceycoefficient.core)
(defn predict-score [x-score]
 (+ 3.031 (* 0.6769 x-score)))

And use it over and over again.

Testing The Calculation

Craig scores 5 and Darcey scores:

darceycoefficient.core> (predict-score 5)

Yeah I can go with that estimate. Let’s have a look at all ten outcomes.

darceycoefficient.core> (for [x (range 1 11)]
 (predict-score x))
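Punching the formula out by hand, Craig scores of 1 through 10 come out at roughly 3.71, 4.38, 5.06, 5.74, 6.42, 7.09, 7.77, 8.45, 9.12 and 9.80 (my quick arithmetic, rounded to two decimal places).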

This is actually not bad at all as estimates go. At the lower end Darcey scores higher; looking at the raw data the lowest score by Darcey has been a 4 while the lowest from Craig was a 2. As the scores get higher they align: if Craig is scoring a 10 then it's pretty much assumed all the other judges are scoring 10 as well.

Concluding Part 1.

Pretty basic I know, quick to get an answer I know, but when you watch next Saturday you can whip out a calculator, quickly tap in Craig's score and impress your family, dinner guests and others.

You can’t do that with X Factor or Bake Off.

I’m liking the raw data, Ultimate Strictly have done a great job on it, as fan sites go it’s one of the best I’ve ever seen.

There’s only one more thing to say really.

Keeeeeeep Statistical Analysing……!


Hey! Let’s talk about your AI startup! #AI #Startups #ArtificialIntelligence

“Jase, I’ve been told all the money is in AI now so I’m off to go and create a company…..”

Not the first time….

…I’ve heard this over the last couple of months. And said with all the same eager puppy excitement when chatbots were going to take over the world, literally the week before.

So my natural reaction, with humility and grace has been the same….


I’m not going to sit here and say, “don’t do it”, it’s not my place to. I do have my opinions though. Firstly….

The Big Boys Have Been There For Years

Yes, I’m sorry to say that the big companies have been working on this for a long long time already but because it wasn’t in the tech press no one really took much notice. Well a few of us did and even then didn’t act on it because the big boys were doing it.

So Amazon, Google, Facebook et al have been beavering away in the quiet and not really saying much. In stealth if you will…. remember that phrase, a lot of us used to use it in times gone by.

Investment Levels in Northern Ireland….

Too low, just simply too low. The plain fact of the matter: go to the US. As of December 2015, 90% of the deals done in AI investment were into US companies. If you insist on staying here then close a $5m round at least. Staffing is going to be hard, real hard, and find folk who know how the numbers work. Using an API or a third party can potentially hit you hard later on; remember KISS? Keep It Separate Stupid!

Let’s talk data.

You probably don't have it. So you need to generate it, but you're already too late unless you're generating tons of the stuff and storing it. And the words ringing in my ears when asked on a panel, "If I gave you £100 to invest what would you put it in": I said put it into data mining. If you had been saving the data over the last six years then you'd have something to predict….. but you probably haven't, and for good AI training you'll need a s**t load of it.


I've said my bit on neural networks a long while ago. It still stands: it's early days and many don't know the work needed. The big guys did, and in that time they've got the data, the skills and the money.

What’s Your Edge?

Do you have one? The simple rule of thumb is the same as John Kelly's; the Kelly formula is really simple.
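For reference, the usual statement of it is:

f = (bp - q) / b

where b is the odds you're being offered, p is the probability of winning and q = 1 - p is the probability of losing.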


What's the likelihood of you getting the payday, divided by the odds against you? With over 900 funded AI startups on the planet and you having a sliver of an edge:

0.01/900 ≈ 0.000011, or roughly a 0.001% chance of you getting the payday.

I know it won’t stop you…… good luck.




#Invent2016 – The TechBet Results

First of all, a huge congratulations to Jumpack for becoming the overall winner of Invent2016. Right, so while everyone is probably getting drunk now, down to the business of numbers….


The Parimutuel Betting Results

A total of 285 virtual coins were wagered; for details of what this was and how it happened please have a look at my earlier blog post. As expected, in my mind, Elemental and Kraydel pulled in the majority of the total bet.

Name Total Amount Wagered
e+press 50
JumPack 25
Purple Magic 0
Oran Oak 10
Kraydel 125
Take Ten 0
Point Energy 0
Elemental 75
The Shield 0

So the payout is 285/25 = 11.4 (giving odds of Jumpack winning at 10.4/1 and also providing a nice 10.4 profit per 1 wagered; not a bad day's work for those who bet on it, and a secret: it was one person).


Parimutuel Betting on #Invent2016 Winner – @CatalystIncHQ @ciconnect @cihaloni


Prediction is hard, or perhaps I should say that accurate prediction is hard. The more I study prediction markets, betting odds and the like the more I wonder how transferable those theories are to startups. So I decided to run a small experiment….

Perhaps that work at the Sporting Life did actually rub off on me.

Parimutuel Betting with TechBet.

The rules of the road are quite simple:

  • You’re trying to predict the winner of Invent2016.
  • There’s definitely NO money involved.
  • Right now there’s no signing up to play, just make a bet and move on.
  • The betting is open until 7:30pm when the event starts, hopefully this means that no sneaky judges can pile a load of bets on at the last minute.

I’ve limited bet amounts to between 1 and 100 TechBet credits (i.e. a virtual currency with no value, just bragging rights when the winners are listed).

PS: This has nothing to do with Catalyst Inc. apart from they were running the event and it was the perfect candidate to use as a test bed, so if anyone from Catalyst is seething at the sides then I do apologise.


If you want to play then you can, just click on this link.

How Does The Pool Work?

There are no odds in parimutuel betting. Players place their bet by choosing the winner, and during the course of the betting each entry accumulates an amount of money against it. Here's an event with four outcomes and the amount that so far has been placed on each.

Name Total Bets
Entry 1 25
Entry 2 78
Entry 3 44
Entry 4 12

If we sum up the amounts bet we get the total amount in the pool. It’s just a sum of all the bets placed.


So in this instance there is a total of 159 “coins” in this event's pool. This will be split against the winning entry, and the payout is then calculated on the number of players and what they individually wagered on the event. So, for example, if Entry 4 won the event, there was a total of 12 “coins” wagered on Entry 4 winning.

159 / 12 = 13.25 per 1 unit bet (the original 1 coin bet placed plus a profit of 12.25, so the odds are basically 12.25/1).

If Player1 had bet 9 on Entry 4 and Player25 had bet 3 on Entry 4 then the pay out would look like.

Player1 = 13.25 * 9 = 119.25

Player25 = 13.25 * 3 = 39.75
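If you want to play with the numbers yourself, the payout maths is only a couple of lines of Clojure; a quick sketch assuming a 100% payout with no commission:

(defn payout-per-unit [pool winning-total]
 ;; the whole pool divided across the coins bet on the winning entry
 (/ pool (double winning-total)))

(defn player-payout [pool winning-total stake]
 (* stake (payout-per-unit pool winning-total)))

;; (payout-per-unit 159 12) => 13.25
;; (player-payout 159 12 9) => 119.25
;; (player-payout 159 12 3) => 39.75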

Now I'm assuming there's a 100% payout on the pool; in any normal situation tax and commission would have been skimmed off the pool before the payout was calculated.

Why On Earth Are You Doing This?

Why not? A couple of reasons. Firstly, I want to see if people are willing to take a punt with no risk. It also gives me data on the distribution of bets: do people already have a favourite winner for Invent2016? Within this one competition there is a lot of data I can learn from.

Secondly from a proof of concept point of view this was built in a couple of hours and was a good exercise in pinpointing what actually needed building, a tight specification and a narrow view of the deliverable. Yeah it’s on Bootstrap, yeah it looks a bit stale but it does the job.

Thirdly, there were some deployment things in Amazon Web Services I've been trying to figure out; that ended up being the hardest part of the night, security groups can be a nightmare.

Go and have fun and good luck to all the finalists tonight. Once the winner is announced, I’ll collate all the bets and publish the results on Friday.

The Next Five Years of Machine Learning. #machinelearning #data #bigdata @brianmacnamee @digitalcircle


Last night I attended the Royal Irish Academy lecture “Show me your data, and I’ll tell you who you are” at Ulster University’s Magee Campus. It was an interesting lecture by Dr Brian MacNamee, one that sidestepped any technicalities and aimed for a general audience. It was a very good, informative and entertaining lecture.

One thing that I did notice was the mix of audience: students from school, members of the public and some lecturers from the university, though no entrepreneurs in the room that I could see, which was a shame.

It was the amount of school ties in the room that inspired and prompted me to ask the question during the Q&A, “What do you see as the challenges to Machine Learning over the next 5 to 10 years?“.

Some Relics

Some of the machine learning algorithms that we're using are old. Take the decision tree for example: the ID3 algorithm designed by Ross Quinlan goes back to 1986, so it's on its thirtieth birthday. Threshold Logic, which became the foundation for neural networks, dates back to the work of Warren McCulloch and Walter Pitts in 1943; that's 73 years ago.

Most of our modern machine learning systems are based on old technologies and algorithms. Is there an opportunity to refine and redevelop these technologies? Is there an opening for new algorithms? I believe there is, and the ones who will carry that torch are possibly the ones sat in the Magee lecture room last night proudly wearing their uniforms.

Thanks to Digital Circle for listing the event, I wouldn’t have known about it otherwise.



WhatsApp, encrypted messages don’t matter, your number does.


I'm always bemused when users read tech press comments: a company gets acquired and the ringing message in the piece is, "this acquisition won't change the way users interact with our product". Of course it will…..

Facebook acquires Instagram, rapid user growth, it made sense. Facebook acquires WhatsApp, okay wasn’t expecting that one (and certainly not for $19Bn) and “it won’t change the service”…..Of course it will….

Phone Number + Facebook Profile = Targeted Insight

The relaxation of WhatsApp's interaction with Facebook was always going to happen; you'd have to be pretty naive to think it wasn't. Even with encrypted messages, that merely skims the surface of messaging as an advertising medium. WhatsApp have your number at signup, and if you join that with the phone number on your Facebook profile…. well, you've got the advertising segments, likes, friends, events, movements, checkins and so on. WhatsApp just joined the big old advertising platform graph.


And if you think that the WhatsApp id is generated, which it is, remember that Facebook own the whole shebang and will easily be able to generate the same. Even when you mangle it through an MD5 digest it's still an id of sorts. Perform the same digest on the phone number from your Facebook profile and you'll get a match.

MD5 ("+447900316123plusmyreallylongIMEI") = f6dde26f21be9acef37667d770c95869
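For the curious, producing that kind of digest in Clojure takes only a few lines; a rough sketch using the JDK's MessageDigest:

(import 'java.security.MessageDigest)

(defn md5-hex [^String s]
 ;; hash the string and render the bytes as lowercase hex
 (let [digest (.digest (MessageDigest/getInstance "MD5") (.getBytes s "UTF-8"))]
   (apply str (map #(format "%02x" %) digest))))

;; hashing the same phone-number string on either side always gives the same id
;; (md5-hex "+447900316123plusmyreallylongIMEI")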

None of this should be a surprise. Expect the ads to roll in soon and remember the old adage, "if you are using a service for free, you are the product".