I Need a Hypothesis

Question: can we safely predict Darcey Bussell’s score based on Craig’s initial scoring?

Okay, first off I’m not really the dancing type but there’s a strange thing with this programme that just kinda keeps you watching. Over time though my focus comes back to the numbers. And the nice thing with Strictly is that we get scores, so someone somewhere is going to be wise/daft/sad enough to record all this data.

Before anyone jumps to conclusions, it’s not me.

In this post I’m going to be using linear regression to see if we can get some hint of a number we can safely use to predict Darcey’s score based on Craig’s score.

Data, data, data…..

Time to applaud the website Ultimate Strictly, which has all the judging data since the first series. Applaud even louder that someone had the foresight to publish it as comma-separated values (CSV) for the types who go looking for correlation……

Your link to data nirvana is here.

No Programming Required

You could, if you wanted, code up a whole framework to work out the linear regression and so on. It’s Sunday so you have to be joking, I’m not going to that trouble right now.

Nor am I going to teach you how linear regression works…. there are tons of things on the internet that can teach you that. I just want numbers and quickly, I need more tea. There’s Google…..

Bring On The Spreadsheet

As we have two variables, Craig’s score (the independent one) and Darcey’s score (the dependent one), we can work all this out. To save wasting time when I could be making another cup of tea, I’m going to use Numbers (you could use Excel or OpenOffice) to get the numbers.

[Chart: Darcey’s score plotted against Craig’s score, with a linear trend line]

There’s a nice slope there so there’s definitely a relationship between the scores. With an R squared value of 0.792 there’s a good fit here, not perfect but enough to be getting on with: roughly 79% of the variation in Darcey’s scores is accounted for by Craig’s. R squared ranges from 0 to 1, 0 being useless and 1 being prediction perfect.

So with 0.792 we have something workable to make predictions with.
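
For the curious, here is a rough sketch of what the spreadsheet trend line is doing behind the scenes, an ordinary least-squares fit. The names mean and linear-fit are my own inventions for illustration, and the numbers fed in are toy values that sit exactly on a line, not the Strictly data.

```clojure
(defn mean
  "Arithmetic mean of a collection of numbers, as a double."
  [xs]
  (/ (reduce + xs) (double (count xs))))

(defn linear-fit
  "Ordinary least-squares fit of ys against xs.
  Returns a map with the slope, intercept and R squared."
  [xs ys]
  (let [mx    (mean xs)
        my    (mean ys)
        ;; sums of products of deviations from the means
        sxy   (reduce + (map (fn [x y] (* (- x mx) (- y my))) xs ys))
        sxx   (reduce + (map (fn [x] (* (- x mx) (- x mx))) xs))
        syy   (reduce + (map (fn [y] (* (- y my) (- y my))) ys))
        slope (/ sxy sxx)]
    {:slope     slope
     :intercept (- my (* slope mx))
     :r-squared (/ (* sxy sxy) (* sxx syy))}))

;; Toy numbers, NOT the Strictly data: these lie exactly on y = 2x + 1.
(linear-fit [1 2 3 4] [3 5 7 9])
;; => {:slope 2.0, :intercept 1.0, :r-squared 1.0}
```

Feed it Craig’s scores as xs and Darcey’s as ys and, assuming the same rows the spreadsheet used, it should land on much the same slope and intercept.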

Calculating y…..

If you look in the very top left of the graph you will see the calculation required for finding the value of y.

y = 0.6769x + 3.031

So if Craig scored a 5 we’d get a calculator (yes I do use one) and punch in the numbers:

y = 5 * 0.6769 + 3.031

I can wrap this up in a Clojure function for quick repeated calculations:

(ns darceycoefficient.core)

(defn predict-score
  "Predicts Darcey's score from Craig's score using y = 0.6769x + 3.031."
  [x-score]
  (+ 3.031 (* 0.6769 x-score)))

And use it over and over again.

Testing The Calculation

Craig scores 5 and Darcey scores:

darceycoefficient.core> (predict-score 5)
6.4155

Yeah I can go with that estimate. Let’s have a look at all ten outcomes.

darceycoefficient.core> (for [x (range 1 11)]
 (predict-score x))
 (3.7079 
  4.3848 
  5.0617 
  5.7386 
  6.4155 
  7.0924 
  7.769299999999999 
  8.4462 
  9.123099999999999
  9.799999999999999)

This is actually not bad at all as estimates go. At the lower end Darcey scores higher than Craig; looking at the raw data, the lowest score from Darcey has been a 4 and the lowest from Craig was a 2. As the scores get higher they start to align: if Craig is scoring a 10 then it’s pretty much assumed all the other judges are scoring 10 as well.
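
The judges hold up whole numbers, so a small sketch that rounds the prediction to the nearest paddle and shows the gap between the two judges might be handy. The helpers predicted-paddle and gap are hypothetical names of mine; predict-score is repeated so the snippet stands on its own.

```clojure
;; Same coefficients as predict-score earlier in the post.
(defn predict-score [x-score]
  (+ 3.031 (* 0.6769 x-score)))

(defn predicted-paddle
  "Hypothetical helper: the prediction rounded to the nearest whole score."
  [x-score]
  (Math/round (double (predict-score x-score))))

(defn gap
  "Hypothetical helper: predicted Darcey score minus Craig's score.
  Positive means the line expects Darcey to be the more generous judge."
  [x-score]
  (- (predict-score x-score) x-score))

(map predicted-paddle (range 1 11))
;; => (4 4 5 6 6 7 8 8 9 10)
```

Calling (gap 1) comes out positive and (gap 10) slightly negative, which matches the pattern above: Darcey is kinder at the bottom of the range and the two converge towards the top.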

Concluding Part 1.

Pretty basic I know, quick to get an answer I know, but when you watch next Saturday you can whip out a calculator, quickly tap in Craig’s score and impress your family, dinner guests and others.

You can’t do that with X Factor or Bake Off.

I’m liking the raw data. Ultimate Strictly have done a great job on it; as fan sites go it’s one of the best I’ve ever seen.

There’s only one more thing to say really.

Keeeeeeep Statistical Analysing……!

 
