Is Googlewhacking Still Possible? cc @DaveGorman :) #data #bigdata #davegorman #googlewhack

You Need to See This!….

A kind of ritual viewing has developed between me and the teen recently, especially as we share a large interest in statistics and comedy. They’ll suggest one thing, we’ll watch it, and then I’ve gone through Monty Python, Billy Connolly, Jasper Carrott, Bill Bailey and so on….. I get shown Game Theory and Film Theory in equal measure.

Then one evening, it hit me….. YOU NEED TO SEE THIS!

Dave Gorman’s Googlewhack Adventure……

(Now as this isn’t an official version of the live show I encourage you to venture here for the DVD and here for the book.)

Francophile namesakes

I owned the DVD when it came out in 2004 and it was a wonder to watch. Even after being involved in the web and data industry since 1995, I was still mesmerised.

It’s the true story of Dave Gorman, tasked by his friend, Dave Gorman, to find a continuous connection, a chain, of ten Googlewhacks. Meeting each Googlewhacker in person, Dave is supplied with two further Googlewhacks of their own finding. It is, in all seriousness, compelling viewing. And I know what you’re thinking….

What’s a Googlewhack? We need to go back in time a bit.

Googlewhack is a contest for finding a Google search query consisting of exactly two words without quotation marks that returns exactly one hit. A Googlewhack must consist of two actual words found in a dictionary. A Googlewhack is considered legitimate if both of the searched-for words appear in the result page. (From Wikipedia)

Watching it again in 2019, it’s still a brilliant story, but it’s also interesting to see how much the internet has changed, some for the better and some for the far worse. There’s a far more important question though.

Can It Still Be Done?

Have our accelerated lives, data, data shadows and other digital fingerprints rendered all of this history? Or is there a minute glimmer of hope that it could still be done?

The Oxford English Dictionary contains 171,476 words in current use. From this point on it’s a combinatorics problem: how many word pair combinations actually exist? Back in 2004 I wouldn’t even know how to ask that question, let alone find an answer for it….. oh how my life has changed.
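These days it’s a one-liner. The number of unordered pairs from n words is n(n-1)/2, which is easy to check in a Clojure REPL:

(defn word-pairs
  "Number of unordered word pairs from n dictionary words: n(n-1)/2."
  [n]
  (/ (* n (dec n)) 2))

(word-pairs 171476)
;; => 14701923550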

There are 14,701,923,550 word pairs that could be searched in an attempt to find a Googlewhack. Fourteen billion….. and from my point of view that’s not a big data problem, it’s an average sized data problem. How long would it take though?

A quick Google search on “Francophile Namesakes” tells us two interesting facts.

Firstly, there are 59,800 results…. no longer a Googlewhack by any stretch of the imagination, and secondly, the result took 0.33 seconds to find. (14701923550 * 0.33) / 60 gives us 80,860,579 minutes to do all the word pair searches, about 1.35m hours. Basically a single computer would take around 154 years just to hit Google with all the pairs to find a Googlewhack.

In our world of clustered computing, with loads of computers doing the job at the same time, I could deploy 1,000 machines and it would still take over fifty days to do the work.
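And the whole back-of-envelope calculation, for anyone who wants to check my working:

(def pairs 14701923550)
(def secs-per-search 0.33)
(def total-seconds (* pairs secs-per-search))

(/ total-seconds 60)            ;; => ~80.9 million minutes
(/ total-seconds 60 60 24 365)  ;; => ~154 years on a single machine
(/ total-seconds 60 60 24 1000) ;; => ~56 days spread across 1,000 machines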

Ultimately, it doesn’t matter. It’s been done already, by Dave, in a time when you could easily do those kinds of searches, when human connection was the default standard of communication. And that’s when I was reminded what the internet has lost for me: the humanity of data. With Facebook, Twitter and all the other social networks, the social aspect is, to me, lost; it’s a broadcast medium for those who want to listen. Back in 2004 the landscape was much different…… The Googlewhack Adventure just reminded me how much I missed it.

Thanks Dave. Sadly, I’ve no idea what that’s done to the graph.


I Shutdown DeskHoppa, here’s why. #DeskHoppa #startups

Yesterday I shut down DeskHoppa. It wasn’t an easy decision but it was the right decision. It surprised a few people that I’d do such a thing and there were a few messages from dear friends wondering if I’d made the right call.

And no, I didn’t delete the code but I did delete all the data.

Marketplace Startups Are Hard

That’s the plain and simple fact. While it’s all very well knowing that there are buyers and sellers out in the marketplace, actually tying them together via your service is really hard. You are effectively marketing to two sides of the same coin, and it’s not a simple equation to complete either.

One of the hardest things to solve is the initial stage. In DeskHoppa’s case you need hosts listing in order to get users searching. Hosts were the hardest customers to get on board; they require convincing, and the harsh reality is that most don’t trust you until you can really convince them.

It’s a Numbers Game

Everyone I spoke to was lovely: “That’s a great idea, I needed that yesterday!”. The problem is that kind words do not put money in the bank. So you have to start with a figure in mind, £100,000 turnover for example, and work backwards….

There’s 260 working days a year so that’s my frame of reference. £100,000 / 260 = £384.61 a day, that’s what I need to be doing as an average.

If my fees are £1.20 for every £10 booked (card fees are applied after so they don’t chew into my margin), then I’m looking at 321 bookings a day. Now look at the real world side of that market, the funnel of users.

Search -> View -> Book. 

Assuming my booked users are the 321, I’m guessing the conversion rate from view to book is 3% (users just looking around do exactly that, just look around). I need 10,700 host views a day based on my 321 @ 3%. That, however, is not the end of the story. Not everyone is going to be looking all the time; so far the assumption is that 100% of the users are searching, and that’s unrealistic. It’s probably 3% again, at best.

So what I’m really saying is I need 356,666 signed-up users searching daily to make £100k/year. Or 3.56m users to make a million in revenue a year. That doesn’t even take hosts into account….
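Here’s the funnel worked backwards in the REPL, using the assumptions above:

(def daily-target (/ 100000 260.0))        ;; => ~384.62, £ needed per working day
(def daily-bookings (/ daily-target 1.20)) ;; => ~320.5, call it 321 bookings
(def daily-views (/ 321 0.03))             ;; => 10700.0 host views needed
(def daily-searchers (/ 10700 0.03))       ;; => ~356,667 users searching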

Facebook, Instagram, Twitter, Linkedin and GoogleAds…..

This was the first time I’d run experiments on ads. The ability to narrow in on target segments is critical; get that wrong and your spend vanishes in hours as a bunch of underage users poke around to see what you are doing.

Linkedin spends are quite expensive but at least they give a rough idea of the conversion rate (for my scenarios it was about 0.79%).

Ultimately boosting posts didn’t return anything: some nice users in the US and two hosts who enquired. Once again though it’s a high volume numbers game; you need money to make money. I knew that all along.

There’s a Skill to Knowing When to Call it a Day

When I embarked on DeskHoppa I was under no illusions; building the service is the easy bit (well, it is for me, I can write code quickly). The key was always eyeballs and they’re really hard to get. If you fool yourself that folk always care then there’s a hard reality: the majority don’t, and it takes time to get their attention and trust.

Knowing when to say “that’s enough” comes from having been through it before; I’ve let things run too long in the past. Idea validation is the hard part and I don’t believe it’s about product market fit, it’s about market product fit. You have to build the market first; if that market doesn’t exist then you’ll spend a long while creating it. The first person I heard flip the whole Product/Market thing was Gretta Van Reil of SkinnyMeTea.

After reviewing the numbers and looking at what it might take to get things where they needed to be, the right decision was made. There’s a worse position to be in: a service that trickles money in but doesn’t quite break even, the signal that something is happening but not at the volumes you need….. things can become a millstone quickly.

Finally….

There are some wonderful, supportive people out there. Ones who gave feedback, lists of improvements, shouted out repeatedly on social media. Ones who were blunt and told me the reasons why they wouldn’t host desks…. it was all valuable.

I emailed everyone a final email to say thank you. You can’t just close a service and not say thanks. Some of the responses were lovely.

Thank you.


“Where have you been hiding?…..” #nitech #ni #machinelearning #ai #customerloyalty #clojure #java

For those who’ve been asking why I’m not so active in NI…..

Errrm, I haven’t been hiding. So far 2019 has thrown some joyous curve balls, some good, some challenging, but the pointers to learn from were in plain sight.

Not Much Conference Talking….

Last year, 2018, was full-on tech talks, and as much as I love doing them it felt like I was treading old ground, a bit like keeping the old classics in the set even though you hate doing them.

In terms of local talks, I stopped. The transaction costs weren’t that high but I certainly wasn’t getting any value back. Plus the amount of sponsored meetups, hackathons and events was pushing out any realistic assessment of the AI/ML landscape locally, just an opinion.

I’d lost my joy for conference talks; I wasn’t talking about the things that mattered, and it wasn’t until I backtracked to my roots that I realised how much I missed talking about real world retail and customer loyalty….. I know some of you had asked about me doing more of that, and I’m still finding an interesting angle. (That and no one asks me now.)

This year also saw me more involved in the international conferences that I do love. I’m now part of the programme committees for O’Reilly’s Strata Data Conference in London and San Jose, and also ClojureX in London.

And remember, no one should feel pressured to talk. If you want to do it, do it. If you don’t, then don’t.

Machine Learning Book 2nd Edition

Work has now started on the update to Machine Learning: Hands On for Developers and Technical Professionals. More machine learning at scale on the JVM (in Java and Clojure) and more on Deep Learning, Kafka, Image recognition and text mining.

Release won’t be until the end of the year or into 2020. Not my call, depends on how fast I can type…. If you don’t see me then I’m probably typing.


Apache Storm: From Clojure to Java….. some thoughts. #clojure #java #storm

The route to Clojure can be an odd one. For some it just falls straight into the developer’s way of working (“It’s Lisp, I geddit!”). Others, like me, with our old Java-based OOP heads, struggled for the penny to drop.

If Java is your main language then the move to Clojure can be difficult, it’s a different way of thinking. If Clojure is your first/main language then doing Java Interop in Clojure is going to melt your head (I’ve seen this a lot, I found it surprising too).

For me the penny dropped when my then boss, Bruce Durling, put it to me like this: “Data goes in to the function, data goes out of the function”. After that everything made sense and if you make functions small, separate and testable then it’s a joy to use.

There’s one issue, though, that has always been a challenge, not just for Clojure but for other languages too: mainstream adoption.

It’s better for a developer to have two or three languages in their toolbox, not just one. The reason…. well the Apache Storm project dropped the mic.

https://storm.apache.org/2019/05/30/storm200-released.html

“While Storm’s Clojure implementation served it well for many years, it was often cited as a barrier for entry to new contributors.”

Yup get that completely.

Clojure Takes Time….

Clojure takes time to learn and to do well. There’s a group of folk in society that just get confused by too many parentheses; I was one of them. Another thing I’ve found is that the adoption route can be made harder by the documentation in projects. Too many times I’ve come across things that you were just supposed to know; it just wasn’t helpful.

I suffered huge huge huge imposter syndrome with the Clojure community; they talked in a different language, and my mental reaction was “I don’t fit in here”. They spoke about solutions that were just plain confusing. Over the last four years of this blog I’ve done my best to break stuff down and explain it in English to give the next poor sod a chance. I was actually scared of doing my first talk at ClojureX, petrified actually. The audience in the room knew far more than I did.

Finding Clojure developers is pretty much an uphill struggle, it’s a small circle. Finding good ones is harder, though that could be said of Scala and the like too. It’s easier to cross-train someone from Java into Clojure, but that takes time and most companies are not in a position to wait, there’s work to be done. Recently I was talking to a company who were potentially interested in hiring but they made one thing very clear: “We wouldn’t want you to do anything in Clojure, no one here can support it.” I totally agree, the bus number is key.

So with something like Apache Storm this does not come as a surprise. Apache projects need adopters and that is a numbers game; do a project with minority adoption and there’s a good chance the project will wither and die. Actually, I didn’t realise Storm was written in Clojure until I read the announcement.

The Bottom Line is I Love Clojure

Knowing what I know now I find it hard to move away from Clojure. DeskHoppa is 100% Clojure and I know I’ll be developing it that way for the time being. I’ve realised that it’s a niche, especially when it comes to things like Strata Data Conference where I’ve always put things in Java and some Clojure. I’ve had to, otherwise my talks get rejected.

I never wanted to learn Haskell…….

Finding #pi with #montecarlo method and #Clojure – #math #justmath

I was reading a post on the Towards Data Science blog this morning about mathematical programming to build up skills in data science, by Tirthajyoti Sarkar. While the article was based around Python it didn’t use any of the popular frameworks like NumPy or SciPy.

Now, with a bit of a lull, I wanted to keep my brain ticking nicely, so the thought of using math within Clojure appeals to me. And I’m not saying one language is better than the other; the best language for data science is the one you know. The main key to data science is having a good grounding in the math behind it, not the frameworks that make it easier.

Calculating Pi By Simulating Random Dart Board Throws

The Monte Carlo method is the concept of emulating a random process. When the process is repeated a large number of times, it gives rise to an approximation of some mathematical quantity of interest.

If you imagine a square dart board…..

Now imagine a square dart board with a circle inside the square, the edges of the circle touching the square…..

If you throw enough darts at the board some will land within the circle and some outside of it (the original article illustrates this graphically).

These are random throws, you might throw 10 times, you might throw 1 million times. At the end of the dart throws you count the number of darts within the circle, divide that by the number of throws (10, 1m etc) and then multiply it by 4.

As the original article states: the probability of a dart falling inside the circle is just the ratio of the area of the circle to that of the area of the square board. For a circle of diameter s inscribed in a square of side s, that ratio is π(s/2)² / s² = π/4, which is why multiplying the hit ratio by 4 gives an approximation of Pi.

The more throws we do, the better chance we get of finding a number near Pi. The law of large numbers at work.
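A quick sanity check that the ratio really is Pi/4 whatever the size of the board:

(defn circle-to-square-ratio
  "Area of a circle of diameter side, divided by the area of a square
  of the same side length."
  [side]
  (/ (* Math/PI (Math/pow (/ side 2) 2))
     (* side side)))

(circle-to-square-ratio 2)
;; => 0.7853981633974483, which is Pi / 4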

Throwing a Dart at The Board

I’m going to create a function that simulates a single dart throw. I want to break down my Clojure code into as many simple functions as possible. This makes testing and bug finding far easier in my opinion.

(defn throw-dart []
  {:x (calc-position 0)
   :y (calc-position 0)})

What I’m creating is an x,y coordinate with a 0,0 centre point, passing the coordinate for the x and the y through another function to calculate the position (calc-position).

(def side-of-square 2)

(defn calc-position [v]
  ;; with v = 0 this gives a uniform random value between
  ;; -(side-of-square / 2) and +(side-of-square / 2)
  (* (/ (+ v side-of-square) 2) (+ (- 1) (* 2 (Math/random)))))

The calc-position function takes the value for either x or y and applies the calculation, giving a position somewhere between -side-of-square/2 and +side-of-square/2 around the centre point.

Running this function in a REPL we can see the x or y positions.

mathematical.programming.examples.montecarlo> (calc-position 0)
0.4298901518005238

Is The Dart Within The Circle?

Now that I have an x,y position as a map, {:x some-random-throw-value :y some-random-throw-value}, I want to confirm that the throw is within the circle.

Using the side-of-square value again (hence it’s a def), I can figure out if the dart hit within the circle. I’ll pass in the map with the x,y coords and take the square root of the sum of the squared coordinates.

(defn is-within-circle [m]
  ;; Pythagoras: the dart is inside if its distance from the centre is
  ;; less than the circle's radius, side-of-square / 2
  (let [distance-from-center (Math/sqrt (+ (Math/pow (:x m) 2) (Math/pow (:y m) 2)))]
    (< distance-from-center (/ side-of-square 2))))

This function will return true or false. If I check this in the REPL it looks like this:

mathematical.programming.examples.montecarlo> (throw-dart)
{:x 0.22535085231582297, :y 0.04203583357796781}
mathematical.programming.examples.montecarlo> (is-within-circle *1)
true

Now Throw Lots of Darts

So far there are functions to simulate a dart throw and confirm it’s within the circle. Now I need to repeat this process as many times as required.

I’m creating two functions, compute-pi-throwing-dart to run a desired number of throws and throw-range to do the actual working to find the number of true hits in the circle.

(defn throw-range [throws]
  ;; keep the indices of the throws that landed inside the circle
  (filter (fn [_] (is-within-circle (throw-dart))) (range 0 throws)))

(defn compute-pi-throwing-dart [throws]
  ;; (hits / throws) * 4 approximates Pi
  (double (* 4 (/ (count (throw-range throws)) throws))))

The throw-range function executes throw-dart for each element of the range, and is-within-circle evaluates each resulting map to true or false. The filter function returns the elements of the range for which the predicate was true. So, for example, if out of ten throws the first, third and fifth landed within the circle I’d get (0 2 4) as the result, the range being zero-indexed; only the count of the hits matters, not the values themselves.

Calling the function compute-pi-throwing-dart sets all this into motion. Like I said at the start, taking the number of darts in the circle and dividing that by the number of throws taken, multiplying that by four should give a number close to Pi.

The more throws you do, the closer it should get.

mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
3.2
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
3.2
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
3.6
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
2.4
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
4.0
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
2.8
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 100)
2.92
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 1000)
3.136
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10000)
3.138
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 100000)
3.15456
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 1000000)
3.13834
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10000000)
3.1419096

Let’s Build a Simulation

Via the REPL there is proof of an emergent behaviour, the value of Pi comes from the large number of throws we did at the dart board.

The last thing I’ll do is build a function to run the simulation.

(defn run-simulation [iter]
  (map (fn [i]
         (let [throws (long (Math/pow 10 i))]
           (compute-pi-throwing-dart throws)))
       (range 0 iter)))

If I run 4 simulations I’ll get 1, 10, 100 and 1000 throws computed, returned as a list. If I run 9 simulations (which can take some time depending on the machine you’re using) I get the following in the REPL:

mathematical.programming.examples.montecarlo> (run-simulation 9)
(0.0 3.6 3.28 3.128 3.1176 3.1428 3.142932 3.1425368 3.14173752)

That’s a nice approximation. Pi is 3.14159265, so getting that close with a Monte Carlo method of random evaluations is good.


Using your table tennis table to create startup revenue.

Photo by Dennis Cortés on Unsplash

(Originally posted on the DeskHoppa Medium site.)

The table tennis table, the startup’s secret weapon for getting team members to work together and collaborate, allegedly. Recruiters love listing the humble table tennis area as one of the big bonuses of startup hiring, along with the beer fridge and oversized bean bags.

However, utilisation of the slab of board is usually low, and if you have remote workers it’s really difficult to have a game of ping pong with them. With a full-size playing area of about 19 feet by 11 feet, it takes up a large amount of square footage too.

Putting The Table Tennis Table To Better Use

Here’s the DeskHoppa simple guide to putting the table tennis table to better use.

  1. Fold it up (or sell it, or burn it* outside, out of harm’s way).
  2. 19ft x 11ft is 209 square feet. With an average desk and chair taking 30 square feet, you can fit six working areas into the same space.
  3. Create a host account on DeskHoppa. In the Live Availability section type 6 into the “Desks Available” field and 10 into the “Price Per Hour” field. Click on “Update Availability” and you’re ready. You can sell day, week and month passes with DeskHoppa too.
  4. In the “General Host Info” section create a funky strapline, “We got rid of our table tennis table so we could meet you!”, and a general description of your workspace.
  5. You can add features, such as free tea and coffee, working WiFi, a whiteboard and so on, in the “Features” section.
  6. Start shouting about your listing on LinkedIn, Twitter, Instagram and any other social channel you are using.

Here Are The Numbers

With six workspaces listed on DeskHoppa at £10 per hour, six guests staying one hour a day will give you an estimated £14,400 per annum (based on 240 bookable days). If those same six guests worked a four-hour morning then that’s a potential £57,600 of incremental revenue.
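Here’s the arithmetic in the REPL; the 240 bookable days a year is my assumption behind those figures:

(def price-per-hour 10)
(def desks 6)
(def bookable-days 240)

(* price-per-hour desks 1 bookable-days) ;; => 14400, one hour a day
(* price-per-hour desks 4 bookable-days) ;; => 57600, four-hour mornings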

If there are certain skill sets that you are on the look out for then using DeskHoppa is the perfect tool for finding them. Guests have profiles that you, the host, review.

DeskHoppa was created to help hosts create revenue and for guests to find somewhere peaceful to work, a professional place where the clatter of coffee cups and loud discussions are reduced.

If you want to learn more about hosting on DeskHoppa please take a look at our “Becoming a DeskHost” section on the DeskHoppa website.

* If you do decide to burn your table tennis table then please note we can’t take any responsibility for anything that may arise from you doing you. Your decision, not ours.

DeskHoppa Engineering — Twitter, Kafka and being Data Driven.

This post was originally published on the DeskHoppa Engineering Blog on Medium.

We built DeskHoppa on data driven decisions. The technology though is used to augment our decision making, not wholly make it for us. How we choose the hosts we contact is based on data, algorithms and probability.

The search and match processes to put a guest together with a host is a pursuit of accuracy that can only be done over time with data, training and evaluation.

Putting those things together is not easy, much of the ground work is done by others who put the time in on their own dime. Open source software powers a lot of what we do.

Giving Something Back To The Community

Deciding to publish any code and setups that are useful to others was a very simple decision to make. What seems simple to us may be days of work for someone else; uncovering the gotchas and documenting them can save a developer days, weeks or even months of unpicking. We’ve been there and been down the same development rabbit holes that others have.

We’ve put our publishable repositories on our Github account. Some of it is code written by us, some of it is just handy scripts that might have come from other places but are collated in a way that’s easy for the developer to implement.

Using Kafka and Twitter Data

There’s a natural fit between Kafka and streams of Twitter data. Using Kafka Connect to make a connection to the Twitter Streams API, and then using the KSQL streaming query language to transform and query the stream, is powerful even in the most simplistic of contexts.

While we do an awful lot more with the data past the KSQL stages, we wanted to share a really quick setup for anyone to use. For our first community release to Github we wanted to start with raw data; it’s important to collate relevant data from the outset. Our Kafka/Twitter configuration, based on the excellent blog post by Robin Moffatt on the Confluent Blog, is our baseline.

The configuration and required files are on Github https://github.com/deskhoppa/kafka-twitter-collector with a README of what to put where. Assuming you’re using the community edition of the Confluent Kafka Platform, everything should slot into place without any bother.
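Once the connector is feeding a topic, reading the raw tweets back out from Clojure needs nothing more than the plain Java Kafka consumer via interop. A minimal sketch, assuming a local broker and a topic named twitter_raw (both illustrative, not our production setup):

(import '(org.apache.kafka.clients.consumer KafkaConsumer)
        '(java.time Duration))

(def consumer-props
  (doto (java.util.Properties.)
    (.put "bootstrap.servers" "localhost:9092")
    (.put "group.id" "tweet-reader")
    (.put "key.deserializer" "org.apache.kafka.common.serialization.StringDeserializer")
    (.put "value.deserializer" "org.apache.kafka.common.serialization.StringDeserializer")))

(defn print-tweets [topic]
  ;; poll the topic once and print the value of each record
  (with-open [consumer (KafkaConsumer. consumer-props)]
    (.subscribe consumer [topic])
    (doseq [record (.poll consumer (Duration/ofSeconds 5))]
      (println (.value record)))))

(print-tweets "twitter_raw")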

DeskHoppa: Making Any Business a Co-Working Space #coworking #deskspace #startups #freelancers #hotdesk

When I launched DeskHoppa at the start of February the aim was clear: to enable any startup or business to rent out their desks to anyone who needed one.

Co-working spaces are great, and they’re welcome to use DeskHoppa too, but the monthly membership was always a barrier for me; the cost would outweigh the usage by a factor of 5 to 1. I needed something more on-demand.

Why Not Use a Cafe?

I stood in a street in Belfast and I only needed a desk for an hour, just to check in with the team I was working with at the time. When I say this to friends and founders alike the first response I get is, “You should have used a cafe”.

There are a few reasons I really don’t like working out of cafes. Firstly there’s the noise. It’s not loud banging or crashing but the ambient soundscape of the daily operation of a cafe: plates being stacked, a Gaggia coffee machine steaming away. It’s very difficult to conduct a call with background noise.

Laptop theft is a huge concern; it’s also a data privacy issue if you are working for a company. It can also happen very quickly, as illustrated in the “Caught on Camera: Berkeley Laptop Theft” video.

Lastly, have you ever been in a crowded cafe and been drawn to a conversation in earshot? I’ve got a knack for hearing a good brainstorming session or business meeting. And like the majority of people I know I have a notebook and a pen to hand. I’m sure hundreds of startups have been beaten to market because of this.

So the question is, where can you work for an hour? How about within a business with a spare desk?

Desks By The Hour, Day, Week or Month

Many businesses, startups and co-working spaces (the host) have spare capacity and it’s costing them money. It would make sense for a host to maximise the revenue potential of the desk as a money making asset by renting it out for a period of time. That’s what DeskHoppa does, it gives the host a system to rent out desks and create incremental revenue from them.

As a visitor, DeskHoppa becomes the platform for finding somewhere to work: a network of hosts in a city, a choice of locations to work from.

As a host you have full control of how many desks you list, what price you charge and what facilities are available to guests. If you want to sell day, week or month passes to guests that’s available too. DeskHoppa handles the booking, the payment and the host’s booking request process. You can review every booking or automatically accept bookings.

The benefit of offering desks to guests is that you build up a network of potential suppliers. They may be video content producers, software developers or graphic designers. For businesses looking to fill skill shortages within the organisation, DeskHoppa may become the first stage in building the relationship.

If you want to sign up either as a guest or a host then please go to https://www.deskhoppa.com

(This post was originally posted on the DeskHoppa Blog on Medium).


Does Craig’s 10 predict the winner? #data #voting #strictly #strictlycomedancing #clojure

It started with a conversation on Clojurians Slack…..

Now, we’ve got some experience with the Strictly scores; we know that linear regression completely trumps neural networks at predicting Darcey’s score from Craig’s score.

This, however, is different and yet still interesting. And as we know, we have data available to us up to series 14.

Does Craig’s elusive ten do much to the outcome? Who knows…..

Load Thy Data….

I’ve put the data in the resources directory of the project. To load it into our program and turn it into a nice handy map, we have the following functions. The historical data is from Ultimately Strictly.

;; assumes the usual CSV and IO namespaces are required, for example:
;; (require '[clojure.data.csv :as csv]
;;          '[clojure.java.io :as io])

(def filename "SCD+Results+S14.csv")

(defn format-key [str-key]
  (when (string? str-key)
    (-> str-key
        clojure.string/lower-case
        (clojure.string/replace #" " "-")
        keyword)))

(defn load-csv-file []
  (let [file-info (csv/read-csv (slurp (io/resource filename)) :quot-char \" :separator \,)
        headers (map format-key (first file-info))]
     (map #(zipmap headers %) (rest file-info))))

The format-key function turns each header name into a keyword, and load-csv-file uses the header row to supply the key names for each column. So when load-csv-file is called we get a map of the data with the header names as keywords.

The only downside here is that the numeric scores are strings, and as the data spans all the judges from all fourteen series there are plenty of “-” scores where a judge didn’t take part. Not a big deal but worth keeping in mind.

Grouping Judging Data

What I’d like is a map of weeks; this will give me a breakdown of series, the judges’ scores, who was dancing, the song and so on. As far as the scores are concerned I’m only interested in 10s, to test Thomas’ hypothesis.

(defn get-week-groups-for-judge [k data]
  ;; rows where judge k scored "10", grouped by week
  (group-by :week (filter #(= "10" (k %)) data)))

I’d also like a collection of weeks so I can figure out which was the first week that a judge gave a score of 10.

(defn get-weeks [m]
  (map key m))

(defn get-min-week [v]
  (->> (get-weeks v)
       (map #(Integer/valueOf %))
       sort
       first))

Finally a couple of reporting things. A series report for a given week and also a full report for a judge.

(defn report-for-judge [w data]
  (filter #(= w (first %)) data))

(defn report-for-week [jk w data]
  (map #(select-keys % [:series :week jk :couple]) (data w)))

Now we can have a play around with the data and see how it looks.

With Thy REPL I Shall Inspect…

So, Craig’s scores. First of all let’s get our code into play.

user> (require '[scdtens.core :as scd])

Load our raw CSV data in…

user> (def strictlydata (scd/load-csv-file))
#'user/strictlydata
user> (count strictlydata)
1594

Now I want to extract scores from the raw data where Craig was the judge who scored a 10.

user> (def craigs-data (scd/get-week-groups-for-judge :craig strictlydata))
#'user/craigs-data
user> (count craigs-data)
7

So there are seven weeks, but which was the first week?

user> (scd/get-min-week craigs-data)
8

Week 8, but we don’t know how many series that covers. We can see that though; a function was created for it.

user> (scd/report-for-week :craig "8" craigs-data)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"} {:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
user> (p/pprint *1)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"}
{:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
nil
user>

So in two series, 2 and 7, Craig scored a 10 in week 8. That’s all good so far; the question is, did Craig’s score “predict” the winner of the series?

Looking at the final for series 2, Jill and Darren did win. And for series 7, Ali and Brian didn’t win the competition but they did top the leader board for week 8 as the data shows.

What if we pick another judge?

Craig’s scores are one thing but it turns out that Darcey is a blinder with the 10’s.

user> (def darceys-data (scd/get-week-groups-for-judge :darcey strictlydata))
#'user/darceys-data
user> (scd/get-min-week darceys-data)
4
user> (scd/report-for-week :darcey "4" darceys-data)
({:series "14", :week "4", :darcey "10", :couple "Ore & Joanne"})
user>

Week four, no messing. And guess who won series 14….. Ore and Joanne.

Bruno perhaps?

user> (def brunos-data (scd/get-week-groups-for-judge :bruno strictlydata))
#'user/brunos-data
user> (scd/get-min-week brunos-data)
3
user> (scd/report-for-week :bruno "3" brunos-data)
({:series "4", :week "3", :order "11", :bruno "10", :couple "Louisa & Vincent"} {:series "13", :week "3", :order "14", :bruno "10", :couple "Jay & Aliona"})
user> (p/pprint *1)
({:series "4",
:week "3",
:order "11",
:bruno "10",
:couple "Louisa & Vincent"}
{:series "13",
:week "3",
:order "14",
:bruno "10",
:couple "Jay & Aliona"})
nil
user>

Turns out Bruno was impressed from week three. And all the better was that Jay and Aliona won series 13.

Does Craig scoring a 10 have any steer at all?

In all honesty, I think it’s very little, I mean it’s up there with a Hollywood handshake but they’re being thrown out like sandwiches at a festival now.

The earliest week in which Craig scored a 10 was week 8, and that score only had a 50% hit rate in predicting the series winner.

The judges’ scores only tell half the story and this is where I think things get interesting, especially in series 16, the current series. And once again it comes back down to where people are putting their money. Risk and reward.

Thomas’ question came about because Craig’s first 10 of the series cropped up last weekend. Ashley and Pasha got the first 40 of the series, but the bookies’ data sees things slightly differently.

Do external forces such as social media followings have any sway on the public vote? Now that’s the question I think needs to be looked at. Joe Sugg is a YouTube personality and there’s nothing like going on social media and begging for votes for competitions and awards. So it stands to reason that Joe has a very good chance of winning the competition while being outscored on the judges’ scores.

Using Craig’s ten as an indicator that Ashley is going to win does come with risk, but increased reward. At 7/1 the bookies are basically saying, based on previous betting movements, that there’s a 12.5% chance of Ashley winning. Now if only there were a rational way of deciding…..

Get me Neumann and Morgenstern on the phone! Now! Please!

Is there a potential upside to deciding to go with Craig’s score? Let’s see if we can find out. The one book I still want for Christmas, or any other gift giving event, is The Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern. It’s my kinda gig.

Back to Ashley, we can work out the expected utility to see if Craig’s ten and the bookies info is worth a punt.

Expected utility: you multiply the probability of winning by the potential gain, and the probability of losing by the potential loss. Adding the two together (with the loss counted as negative) gives you the expected utility of the gamble.


A Warning and Disclaimer

It doesn’t have to be money, and I’m not encouraging you to go and place a bet with your own money. That’s your decision to make and I’m assuming no responsibility on that one. I shall, however, continue. Got that? Good, now….


Within any gamble there are four elements: the potential gain, the potential loss, the chance of winning and the status quo.

The Status Quo

Forgive me, I had to, there are rules….

The status quo is the current situation we are in, which is exactly what will happen if we do not decide to participate in a gamble.

The Potential Gain

Our reward if the gamble pays off. This has to be better than the status quo.

The Potential Loss

What we lose if the gamble does not go in our favour. This should be worse than the status quo.

The Chance of Winning

The probability of the payoff; it also tells us the chance of it NOT paying off.

Ashley’s Expected Utility

With the bookies’ general probability of Ashley winning at 12.5%, and a tenner in my back pocket, at 7/1 odds I’d get £80 back (£70 winnings plus my original wager of £10). So I’m going to use 80 as my potential gain and 10 as my potential loss. Your gain/loss numbers can be anything, it doesn’t have to be money; it’s just that with these numbers in mind you have a mechanism for arriving at a figure of expected utility.

The expected utility of winning is 80 multiplied by 12.5% = 10

The expected utility of losing is 10 multiplied by 87.5% = 8.75

The expected utility of the gamble is 10 – 8.75 = 1.25
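Or, as the obligatory bit of Clojure (the gain being the total return including the stake, to match the numbers above):

(defn expected-utility [p-win gain loss]
  ;; gain weighted by the win probability, minus the loss weighted
  ;; by the probability of losing
  (- (* p-win gain)
     (* (- 1 p-win) loss)))

(expected-utility 0.125 80 10)
;; => 1.25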

As the expected utility is above zero (greater than the status quo) then it’s worth a go. If it was below zero, down down deeper and down below the status quo, then you’d not want to do anything.

Interestingly, Darcey has been throwing out the 10s to Ashley for a while. I wish I’d seen the bookies’ odds at week six and not week eight; there may have been a more concrete expected utility to strengthen my position.

Conclusion. Well there isn’t one yet.

This series of Strictly is still raging on, so we won’t know the actual outcome until the 15th of December. It has been very interesting though to look at the various judges’ 10 scores and see if we can predict outcomes with additional information.

If you want to poke around the Clojure code for this post you can do.

https://github.com/jasebell/scdtens