Finding #pi with #montecarlo method and #Clojure – #math #justmath

I was reading a post this morning on the Towards Data Science blog by Tirthajyoti Sarkar, on using mathematical programming to build up skills in data science. While the article was based around Python it didn’t use any of the popular frameworks like NumPy or SciPy.

Now, with a bit of a lull, I wanted to keep my brain ticking nicely, so the thought of using math within Clojure appeals to me. And I’m not saying one language is better than the other: the best language for data science is the one you know. The key to data science is having a good grounding in the math behind it, not the frameworks that make it easier.

Calculating Pi By Simulating Random Dart Board Throws

The Monte Carlo method is the concept of emulating a random process. When the process is repeated a large number of times it gives rise to an approximation of some mathematical quantity of interest.

If you imagine a square dart board…..

Now imagine a square dart board with a circle inside the square, the edges of the circle touching the square…..

If you throw enough darts at the board some will land within the circle and some outside of it. As the original article graphically put it:

These are random throws; you might throw 10 times, you might throw 1 million times. At the end of the dart throws you count the number of darts within the circle, divide that by the total number of throws (10, 1m etc) and then multiply it by 4.

As the original article states: the probability of a dart falling inside the circle is just the ratio of the area of the circle to that of the area of the square board.

The more throws we do the better chance we get of finding a number near Pi. The law of large numbers at work.
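To spell out why the multiply-by-4 works (my own aside, not from the original article): a circle of radius r inscribed in a square of side 2r has area πr², the square has area (2r)² = 4r², so the probability of a uniform random dart landing in the circle is π/4. A quick REPL check:

```clojure
;; ratio of the circle's area to the square's, for any radius r:
;; (pi * r^2) / (2r)^2 = pi / 4
(def hit-probability (/ Math/PI 4))

;; multiplying the hit ratio by 4 therefore recovers pi
(* 4 hit-probability)
;; => 3.141592653589793
```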

Throwing a Dart at The Board

I’m going to create a function that simulates a single dart throw. I want to break down my Clojure code into as many simple functions as possible. This makes testing and bug finding far easier in my opinion.

(defn throw-dart []
  {:x (calc-position 0)
   :y (calc-position 0)})

What I’m creating is an x,y coordinate around a 0,0 centre point, passing both the x and the y through another function (calc-position) to calculate each position.

(def side-of-square 2)

(defn calc-position [v]
  (* (/ (+ v side-of-square) 2) (+ (- 1) (* 2 (Math/random)))))

The calc-position function takes the value of either x or y and applies the calculation, giving a value somewhere between -side-of-square/2 and +side-of-square/2 around the centre point.

Running this function in a REPL we can see the x or y positions.

mathematical.programming.examples.montecarlo> (calc-position 0)

Is The Dart Within The Circle?

Now I have an x,y position as a map {:x some random throw value :y some random throw value} and I want to confirm that the throw is within the circle.

Using the side-of-square value again (hence it’s a def) I can figure out whether the dart hits within the circle. I’ll pass in the map with the x,y coords and take the square root of the sum of the squared coordinates.

(defn is-within-circle [m]
  (let [distance-from-center (Math/sqrt (+ (Math/pow (:x m) 2) (Math/pow (:y m) 2)))]
     (< distance-from-center (/ side-of-square 2))))
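As an aside (my own variant, not in the original post): since squaring is monotonic for non-negative distances, the Math/sqrt call can be dropped by comparing the squared distance against the squared radius:

```clojure
(def side-of-square 2)

;; same true/false answer as is-within-circle, but compares squared
;; distance against the squared radius, avoiding the Math/sqrt call
(defn is-within-circle-sq [{:keys [x y]}]
  (< (+ (* x x) (* y y))
     (Math/pow (/ side-of-square 2) 2)))
```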

This function will return true or false. If I check this in the REPL it looks like this:

mathematical.programming.examples.montecarlo> (throw-dart)
{:x 0.22535085231582297, :y 0.04203583357796781}
mathematical.programming.examples.montecarlo> (is-within-circle *1)

Now Throw Lots of Darts

So far there are functions to simulate a dart throw and confirm it’s within the circle. Now I need to repeat this process as many times as required.

I’m creating two functions, compute-pi-throwing-dart to run a desired number of throws and throw-range to do the actual working to find the number of true hits in the circle.

(defn throw-range [throws]
  (filter (fn [t] (is-within-circle (throw-dart))) (range 0 throws)))

(defn compute-pi-throwing-dart [throws]
  (double (* 4 (/ (count (throw-range throws)) throws))))

The throw-range function executes the throw-dart function and is-within-circle evaluates the resulting map to true or false. The filter function returns only the values of the range for which the throw was a hit. So, for example, if out of ten throws the first, third and fifth land within the circle I’ll get (0 2 4) as the result, since the range is zero-based.

Calling the function compute-pi-throwing-dart sets all this into motion. Like I said at the start, taking the number of darts in the circle and dividing that by the number of throws taken, multiplying that by four should give a number close to Pi.

The more throws you do, the closer it should get.

mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10)
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 100)
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 1000)
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10000)
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 100000)
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 1000000)
mathematical.programming.examples.montecarlo> (compute-pi-throwing-dart 10000000)

Let’s Build a Simulation

Via the REPL there is proof of an emergent behaviour: the value of Pi comes from the large number of throws we made at the dart board.

The last thing I’ll do is build a function to run the simulation.

(defn run-simulation [iter]
  (map (fn [i]
    (let [throws (long (Math/pow 10 i))]
      (compute-pi-throwing-dart throws))) (range 0 iter)))

If I run 4 simulations I’ll get 1, 10, 100 and 1000 throws computed; these are then returned as a list. If I run 9 simulations (which can take some time depending on the machine you’re using) in the REPL I get the following:

mathematical.programming.examples.montecarlo> (run-simulation 9)
(0.0 3.6 3.28 3.128 3.1176 3.1428 3.142932 3.1425368 3.14173752)

That’s a nice approximation. Pi is 3.14159265, so getting a Monte Carlo method to compute Pi by random evaluations is a good result.




Using your table tennis table to create startup revenue.

Photo by Dennis Cortés on Unsplash

(Originally posted on the DeskHoppa Medium site.)

The table tennis table, the startup’s secret weapon to get team members to work together and collaborate, allegedly. Recruiters love putting the humble table tennis area as one of the big bonuses of startup hiring, along with the beer fridge and oversized bean bags.

However, utilisation of the slab of board is usually low, and if you have remote workers it’s really difficult to have a game of ping pong with them. With a full-size playing area of about 19 feet by 11 feet it takes up a large amount of square footage too.

Putting The Table Tennis Table To Better Use

Here’s the DeskHoppa simple guide to putting the table tennis table to better use.

  1. Fold it up (or sell it, or burn it* outside, out of harm’s way).
  2. 19ft x 11ft is 209 square feet. With an average desk and chair taking 30 square feet you can fit six working areas into the same space.
  3. Create a host account on DeskHoppa. In the Live Availability section type 6 into the “Desks Available” field and 10 into the “Price Per Hour” field. Click on “Update Availability” and you’re ready. You can sell day, week and month passes with DeskHoppa too.
  4. In the “General Host Info” section create a funky strapline, “We got rid of our table tennis table so we could meet you!”, and a general description of your workspace.
  5. You can add features, things such as free tea and coffee, working Wifi, a whiteboard etc., in the “Features” section.
  6. Start shouting about your listing on LinkedIn, Twitter, Instagram and any other social channel you are using.

Here Are The Numbers

With six workspaces listed on DeskHoppa at £10 per hour, six guests staying one hour a day will give you an estimated £14,400 per annum. If those same six guests worked a four-hour morning then that’s a potential £57,600 of incremental revenue.
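The arithmetic behind those figures, sketched as a quick calculation (the 240 working days per annum is my own assumption, inferred from the £14,400 figure):

```clojure
(def desks 6)
(def price-per-hour 10)   ;; in £
(def working-days 240)    ;; assumed working days per annum

;; revenue = desks * price * hours per day * working days
(defn annual-revenue [hours-per-day]
  (* desks price-per-hour hours-per-day working-days))

(annual-revenue 1)   ;; => 14400
(annual-revenue 4)   ;; => 57600
```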

If there are certain skill sets that you are on the lookout for, then DeskHoppa is the perfect tool for finding them. Guests have profiles that you, the host, can review.

DeskHoppa was created to help hosts create revenue and for guests to find somewhere peaceful to work. A professional place where the clatter of coffee cups and loud discussions are reduced.

If you want to learn more about hosting on DeskHoppa please take a look at our “Becoming a DeskHost” section on the DeskHoppa website.

* If you do decide to burn your table tennis table then please note we can’t take any responsibility for anything that may arise from you doing so. Your decision, not ours.

DeskHoppa Engineering — Twitter, Kafka and being Data Driven.

This post was originally published on the DeskHoppa Engineering Blog on Medium.

We built DeskHoppa on data driven decisions. The technology though is used to augment our decision making, not wholly make it for us. How we choose the hosts we contact is based on data, algorithms and probability.

The search and match processes to put a guest together with a host is a pursuit of accuracy that can only be done over time with data, training and evaluation.

Putting those things together is not easy, much of the ground work is done by others who put the time in on their own dime. Open source software powers a lot of what we do.

Giving Something Back To The Community

Deciding to publish any code and setups that are useful to others was a very simple decision to make. What seems simple to us may be days of work for someone else; uncovering the gotchas and documenting them can save a developer days, weeks or even months of unpicking. We’ve been there and have gone down the same development rabbit holes that others have.

We’ve put our publishable repositories on our GitHub account. Some of it is code written by us, some of it is just handy scripts that might have come from other places but are collated in a way that’s easy for a developer to implement.

Using Kafka and Twitter Data

There’s a natural fit for Kafka and streams of Twitter data. Using a mixture of Kafka Connect to make a connection to Twitter Streams API and then using KSQL streaming query language to transform and query the stream is powerful even in the most simplistic of contexts.

While we do an awful lot more with the data past the KSQL stages we wanted to share a really quick setup for anyone to use. For our first community release to Github we wanted to start with raw data, it’s important to collate relevant data from the outset. Our Kafka/Twitter configuration, based on the excellent blog post by Robin Moffatt on the Confluent Blog is our baseline.

The configuration and required files are on GitHub with a README of what to put where. Assuming you’re using the community edition of the Confluent Kafka Platform, everything should slot into place without any bother.

DeskHoppa: Making Any Business a Co-Working Space #coworking #deskspace #startups #freelancers #hotdesk

When I launched DeskHoppa at the start of February the aim was clear: to enable any startup or business to rent out their desks to anyone who needed one.

Co-working spaces are great, they are allowed to use DeskHoppa too, but the monthly membership was always a barrier for me, the cost would outweigh the usage by a factor of 5 to 1. I needed something more on-demand.

Why Not Use a Cafe?

I stood in a street in Belfast and I only needed a desk for an hour, just to check in with the team I was working with at the time. When I say this to friends and founders alike the first response I get is, “You should have used a cafe”.

There are a few reasons I really don’t like working out of cafes. Firstly there’s the noise; it’s not loud banging or crashing, but the ambient soundscape of the daily operation of a cafe: plates being stacked, or a Gaggia coffee machine steaming away. It’s very difficult to conduct a call with background noise.

Laptop theft is a huge concern, and it’s also a data privacy issue if you are working for a company. It can happen very quickly, as illustrated in the “Caught on Camera: Berkeley Laptop Theft” video.

Lastly, have you ever been in a crowded cafe and been drawn to a conversation in earshot? I’ve got a knack for hearing a good brainstorming session or business meeting. And like the majority of people I know I have a notebook and a pen to hand. I’m sure hundreds of startups have been beaten to market because of this.

So the question is, where can you work for an hour? How about within a business with a spare desk?

Desks By The Hour, Day, Week or Month

Many businesses, startups and co-working spaces (the host) have spare capacity and it’s costing them money. It would make sense for a host to maximise the revenue potential of the desk as a money making asset by renting it out for a period of time. That’s what DeskHoppa does, it gives the host a system to rent out desks and create incremental revenue from them.

As a visitor, DeskHoppa becomes the platform for finding somewhere to work: a network of hosts in a city, a choice of locations to work from.

As a host you have full control of how many desks you list, what price you charge and what facilities are available to guests. If you want to sell day, week or month passes to guests that’s available too. DeskHoppa handles the booking, the payment and the host’s booking request process. You can review every booking or automatically accept bookings.

The benefit of offering desks to guests is that you build up a network of potential suppliers. They may be video content producers, software developers or graphic designers. For businesses looking to fill skill shortages within the organisation, DeskHoppa may become the first stage in building the relationship.

If you want to sign up either as a guest or a host then please go to

(This post was originally posted on the DeskHoppa Blog on Medium).



Does Craig’s 10 predict the winner? #data #voting #strictly #strictlycomedancing #clojure

It started with a conversation on Clojurians Slack…..

Now, we’ve got some experience with the Strictly scores, we know that linear regression completely trumps neural networks on predicting Darcy’s score from Craig’s score.

This however is different and yet still interesting. And as we know, we have data available to us up to series 14.

Does Craig’s elusive ten do much to the outcome? Who knows…..

Load Thy Data….

I’ve put the data in the resources directory of the project. To load it into our program and turn it into a nice handy map, we have the following two functions. Historical data is from Ultimately Strictly.

(def filename "SCD+Results+S14.csv")

(defn format-key [str-key]
  (when (string? str-key)
    (-> str-key
        (clojure.string/replace #" " "-")
        clojure.string/lower-case
        keyword)))

(defn load-csv-file []
  (let [file-info (csv/read-csv (slurp (io/resource filename)) :quote-char \" :separator \,)
        headers (map format-key (first file-info))]
    (map #(zipmap headers %) (rest file-info))))

The format-key function takes the top line of the CSV file and uses the header row as the key names for each column. So when the load-csv-file function is called we get a map of the data with the header names as keywords.

The only downside here is that the numeric scores are strings, and as the data spans all the judges from all fourteen series there are plenty of “-” scores where a judge didn’t take part. Not a big deal but worth keeping in mind.
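One way to handle that, as a hypothetical helper of my own (not in the original code): coerce a score string to a number and treat the “-” placeholder as nil:

```clojure
;; hypothetical helper: "10" -> 10, "-" -> nil
(defn parse-score [s]
  (when-not (or (nil? s) (= "-" s))
    (Integer/valueOf s)))

(parse-score "10")  ;; => 10
(parse-score "-")   ;; => nil
```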

Grouping Judging Data

What I’d like is a map of weeks; this will give me a breakdown of series, the judges’ scores, who was dancing, the song and so on. As far as the scores are concerned I’m only interested in 10s, so as to test Thomas’ hypothesis.

(defn get-week-groups-for-judge [k data]
  (group-by :week (filter #(= "10" (k %)) data)))

I’d also like a collection of weeks so I can figure out which was the first week that a judge gave a score of 10.

(defn get-weeks [m]
  (map #(key %) m))

(defn get-min-week [v]
  (->> (get-weeks v)
       (map #(Integer/valueOf %))
       (apply min)))

Finally, a couple of reporting functions: a series report for a given week, and a full report for a judge.

(defn report-for-judge [w data]
  (filter #(= w (first %)) data))

(defn report-for-week [jk w data]
  (map #(select-keys % [:series :week jk :couple]) (data w)))

Now we can have a play around with the data and see how it looks.

With Thy REPL I Shall Inspect…

So, Craig’s scores. First of all let’s get our code in to play.

user> (require '[scdtens.core :as scd])

Load our raw CSV data in…

user> (def strictlydata (scd/load-csv-file))
user> (count strictlydata)

Now I want to extract scores from the raw data where Craig was the judge who scored a 10.

user> (def craigs-data (scd/get-week-groups-for-judge :craig strictlydata))
user> (count craigs-data)

So there are seven weeks, but which was the first week?

user> (scd/get-min-week craigs-data)

Week 8, but we don’t know how many series that covers. We can see that though; a function was created for it.

user> (scd/report-for-week :craig "8" craigs-data)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"} {:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
user> (p/pprint *1)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"}
{:series "7", :week "8", :craig "10", :couple "Ali & Brian"})

So in two series, 2 and 7, Craig scored a 10 in week 8. That’s all good so far; the question is, did Craig’s score “predict” the winner of the series?

Looking at the final of series 2, Jill and Darren did win. And for series 7, Ali and Brian didn’t win the competition, but they did top the leaderboard for week 8 as the data shows.

What if we pick another judge?

Craig’s scores are one thing, but it turns out that Darcey is a blinder with the 10s.

user> (def darceys-data (scd/get-week-groups-for-judge :darcey strictlydata))
user> (scd/get-min-week darceys-data)
user> (scd/report-for-week :darcey "4" darceys-data)
({:series "14", :week "4", :darcey "10", :couple "Ore & Joanne"})

Week four, no messing. And guess who won series 14….. Ore and Joanne.

Bruno perhaps?

user> (def brunos-data (scd/get-week-groups-for-judge :bruno strictlydata))
user> (scd/get-min-week brunos-data)
user> (scd/report-for-week :bruno "3" brunos-data)
({:series "4", :week "3", :order "11", :bruno "10", :couple "Louisa & Vincent"} {:series "13", :week "3", :order "14", :bruno "10", :couple "Jay & Aliona"})
user> (p/pprint *1)
({:series "4",
:week "3",
:order "11",
:bruno "10",
:couple "Louisa & Vincent"}
{:series "13",
:week "3",
:order "14",
:bruno "10",
:couple "Jay & Aliona"})

Turns out Bruno was impressed from week three. And better still, Jay and Aliona won series 13.

Does Craig scoring a 10 have any steer at all?

In all honesty, I think it’s very little. I mean, it’s up there with a Hollywood handshake, but those are being thrown out like sandwiches at a festival now.

The earliest week that Craig scored a 10 was week 8, and that score only had a 50% hit rate in predicting the series winner.

The judges’ scores only tell half the story, and this is where I think things get interesting, especially in series 16, the current series. And once again it comes back down to where people are putting their money. Risk and reward.

Thomas’ question came about because Craig’s first 10 of the series cropped up last weekend. Ashley and Pasha got the first 40 of the series but the bookies’ data sees things slightly differently.

Do external data forces such as social media followings have any sway on the public vote? Now that’s the question I think needs to be looked at. Joe Sugg is a YouTube personality, and there’s nothing like going on social media and begging for votes in competitions and awards. So it stands to reason that Joe has a very good chance of winning the competition while being outscored on the judges’ leaderboard.

Using Craig’s ten as an indicator that Ashley is going to win does come with risk, but also increased reward. At 7/1 the bookies are basically saying, based on previous betting movements, that there’s a 12.5% chance of Ashley winning. Now if only there were a rational way of deciding…..
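That 12.5% falls straight out of the fractional odds. A small sketch of my own, converting n/m odds to an implied probability:

```clojure
;; implied probability from fractional odds n/m is m / (n + m)
(defn implied-probability [n m]
  (double (/ m (+ n m))))

(implied-probability 7 1)   ;; => 0.125, i.e. 12.5% at 7/1
```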

Get me Neumann and Morgenstern on the phone! Now! Please!

Is there a potential upside to deciding to go with Craig’s score? Let’s see if we can find out. The one book I still want for Christmas, or any other gift giving event, is The Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern. It’s my kinda gig.

Back to Ashley, we can work out the expected utility to see if Craig’s ten and the bookies info is worth a punt.

Expected utility: you multiply the probability of winning by the potential gain, multiply the probability of losing by the potential loss, and subtract the second from the first. That gives you the expected utility of the gamble.

A Warning and Disclaimer

It doesn’t have to be money, and I’m not encouraging you to go and place a bet with your own money. That’s your decision to make and I’m assuming no responsibility on that one. I shall, however, continue. Got that? Good, now….

Within any gamble there are four elements: The potential gain, the potential loss, the chance of winning and the status quo.

The Status Quo

Forgive me, I had to, there are rules….

The status quo is the current situation we are in, which is exactly what will happen if we do not decide to participate in a gamble.

The Potential Gain

Our reward if the gamble pays off. This has to be better than the status quo.

The Potential Loss

What we lose if the gamble does not go in our favour. This should be worse than the status quo.

The Chance of Winning

The probability of the payoff; it also tells us the chance of it NOT paying off.

Ashley’s Expected Utility

With the bookies’ general probability of Ashley winning at 12.5%, and a tenner in my back pocket, at 7/1 odds I’d get £80 back (£70 winnings + my original wager of £10). So I’m going to use 80 as my potential gain and 10 as my potential loss. Your gain/loss numbers can be anything; it doesn’t have to be money. With these numbers in mind you have a mechanism for arriving at a figure of expected utility.

The expected utility of winning is 80 multiplied by 12.5% = 10

The expected utility of losing is 10 multiplied by 87.5% = 8.75

The expected utility of the gamble is 10 – 8.75 = 1.25
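The same calculation as a small Clojure function (a sketch of my own, not code from any library):

```clojure
;; expected utility = p(win) * gain - p(lose) * loss
(defn expected-utility [p-win gain loss]
  (- (* gain p-win)
     (* loss (- 1 p-win))))

;; Ashley's numbers from above: 12.5% chance, gain 80, loss 10
(expected-utility 0.125 80 10)   ;; => 1.25
```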

As the expected utility is above zero (greater than the status quo) then it’s worth a go. If it was below zero, down down deeper and down below the status quo, then you’d not want to do anything.

Interestingly, Darcey has been throwing out the 10s to Ashley for a while. I wish I’d seen the bookies’ odds at week six and not week eight; there may have been a more concrete expected utility to strengthen my position.

Conclusion. Well there isn’t one yet.

This series of Strictly is still raging on, so we won’t know the actual outcome until the 15th of December. It has been very interesting though to look at the various judges’ 10 scores and see if we can predict outcomes with additional information.

If you want to poke around the Clojure code for this post you can do.


Collecting Royalties Without the Middleman, a Concept for @DGMLive and David Singleton

This post is really a response to a Facebook post by David Singleton; the joy of Facebook algorithms means that I didn’t see the actual post until this morning. It’s worth a read, especially if you’re an artist and you want to get paid fairly; there’s a link at the bottom of the page.

What I present here is a proof of concept and probably a shaky blueprint at best but hopefully it outlines some concepts that someone within the industry can run with.

I’ll take the angle of a musician but it could apply to anyone who creates something.

Everything in life is a transaction

A radio play of a song is a transaction, a YouTube video play is a transaction (this throws up a few more questions which I’ll get on to later), a concert ticket sale is a transaction…. you get the picture.

There are actors in every part of the process, some of them wield more power than others. With that imbalance of power the distribution effect can be manipulated, skewed and downright ignored. Over the years with the joys of the internet artists have tried, and rightly so, to regain control over their income and artistic rights. Being able to sell direct has been the goal, with offshoots of subscriptions, exclusive club releases and so on. And they’ve worked, on the whole, fairly well.

However, along with the rise of those types of services you still have the larger monopolies such as iTunes, Spotify and Amazon who control their own ecosystems. And with the same message as the National Lottery, and perhaps the same probability of a positive win, you have to be in it to win it. Once a large volume of consumers piles on to the platform the artist is under a certain amount of pressure to join on the fear of missing out on revenue.

One of the joys, especially for me as someone who loves customer loyalty data, transactional data and the real-time nature of these things, is that everything is a transactional data point. This includes every musician in the band, past and present, every track recorded, every concert ticket sold. The question is how to combine all those data sources so everyone gets paid.

Scribbled before heading out of the door….

Yup it’s one of those drawings again…

Have notebook, a pen and a cup of tea. I will scribble.

I see radio stations, streaming services, direct sales and ticket sales as “consumers of the artist”; they might not directly consume the product but merely act as a wholesaler to the listener/audient. However, there is a transaction and that transaction will be recorded. Breaking it down a stage further, everything is an entity and it relates to another entity.

The Band/Artist

Should I call this the brand? Perhaps I should; as an entity it’s what the end consumer/fan/audient connects with. It gets tribal. I’m a huge fan of King Crimson, St. Vincent and Level 42…. I connect with them all. I also connect with the members of each of those entities, so it needs breaking down a little further.

Having the band/artist as an entity is important. Lineups of that entity can change over time, anyone who knows King Crimson well is aware of this fact, changing lineups may also mean changing publishing rights of the music and this gets important when it comes to compensating people over the long term.

The Asset

Assets of the brand; here the type of entity opens up and gets interesting. A concert is a live asset with multiple members; an album is an asset made up of assets (songs). Each asset has members that performed and wrote the pieces in question. What was once an administrative nightmare could actually be easy to manage in these digital, data-driven times.

The Individual Member

Who “works” for the band? Like I said above, lineups change over time. Here, though, a member is a member. Interestingly, this could be said for a solo artist too. Is Annie Clark a member of the brand “St Vincent”? I think so. It also frees the individual up to work on other projects outside of the main brand. Collaborations therefore become measurable.

In this instance it doesn’t have to just be a musician, it could be a manager or a producer connected with the brand or artist. If you can negotiate a transactional amount then you can allocate reward over time.

A good case in point would be Nile Rodgers, who worked on (the asset) Like A Virgin by (the brand) Madonna. Nile waived his advance for producing the album and renegotiated his royalty on sales. My only surprise was that Nassim Taleb didn’t include a paragraph on it in his book “Skin In The Game“; it was the perfect example.

The Consumer

Once again, as with the asset, a consumer can take on multiple personas. It may be an organisation such as Apple, Spotify or Amazon. It might be Google/YouTube or just an average person who likes to purchase the wares.

A consumer at this point may not be the end user of the asset. This may be a wholesale transaction with a different volume of money associated to it. Multiple consumers can have different sale amounts attached to them.

An Asset Transaction

Now we get to the interesting part. The performance of a song is an asset transaction, whether it be live, recorded, streamed or just straight purchased (I still prefer a purchased CD for value for money played over the long term).

With members attached to an asset, breaking it down becomes much easier; it’s just a process at that point. Especially with a band like King Crimson, where songwriting credits are spread over many people over a long period of time, and many songs from many periods can be played live.

The live performing band can play Discipline knowing that it will be recorded in a ledger of some form (more on that in a moment). Once processed, this record tells us that the writers: Adrian Belew, Bill Bruford, Robert Fripp and Tony Levin, will be due some form of performance payment based on the agreed consumer value. The same goes for someone streaming the same song from Spotify, for instance: the record of that transaction is saved and the members compensated accordingly. It’s just another consumer with another value attached to it.

This does mean, though, that every live performance set list needs to be recorded too. And yes, I appreciate that whimsical flights of fancy happen when an audience member yells “House of the Rising Sun” and you launch into it. With a set list finalised pre or post performance you have a list of transactions and everything connects with everything else.

Calculating Asset Wealth Distribution

Or, “How do I get my money!?”

As we know the transaction amount to a consumer for a specific asset, calculating what is owed to whom becomes just a case of mapping each transaction and ending up with an amount owed to each member.

We end up with a kind of conceptual graph of the relationship between the writers, the artist, the performed asset and the consumer.

(concert) [:performed] <- (asset) -> [:by] (brand) -> [:written_by] (members)

From there it’s purely data mining, finding out who is owed what. With everything recorded in some form of ledger, well you have something to reference. It just becomes a job of performing that function.
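To make that data-mining claim concrete, here’s a sketch in Clojure (all names, splits and values are entirely hypothetical) of folding a ledger of transactions down to an amount owed per member:

```clojure
;; hypothetical data model: an asset knows its writers and their splits,
;; a transaction records which asset was consumed and for what value
(def assets
  {"discipline" {:writers {"belew"   0.25 "bruford" 0.25
                           "fripp"   0.25 "levin"   0.25}}})

(def ledger
  [{:asset "discipline" :consumer :live-show :value 100.0}
   {:asset "discipline" :consumer :stream    :value 10.0}])

;; fold every transaction down to an amount owed per writer
(defn royalties-owed [assets txs]
  (reduce (fn [owed {:keys [asset value]}]
            (reduce-kv (fn [m writer split]
                         (update m writer (fnil + 0) (* split value)))
                       owed
                       (:writers (assets asset))))
          {}
          txs))

(get (royalties-owed assets ledger) "fripp")
;; => 27.5  (a quarter share of 110.0 across both transactions)
```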

How and what frequency the royalties are calculated is another matter. Doing it in real time while possible is not feasible from a payment point of view. Payment transactions come with their own cost. Depending on transaction volumes a monthly, quarterly or annual run are perfectly reasonable. The calculations themselves are pretty much unique to the brand in question. What works for KC may not work for Level 42, which also may not work for St Vincent. Like I said, consumers have different agreements with the brands.

And you might have noticed that at no point have I mentioned a middleman collecting the royalties. It should be done direct, peer to peer, no middleman. The reason they existed in the first place, I’m wildly assuming, was that performance, recorded or otherwise, was just about impossible to monitor. Radio play may have been different, but live performance was hard to track, so it was easier for every consumer (shop, public space) to pay a fee and hope it would get distributed fairly.

We need to talk about Youtube

Copyrighted material is difficult to police. YouTube adds to the computational woe in that many who publish an artist’s work have nothing to do with the artist at all.

And it’s all very well sharing the love of the band or the song, but at the end of the day no one is really getting paid for it. There are certain things you might see, like some copyright information and a link to purchase the track or album in question, but it’s far from peer-to-peer transactions and it’s also far from perfect.

With the ledger we know that a video is being viewed. If one of your songs (the asset) is played then it's just another asset transaction, and because we know the connecting members of that asset, well, it means they are due something from that consumer. At this point a consumer is every YouTube account that is publishing your asset. Now that's opened up tens, hundreds, even thousands of revenue streams.

As far as we’re concerned it’s just another data source.

Challenges? Hmmm, there are a few.

Right now there are many middlemen and it's really bad business for them to be cut out of the loop. I know this from previous startups in aircraft leasing; I'm good at annoying brokers in the chain. What I've described above, while not impossible to do (it's just data at the end of the day), is an implementation challenge for one simple reason.

Not every player I’ve described will be on board.

At the moment the large ecosystems are telling the artist what they will pay them. Spotify streaming sales are fractions of a penny, and even with a long, long, long tail it could take years to make any decent money. Power laws come into play here: 20% of the catalogue will (probably) make 80% of the revenue.

To get a large organisation to emit data per play is not impossible. It means that brands have to be savvy enough to pull the data in and process it. Investment in some form of venture to handle all this must happen first.

To decentralise the whole royalty payment system out of a few power brokers, well that’s interesting and risky. At this point you become the main broker of performance data (does that not already exist?). The power merely shifts from one place to another.

Decentralisation is hard (and no one dare mention the word Blockchain to me). Implementation to each partner is hard, time consuming and usually too technical for the lay person to understand.

Live performances I've already touched on: a system to record concert performances with a list of assets, so it can be processed with a result of who exactly gets paid what. Once again all doable, but who are the partners that do all this? Is it the band before the performance? Is it the venue? Is there a gig register? Cover bands should be worried at this point: if you're filing set lists you'll be paying everyone.


There is a lot covered here, some ideas are worth fleshing out and some ideas would take so long to implement. There are trade offs and some parts of the data model are easier to execute than others.

Back to my main point though, once the concepts are broken down and everything becomes a transaction then it’s easy to figure who is supposed to get what. And to reduce disputes, as David’s post was getting at, then you need a transaction for everything to do with an asset. After that it’s just accounting.

Getting all the players on board is an entirely different conversation.

Now where do I send my money when I belt out Sartori In Tangier on my Chapman Stick when I’m at home? It’s a live performance after all. 🙂

For those that got this far…. here’s David’s post.



Please don’t feel pressured to do a tech talk to further your career.

The tech talk seems, to me, to have become the new church of tech, especially when it comes to meetups. I'm not sure if it's the fear of missing out (FOMO) or the belief that it's going to make a big difference in your career, but everyone I've come across in the early stages of their career seems to think they have to do one.

Dear reader, I have some information for you: you don’t have to do one.

I wager 99% of the tech community don’t do talks

Honestly, they’re happier doing their job during the day and going home in the evening. At weekends they pursue their hobbies and interests and get on with life. At the end of the month they pick up their payslip and repeat the next month and so on.

And you know what, that’s fine. They may stay at home and learn some new stuff, in their time, or push some code they’ve done to a github repo. It’s still learning and sharing, it just doesn’t require hauling backside to a room of warm beer and pizza.

For the record, my career started in 1988 and it wasn't until 20 years later that I even considered doing a talk. The only reason I started was a new location, no network and a need to make one fast. So I did a Barcamp in 2009, and a lot of things happened after that. Since then I've never done a talk because of pressure, either from myself or anyone else. I did them because I enjoyed them; they're a performance and I like a stage. I think I know where it comes from…..


So if anyone says you have to do a tech talk to get anywhere in this career, they’re talking shite.

If you feel you want to do one, then by all means. Just don’t let other people sway you (unless you really want the drama and attention).

Getting started, if you really must

Small talks are fine to get started, meetups and developer events are all usually good. If you are going to present though make sure you know your stuff 100%. The nice thing with these events is that it’s not normally a huge process to get accepted, just ask and say you want to do something.

You are there to teach, once you understand that then you are unstoppable. Teach something that goes beyond the getting started documentation of a framework for example, it needs to be more than that, a lot more than that. Another walkthrough from the same chatbot framework will drive me mad.

Expect questions and answers at the end. While it’s okay not to know everything people are giving you their time so you can impart knowledge to them, they will have questions.

From experience people are friendly and want to see you do well, so Q&A tends to be a nice humane affair. However, you do get the odd one. At one Barcamp talk in 2010 I was asked, “What time are you finishing? I’ve not learned anything I thought I would“. This kinda knocked me off course but somehow I brought it back. What was a picture was the rest of the audience, who were pretty shocked. The chap got ejected from the conference not long after. You do get them, just not very often.

Personally, small developer meetups are not for me, they're not my thing. Location is not on my side either: it's a 140 mile round trip for me, finishing work early to drive there and then driving back again. And the beer is off limits as I'm driving. That's me and that's fair enough. I also prefer something a little bigger.

Conferences and proposals are a long game

This photo by Ellen Friedman means a lot to me. This isn’t just me doing a talk, this is five years of ideas, proposal writing, tweaking, talking to folk and figuring out how to get my points across. The Strata Conference Call For Proposals (CFP) is competitive to say the least.

And I learned an important factor: writing proposals for this level of conference is not about what you want to talk about but what you feel is relevant to the audience at this point in time. Once I sussed that, I had an acceptance and a wait-listed talk.

I didn't want to further my career, I just wanted to talk at my favourite conference. There was no master game plan. And I was nervous, and I made the fatal mistake while preparing my slides: I took all the personality out of it. Big mistake.

Attendees like a personality, otherwise it becomes a trudge through talk after talk of very nice people; sometimes you need something to wake you up. I took out all my humour when I was preparing the slides, and when I ran them past my boss (possibly the very first time I'd ever done that) he told me to put the Jase schtick back in.

Finding out I was last on, “Jase, you’ve got the graveyard shift, how do you feel?”, “Oh I’m fine with it, means I can wake them up.”

I’m glad I listened to my boss. Watching a Strata talk audience do a Mexican wave was rather funny and they enjoyed it. Yes I did do that.

I should try that at ClojureX.

The main thing to remember though….

I’ve always been in control of a number of aspects:

  • Whether I wanted to do the talk. Never forced, I could say no. And I say no a lot of times during the course of a year.
  • I control the content; no employer has told me what to talk about (and sometimes the content is secondary, it's airtime for the brand on stage).
  • I was me. And that comes with heckling, controversy and pragmatism. Not everyone is going to agree with what I say, thank goodness. I got boo’d at Big Data Belfast for saying Python wasn’t much cop to use 🙂
  • I've not done it to further my career, I just like to share information. "This worked for me, it might work for you". It's about giving more information than you receive.

If you feel the slightest bit uncomfortable about doing a talk then seriously don’t do it. There are other avenues to share information, it might be a blog, a podcast or a video. Whatever works best for you, then go for it.

In all honesty the blog and Twitter did more for my career than meetups and conferences.

Too small to #Kafka but too big to wait: Really simple streaming in #Clojure. #queues #pubsub #activemq #rabbitmq

In days gone by businesses proclaimed “we’re gonna do Hadoop Jase!”, no word of a lie, they used to phone me up and tell me so…… my response was fairly standard.

Now the world has gone streaming-based-full-on-kooki-mad because, "Jase we need the real time, we're all about real time now, our customers demand realtime"*

* They probably don’t but hey….

The Problem With Kafka….

Well, in reality it's not really a problem, you just have to appreciate what Kafka was designed for: high volume message processing. It's the Unix pipe on steroids, more steroids and a few more on top of that. When I say "high volume" I mean LinkedIn billions-of-messages high volume.

And to use Kafka properly you need machines, plural. Three beefy servers for each of the Kafka brokers (you need at least three for leader election if the master broker pops off for some reason), then another three for Zookeeper because the last thing you need on earth is Zookeeper dying on you otherwise everything dies. And really if you’re on pager duty that’s the last thing you need.

Call me psychic, I know exactly what you’re thinking…..

Why not use Kinesis, Jase! Yer big ninny!

Well yeah, but no. There are two reasons, firstly I’m tight and I hate spending money whether it be mine or other people’s. “But we got VC funding so we run $5k/month on servers”, it doesn’t wash with me. You’ll run out of money. My other issue is more technical.

Kinesis is fine for some things but my heart just doesn't settle for the whole five-consumers-per-shard malarkey. Kinesis has some nice things, but performance and consumer scaling are not two of them.

So, what to do?

Something a little more, manageable oh and smaller.

So I asked this question of the wider community (i.e. everyone) and I got one response from a Mrs Trellis of North Wales, well, actually it was Bruce Durling, suggesting Factual's Durable Queue. I wasn't aware of its existence….

The disk based bit is important, very important. If the stream dies at any point I need to know that it will pick up messages from the point it died. Kafka does it, RabbitMQ does it and the others do it, so I need this to happen.

Durable Queue uses core.async to do its bidding. And it's easy to put a stream, or a number of streams, together.

A Quick Demo Project

Add the Dependencies

First of all we need to add a few dependencies to a project.

[com.taoensso/timbre "4.10.0"]
[factual/durable-queue "0.1.5"]
[org.clojure/core.async "0.4.474"]

Adding the Required Namespaces

Let's create a simple queue. You'll need the Durable Queue and core.async namespaces; I've also added Taoensso Timbre as it's my preferred way of logging.

(:require [durable-queue :refer :all]
          [clojure.core.async :refer [go-loop <! timeout]]
          [taoensso.timbre :as log])

Create a Disk Based Queue

Our actual queue is easy to create: pick a directory to write your queue to and you're done. I see this the same way as a Kafka topic; each file is basically your topic queue.

(def my-test-queue (queues "/tmp/" {}))

Define the Function That Does the Processing of Each Message

Think of this as your streaming api app, where the magic happens so to speak. This is the function that processes every message that’s been taken from the queue.

(defn the-function-that-does-the-work [m]
  (log/info "Got the message! - " m))

Put a Message On the Queue

We need to be able to add messages to the queue. The key is the key of the queue (further down, when we look at the queue loop itself, the reason why will become clear). The value, well, I like passing Clojure maps of information but in reality it can be whatever you wish.

(defn send-something-to-queue [k v]
  (put! my-test-queue k v))

Taking Messages From the Queue

Now we've got something to put messages on the queue, we need something to take them off. Basically this takes the message from the queue with the same key as we passed in with the send-something-to-queue function. The message needs dereferencing so we get the true payload.

(defn take-from-queue [k]
  (let [msgkey (take! my-test-queue k)]
    (the-function-that-does-the-work (deref msgkey))
    (complete! msgkey)))

Also worth noting is that we have to manually say we've completed the work on that message with the complete! function. If this doesn't happen then the message is still classed as queued, and if you restart the queue (it dies, for example) then that same message will get processed again.
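As a sketch of that failure path, assuming durable-queue's timed take! arity and its retry! function, you could take with a timeout and explicitly hand a message back to the queue when processing throws:

```clojure
(defn take-with-retry [k]
  ;; take! with a timeout (in ms) and a default value returned on timeout
  (let [msg (take! my-test-queue k 1000 :timed-out)]
    (when-not (= :timed-out msg)
      (try
        (the-function-that-does-the-work (deref msg))
        ;; mark done so the message isn't re-queued on restart
        (complete! msg)
        (catch Exception e
          ;; hand the message back to the queue for another attempt
          (retry! msg))))))
```

Retried messages show up in the :retried count of the queue stats, which we'll look at further down.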

The Queue Loop

I’m going to run a core.async go-loop against this function. If there are messages in the queue to be processed then go and get the next one.

(defn run-queue-loop []
  ;; stats returns a map keyed by queue name, e.g. {"id" {:enqueued 3 ...}}
  (if (not= 0 (:enqueued (get (stats my-test-queue) "id")))
    (take-from-queue :id)))

And finally a core async go-loop to keep everything going. A timeout of one second just to keep things sane.

(go-loop []
  (<! (timeout 1000))
  (run-queue-loop)
  (recur))

Testing the Demo

You could either run this from the REPL (it works a treat) or run it standalone, though I didn't provide a -main function in the repo; you're welcome to add one if you wish. I've created two topic queues, q1 and q2, and will send a number of messages (nrange) to them. I'm just doling out randomly decided topic destinations at this point, mainly to prove that multiple topic queues do work.

(defn run-queue-demo [nrange]
  ;; doall forces the lazy map so the sends happen even if the result isn't printed
  (doall
    (map (fn [i]
           (if (= 0 (rand-nth [0 1]))
             (q1/send-something-to-queue :id
               {:uuid (str (java.util.UUID/randomUUID))})
             (q2/send-something-to-queue :id
               {:uuid (str (java.util.UUID/randomUUID))})))
         (range 1 nrange))))

When I run this in the REPL with 10 messages, I get the following output. You can see messages going to each queue and as we can see the output of the processing function we know the stream is working.

fractal.queue.demo> (run-queue-demo 10)
(true true true true true true true true true)
18-11-05 07:35:12 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "c55cc0f8-54ff-47ca-81a2-858af68f47b2"}
18-11-05 07:35:12 Jasons-MacBook-Pro.local INFO [fractal.queue.mainqueue:10] - Got the message! - {:uuid "56d83928-0019-4d58-bfc0-af2dbbf625b0"}
18-11-05 07:35:13 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "875dc578-b32c-4548-ac06-51ab9ef93d41"}
18-11-05 07:35:13 Jasons-MacBook-Pro.local INFO [fractal.queue.mainqueue:10] - Got the message! - {:uuid "0315c4e3-ba6b-4533-ac41-62293038da30"}
18-11-05 07:35:14 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "f522deb6-968c-44a1-a042-2f312f4d314b"}
18-11-05 07:35:14 Jasons-MacBook-Pro.local INFO [fractal.queue.mainqueue:10] - Got the message! - {:uuid "bc7fc7e5-aaf2-4226-871b-31f31199356c"}
18-11-05 07:35:15 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "b5a892fd-f516-4133-991b-539f8b8477be"}
18-11-05 07:35:16 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "6ae167b2-c7a3-42ec-87eb-14b629348a21"}
18-11-05 07:35:17 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "2f97c442-31ca-495e-b1b6-b7019917d504"}

Another nice feature of the durable queue is that you can pull up stats on the queue. The num-slabs value is the number of files the queue uses.

fractal.queue.demo> (durable-queue/stats q1/my-test-queue)
{"id" {:num-slabs 1, :num-active-slabs 1, :enqueued 3, :retried 0, :completed 3, :in-progress 0}}


A basic topic-based queue system, a bit like Kafka but without all the overhead. From a Clojure perspective it's a great use of core.async, and looking at the throughput figures on Factual's GitHub page it's a system I'm happy using until I'm at a point where I really do need Kafka (and all those machines).

There’s a Github repo of the demo code here.


So, I was half asleep when I wrote this: it's Factual, not Fractal. Thanks to Mrs Trellis for the heads-up…..

Enabling JMX Metrics in Kafka – #kafka #jmx #monitoring #streaming #data

Something that doesn’t get talked about much when you’re starting out with Kafka is monitoring. Most initial thoughts go into topic log timeouts and maximum message sizes.

JMX is your friend

Kafka emits a lot of JMX information, which gives us a very good indication of the internal measurements that are happening. To tap into this gold mine of information we need to change some very basic settings (well, it's one setting).

Ladies and Gentlemen, start your brokers.

Let’s get this show on the road, it’s very easy.

First of all let’s startup Zookeeper.

$ $KAFKA_HOME/bin/zookeeper-server-start.sh config/zookeeper.properties

Open another terminal window or tab. Before you start your server, export the JMX_PORT environment variable and use a free port; I'm going to use 9999.

$ export JMX_PORT=9999

Once this is set you can start the Kafka server.

$ bin/kafka-server-start.sh config/server.properties

Last thing to do is test that our JMX setup is working properly. I'm just using the plain old, but handy all the same, jconsole from the Java Development Kit.

$ jconsole localhost:9999

You will see the JConsole window appear; if you click on the MBeans tab you'll see all the Kafka JMX metrics. One you should really be keeping an eye on is kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions, but I'll talk more on that another time.
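If you want that metric programmatically rather than through jconsole, here's a minimal sketch using the standard JMX remote API; it assumes the broker is local and listening on the JMX_PORT of 9999 we exported above:

```clojure
(import '[javax.management.remote JMXConnectorFactory JMXServiceURL]
        '[javax.management ObjectName])

(defn under-replicated-partitions
  "Connect to the broker's JMX port and read the UnderReplicatedPartitions gauge."
  []
  (let [url  (JMXServiceURL. "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi")
        conn (JMXConnectorFactory/connect url)]
    (try
      ;; Kafka's gauge MBeans expose the reading as the "Value" attribute
      (.getAttribute (.getMBeanServerConnection conn)
                     (ObjectName. "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions")
                     "Value")
      (finally
        (.close conn)))))
```

On a healthy single broker, calling (under-replicated-partitions) should come back as 0.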

More Kafka Posts

Topic level settings you can’t ignore.

Monitoring Consumer Offsets.

Quick Recipe for Kafka Streams in Clojure.