The Annual Pilgrimage to #ClojureX 2018, it’s all about the AI.

I’m not sure how but I got asked back….. ClojureX 2018 runs on 3rd and 4th December at Codenode, hosted by Skillsmatter.

Nothing about Onyx this year, promise.

All the details here.

The full programme here.

And I promise I’m going to wear something different this time.

 

 


Does Craig’s 10 predict the winner? #data #voting #strictly #strictlycomedancing #clojure

It started with a conversation on Clojurians Slack…..

Now, we’ve got some experience with the Strictly scores; we know that linear regression completely trumps neural networks at predicting Darcey’s score from Craig’s score.

This however is different and yet still interesting. And as we know we have data available to us up to season 14.

Does Craig’s elusive ten do much to sway the outcome? Who knows…..

….even Darcey is shocked when it happens.

Load Thy Data….

I’ve put the data in the resources directory of the project; the historical data is from Ultimately Strictly. To load it into our program and turn it into a nice handy map…. we have the following two functions.

(def filename "SCD+Results+S14.csv")

(defn format-key [str-key]
  (when (string? str-key)
    (-> str-key
        clojure.string/lower-case
        (clojure.string/replace #" " "-")
        keyword)))

;; assumes the project's ns requires a CSV parser as csv (clojure.data.csv
;; or similar) and clojure.java.io as io
(defn load-csv-file []
  (let [file-info (csv/read-csv (slurp (io/resource filename)) :quot-char \" :separator \,)
        headers (map format-key (first file-info))]
    (map #(zipmap headers %) (rest file-info))))

The format-key function turns each header from the top line of the CSV file into a keyword, and those keywords become the key names for each column. So when the load-csv-file function is called we get a sequence of maps, one per row, with the header names as keywords.

The only downside here is that the numeric scores are strings, and because the data spans all the judges from all fourteen series there are plenty of “-” scores where a judge didn’t take part. Not a big deal but worth keeping in mind.
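If you did want actual numbers, a small helper along these lines would do it; it isn’t in the repo, just a sketch that turns the digit strings into integers and the “-” entries into nil.

(defn parse-score [s]
  ;; only parse strings that are purely digits, e.g. "10"
  (when (and (string? s) (re-matches #"\d+" s))
    (Integer/valueOf s)))

;; (parse-score "10") => 10
;; (parse-score "-")  => nil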

Grouping Judging Data

What I’d like is a map of weeks; this will give me a breakdown of series, the judges’ scores, who was dancing, the song and so on. As far as the scores are concerned I’m only interested in 10s, so as to test Thomas’ hypothesis.

(defn get-week-groups-for-judge [k data]
  (group-by :week (filter #(= "10" (k %)) data)))

I’d also like a collection of weeks so I can figure out which was the first week that a judge gave a score of 10.

(defn get-weeks [m]
  (map key m))

(defn get-min-week [v]
  (->> (get-weeks v)
       (map #(Integer/valueOf %))
       sort
       first))

Finally a couple of reporting things. A series report for a given week and also a full report for a judge.

(defn report-for-judge [w data]
  (filter #(= w (first %)) data))

(defn report-for-week [jk w data]
  (map #(select-keys % [:series :week jk :couple]) (data w)))

Now we can have a play around with the data and see how it looks.

With Thy REPL I Shall Inspect…

So, Craig’s scores. First of all let’s get our code in to play.

user> (require '[scdtens.core :as scd])

Load our raw CSV data in…

user> (def strictlydata (scd/load-csv-file))
#'user/strictlydata
user> (count strictlydata)
1594

Now I want to extract scores from the raw data where Craig was the judge who scored a 10.

user> (def craigs-data (scd/get-week-groups-for-judge :craig strictlydata))
#'user/craigs-data
user> (count craigs-data)
7

So there’s seven weeks but which was the first week?

user> (scd/get-min-week craigs-data)
8

Week 8, but we don’t know how many series that covers. We can see that though; there’s a function for exactly this.

user> (scd/report-for-week :craig "8" craigs-data)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"} {:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
user> (p/pprint *1)
({:series "2", :week "8", :craig "10", :couple "Jill & Darren"}
{:series "7", :week "8", :craig "10", :couple "Ali & Brian"})
nil
user>

So in two series, 2 and 7, Craig did score a 10. That’s all good so far, the question is did Craig’s score “predict” the winner of the series?

Looking at the final for series 2, Jill and Darren did win. And for series 7, Ali and Brian didn’t win the competition but they did top the leader board for week 8 as the data shows.

What if we pick another judge?

Craig’s scores are one thing but it turns out that Darcey is a blinder with the 10’s.

user> (def darceys-data (scd/get-week-groups-for-judge :darcey strictlydata))
#'user/darceys-data
user> (scd/get-min-week darceys-data)
4
user> (scd/report-for-week :darcey "4" darceys-data)
({:series "14", :week "4", :darcey "10", :couple "Ore & Joanne"})
user>

Week four, no messing. And guess who won series 14….. Ore and Joanne.

Bruno perhaps?

user> (def brunos-data (scd/get-week-groups-for-judge :bruno strictlydata))
#'user/brunos-data
user> (scd/get-min-week brunos-data)
3
user> (scd/report-for-week :bruno "3" brunos-data)
({:series "4", :week "3", :order "11", :bruno "10", :couple "Louisa & Vincent"} {:series "13", :week "3", :order "14", :bruno "10", :couple "Jay & Aliona"})
user> (p/pprint *1)
({:series "4",
:week "3",
:order "11",
:bruno "10",
:couple "Louisa & Vincent"}
{:series "13",
:week "3",
:order "14",
:bruno "10",
:couple "Jay & Aliona"})
nil
user>

Turns out Bruno was impressed from week three. And all the better was that Jay and Aliona won series 13.

Does Craig scoring a 10 have any steer at all?

In all honesty, I think it’s very little. It’s up there with a Hollywood handshake, but those are being thrown out like sandwiches at a festival now.

The earliest week that Craig scored a 10 was week 8, and that score only had a 50% hit rate in predicting the series winner.

The judges’ scores only tell half the story, and this is where I think things get interesting, especially in series 16, the current series. And once again it comes back down to where people are putting their money. Risk and reward.

Thomas’ question came about because Craig’s first 10 of the series cropped up last weekend. Ashley and Pasha got the first 40 of the series, but the bookies’ data sees things slightly differently.

Do external forces such as social media followings have any sway over the public vote? Now that’s the question I think needs to be looked at. Joe Sugg is a YouTube personality and there’s nothing like going on social media and begging for votes for competitions and awards. So it stands to reason that Joe has a very good chance of winning the competition while being outscored on the judges’ leader board.

Using Craig’s ten as an indicator that Ashley is going to win does come with risk, but also increased reward. At 7/1 the bookies are basically saying, based on previous betting movements, that there’s a 12.5% chance of Ashley winning. Now if only there were a rational way of deciding…..

Get me Neumann and Morgenstern on the phone! Now! Please!

Is there a potential upside to deciding to go with Craig’s score? Let’s see if we can find out. The one book I still want for Christmas, or any other gift giving event, is The Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern. It’s my kinda gig.

Back to Ashley, we can work out the expected utility to see if Craig’s ten and the bookies info is worth a punt.

Expected utility: you multiply the probability of winning by the potential gain and the probability of losing by the potential loss. Subtracting the second from the first gives you the expected utility of the gamble.


A Warning and Disclaimer

It doesn’t have to be money, and I’m not encouraging you to go and place a bet with your own money. That’s your decision to make and I’m assuming no responsibility on that one. I shall, however, continue. Got that? Good, now….


Within any gamble there are four elements: The potential gain, the potential loss, the chance of winning and the status quo.

The Status Quo

Forgive me, I had to, there are rules….

The status quo is the current situation we are in, which is exactly what will happen if we do not decide to participate in a gamble.

The Potential Gain

Our reward if the gamble pays off. This has to be better than the status quo.

The Potential Loss

What we lose if the gamble does not go in our favour. This should be worse than the status quo.

The Chance of Winning

The probability of the payoff; it also tells us the chance of it NOT paying off.

Ashley’s Expected Utility

With the bookies’ general probability of Ashley winning at 12.5%, and a tenner in my back pocket, at 7/1 odds I’d get £80 back (£70 winnings + my original wager of £10). So I’m going to use 80 as my potential gain and 10 as my potential loss. Your gain/loss numbers can be anything, it doesn’t have to be money; it’s just that with these numbers in mind you have a mechanism for coming to a figure of expected utility.

The expected utility of winning is 80 multiplied by 12.5% = 10

The expected utility of losing is 10 multiplied by 87.5% = 8.75

The expected utility of the gamble is 10 – 8.75 = 1.25

As the expected utility is above zero (greater than the status quo) it’s worth a go. If it were below zero, down down deeper and down than the status quo, then you’d not want to do anything.
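For the Clojure-inclined, the calculation is a one-liner. Here’s a minimal sketch (it isn’t part of the scdtens repo) using the Ashley numbers from above.

(defn expected-utility
  ;; p-win is the probability of winning, gain and loss the potential amounts
  [p-win gain loss]
  (- (* p-win gain)
     (* (- 1 p-win) loss)))

;; (expected-utility 0.125 80 10) => 1.25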

Interestingly, Darcey’s been throwing out the 10s to Ashley for a while. I wish I’d seen the bookies’ odds at week six and not week eight; there may have been a more concrete expected utility to strengthen my position.

Conclusion. Well there isn’t one yet.

This series of Strictly is still raging on, so we won’t know the actual outcome until the 15th of December. It has been very interesting though to look at the various judges’ 10 scores and see if we can predict outcomes with additional information.

If you want to poke around the Clojure code for this post you can do.

https://github.com/jasebell/scdtens

 

Collecting Royalties Without the Middleman, a Concept for @DGMLive and David Singleton

This post is really in response to a Facebook post by David Singleton; the joy of Facebook algorithms means that I didn’t see the actual post until this morning. It’s worth a read, especially if you’re an artist and you want to get paid fairly; there’s a link at the bottom of the page.

What I present here is a proof of concept and probably a shaky blueprint at best but hopefully it outlines some concepts that someone within the industry can run with.

I’ll take the angle of a musician but it could apply to anyone who creates something.

Everything in life is a transaction

A radio play of a song is a transaction, a YouTube video play is a transaction (this throws up a few more questions which I’ll get on to later), a concert ticket sale is a transaction…. you get the picture.

There are actors in every part of the process, some of them wield more power than others. With that imbalance of power the distribution effect can be manipulated, skewed and downright ignored. Over the years with the joys of the internet artists have tried, and rightly so, to regain control over their income and artistic rights. Being able to sell direct has been the goal, with offshoots of subscriptions, exclusive club releases and so on. And they’ve worked, on the whole, fairly well.

However, along with the rise of those types of services you still have the larger monopolies such as iTunes, Spotify and Amazon who control their own ecosystems. And with the same message as the National Lottery, and perhaps the same probability of a positive win, you have to be in it to win it. Once a large volume of consumers piles on to the platform the artist is under a certain amount of pressure to join on the fear of missing out on revenue.

One of the joys, especially for me as someone who loves customer loyalty data, transactional data and the real-time nature of these things, is that everything is a transactional data point. This includes every musician in the band, past and present, every track recorded, every concert ticket sold. The question is how to combine all those data sources so everyone gets paid.

Scribbled before heading out of the door….

Yup it’s one of those drawings again…

Have notebook, a pen and a cup of tea. I will scribble.

I see radio stations, streaming services, direct sales and ticket sales as “consumers of the artist”; they might not directly consume the product but merely act as a wholesaler to the listener/audient. However, there is a transaction and that transaction will be recorded. Breaking it down a stage further, everything is an entity and it relates to another entity.

The Band/Artist

Should I call this the brand? Perhaps I should; as an entity it’s what the end consumer/fan/audient connects with. It gets tribal: I’m a huge fan of King Crimson, St. Vincent and Level 42…. I connect with them all. I also connect with the members of each of those entities, so it needs breaking down a little further.

Having the band/artist as an entity is important. Lineups of that entity can change over time (anyone who knows King Crimson well is aware of this fact), and changing lineups may also mean changing publishing rights for the music, which becomes important when it comes to compensating people over the long term.

The Asset

Assets of the brand: the type of entity opens up here and gets interesting. A concert is a live asset with multiple members; an album is an asset made up of assets (songs). Each asset has members who performed and wrote the pieces in question. What was once an administrative nightmare could actually be easy to manage in these digital, data-driven times.

The Individual Member

Who “works” for the band? Like I said above, lineups change over time. Here, though, a member is a member. Interestingly, this could also be said for a solo artist. Is Annie Clark the member of the brand “St Vincent”? I think so. It also frees the individual up to work on other projects outside of the main brand. Collaborations therefore become measurable.

In this instance it doesn’t have to just be a musician, it could be a manager or a producer connected with the brand or artist. If you can negotiate a transactional amount then you can allocate reward over time.

A good case in point would be Nile Rodgers, who worked on (the asset) Like A Virgin by (the brand) Madonna. Nile waived his advance for producing the album and renegotiated his royalty on sales. My only surprise was that Nassim Taleb didn’t include a paragraph on it in his book “Skin In The Game”; it was the perfect example.

The Consumer

Once again, as with the asset, a consumer can take on multiple personas. It may be an organisation such as Apple, Spotify or Amazon. It might be Google/YouTube or just an average person who likes to purchase the wares.

A consumer at this point may not be the end user of the asset. This may be a wholesale transaction with a different volume of money associated to it. Multiple consumers can have different sale amounts attached to them.

An Asset Transaction

Now we get to the interesting part. The performance of a song is an asset transaction, whether it be live, recorded, streamed or just straight purchased (I still prefer a purchased CD for value for money played over the long term).

With the members attached to an asset, breaking down the payments becomes much easier; it’s just a process at that point. Especially with a band like King Crimson, where songwriting credits are spread over many people over a long period of time and songs from many periods can be played live.

The live performing band can play Discipline knowing that it will be recorded in a ledger of some form (more on that in a moment). Once processed, this record shows that the writers (Adrian Belew, Bill Bruford, Robert Fripp and Tony Levin) will be due some form of performance payment based on the agreed consumer value. The same goes for someone streaming the same song from Spotify, for instance: the record of that transaction is saved and the members compensated accordingly; it’s just another consumer with another value attached to it.

This does mean though that every live performance set list needs to be recorded too. And yes I appreciate that whimsical flights of fancy happen when an audience member yells “House of the Rising Sun” and you launch into it. With a finalised set list pre or post performance you have a list of transactions and everything connects with each other.

Calculating Asset Wealth Distribution

Or, “How do I get my money!?”

As we know the transaction amount for each consumer and asset, calculating what is owed to whom becomes just a case of mapping over each transaction and ending up with an amount owed to each member.

We end up with a kind of conceptual graph of the relationship between the writers, the artist, the performed asset and the consumer.

(concert) [:performed] <- (asset) -> [:by] (brand) -> [:written_by] (members)

From there it’s purely data mining, finding out who is owed what. With everything recorded in some form of ledger, well you have something to reference. It just becomes a job of performing that function.
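As a very rough sketch of what that could look like in Clojure (the names, the agreed consumer value and the equal writing split below are all made up purely for illustration, not a real agreement):

;; hypothetical ledger entry for one asset transaction
(def transaction
  {:asset    "Discipline"
   :brand    "King Crimson"
   :consumer :live-performance
   :amount   100.00                ; illustrative agreed consumer value
   :writers  {"Adrian Belew" 0.25  ; illustrative equal split
              "Bill Bruford" 0.25
              "Robert Fripp" 0.25
              "Tony Levin"   0.25}})

;; map a transaction on to the amount owed to each member
(defn royalties-owed [{:keys [amount writers]}]
  (into {} (for [[member share] writers]
             [member (* amount share)])))

;; (royalties-owed transaction)
;; => {"Adrian Belew" 25.0, "Bill Bruford" 25.0, "Robert Fripp" 25.0, "Tony Levin" 25.0}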

How, and at what frequency, the royalties are calculated is another matter. Doing it in real time, while possible, is not feasible from a payment point of view. Payment transactions come with their own cost. Depending on transaction volumes a monthly, quarterly or annual run is perfectly reasonable. The calculations themselves are pretty much unique to the brand in question. What works for KC may not work for Level 42, which also may not work for St Vincent. Like I said, consumers have different agreements with the brands.

And you might have noticed at no point have I mentioned a middleman collecting the royalties. It should be done direct, peer to peer, no middleman. The reason, I’m wildly assuming, for them existing in the first place was that performance, recorded or otherwise, was just about impossible to monitor. Radio play may have been different but live performance was hard to monitor. So it was easier for every consumer (shop, public space) to pay a fee and hope it would get distributed fairly.

We need to talk about YouTube

Copyrighted material is difficult to police. YouTube adds to the computational woe in that many who publish an artist’s work have nothing to do with the artist at all.

And it’s all very well sharing the love of the band or that song, but at the end of the day no one is really getting paid for it. Now, there are certain things you might see, like some copyright information and a link to purchase the track or album in question, but it’s far from peer-to-peer transactions and it’s also far from perfect.

With the ledger we know that a video is being viewed. If one of your songs (the asset) is played then it’s just another asset transaction, and because we know the connecting members of that asset they are due something from that consumer. At this point a consumer is every YouTube account that is publishing your asset. Now that’s opened up tens, hundreds, even thousands of revenue streams.

As far as we’re concerned it’s just another data source.

Challenges, hmmm there’s a few.

Right now there are many middlemen and it’s really bad business for them to be cut out of the loop. I know this from previous startups in aircraft leasing; I’m good at annoying brokers in the chain. What I’ve described above, while not impossible to do (it’s just data at the end of the day), is an implementation challenge for one simple reason.

Not every player I’ve described will be on board.

At the moment the large ecosystems are telling the artist what they will pay them. Spotify streaming payments are fractions of a penny and, even with a long, long, long tail, could take years to make any decent money. Power laws come into play here: 20% of the catalogue will (probably) make 80% of the revenue.

To get a large organisation to emit data per play is not impossible. It means that brands have to be savvy enough to pull the data in and process it. Investment in some form of venture to handle all this must happen first.

To decentralise the whole royalty payment system out of a few power brokers, well that’s interesting and risky. At this point you become the main broker of performance data (does that not already exist?). The power merely shifts from one place to another.

Decentralisation is hard (and no one dare mention the word Blockchain to me). Implementation to each partner is hard, time consuming and usually too technical for the lay person to understand.

Live performances I’ve already touched on: a system to record concert performances with a list of assets, so it can be processed with a result of who exactly gets paid what. Once again all doable, but who are the partners that do all this? Is it the band before the performance? Is it the venue? Is there a gig register? Cover bands should be worried at this point; if you’re filing set lists you’ll be paying everyone.

Concluding….

There is a lot covered here; some ideas are worth fleshing out and some would take a long time to implement. There are trade-offs, and some parts of the data model are easier to execute than others.

Back to my main point though: once the concepts are broken down and everything becomes a transaction, it’s easy to figure out who is supposed to get what. And to reduce disputes, as David’s post was getting at, you need a transaction for everything to do with an asset. After that it’s just accounting.

Getting all the players on board is an entirely different conversation.

Now where do I send my money when I belt out Sartori In Tangier on my Chapman Stick when I’m at home? It’s a live performance after all. 🙂

For those that got this far…. here’s David’s post.

 

 

Please don’t feel pressured to do a tech talk to further your career.

The tech talk seems, to me, to have become the new church of tech, especially when it comes to meetups. I’m not sure if it’s the fear of missing out (FOMO) or the belief that it’s going to make a big difference to your career, but everyone I’ve come across in the early stages of their career seems to think they have to do one.

Dear reader, I have some information for you: you don’t have to do one.

I wager 99% of the tech community don’t do talks

Honestly, they’re happier doing their job during the day and going home in the evening. At weekends they pursue their hobbies and interests and get on with life. At the end of the month they pick up their payslip and repeat the next month and so on.

And you know what, that’s fine. They may stay at home and learn some new stuff, in their own time, or push some code they’ve done to a GitHub repo. It’s still learning and sharing, it just doesn’t require hauling backside to a room of warm beer and pizza.

For the record, my career started in 1988 and it wasn’t until 20 years later that I even considered doing a talk. The only reason I started was a new location, no network and the need to make one fast. So I did a Barcamp in 2009 and a lot of things happened after that. Since then I’ve never done a talk because of pressure, either from myself or anyone else. I did them because I enjoyed them; they’re a performance and I like a stage. I think I know where it comes from…..

 

So if anyone says you have to do a tech talk to get anywhere in this career, they’re talking shite.

If you feel you want to do one, then by all means. Just don’t let other people sway you (unless you really want the drama and attention).

Getting started, if you really must

Small talks are fine to get started, meetups and developer events are all usually good. If you are going to present though make sure you know your stuff 100%. The nice thing with these events is that it’s not normally a huge process to get accepted, just ask and say you want to do something.

You are there to teach; once you understand that you are unstoppable. Teach something that goes beyond the getting-started documentation of a framework, for example; it needs to be more than that, a lot more than that. Another walkthrough of the same chatbot framework will drive me mad.

Expect questions and answers at the end. While it’s okay not to know everything, people are giving you their time so you can impart knowledge to them, and they will have questions.

From experience people are friendly and want to see you do well, so Q&A tends to be a nice humane affair. However, you do get the odd one: at one Barcamp talk in 2010 I was asked, “What time are you finishing, I’ve not learned anything I thought I would”, which kinda knocked me off course, but somehow I brought it back. What was a picture was the rest of the audience, who were pretty shocked. The chap got ejected from the conference not long after. You do get them, just not very often.

Personally, small developer meetups are not for me; they’re not my thing. Location is not on my side either: it’s a 140-mile round trip for me, finishing work early to drive there and then driving back again. And the beer is off limits as I’m driving. That’s me and that’s fair enough. I also prefer something a little bigger.

Conferences and proposals are a long game

This photo by Ellen Friedman means a lot to me. This isn’t just me doing a talk, this is five years of ideas, proposal writing, tweaking, talking to folk and figuring out how to get my points across. The Strata Conference Call For Proposals (CFP) is competitive to say the least.

And I learned an important lesson: writing proposals for this level of conference is not about what you want to talk about but what you feel is relevant to the audience at this point in time. Once I sussed that I had an acceptance and a wait-listed talk.

I didn’t want to further my career, I just wanted to talk at my favourite conference. There was no master game plan. I was nervous, though, and I made the fatal mistake, while preparing my slides, of taking all the personality out of them. Big mistake.

Attendees like a personality, otherwise it becomes a trudge through talk after talk of very nice people; sometimes you need something to wake you up. I took out all my humour when I was preparing the slides, and when I ran them past my boss (possibly the very first time I’d ever done that) he told me to put the Jase schtick back in.

Finding out I was last on, “Jase, you’ve got the graveyard shift, how do you feel?”, “Oh I’m fine with it, means I can wake them up.”

I’m glad I listened to my boss. Watching a Strata talk audience do a Mexican wave was rather funny and they enjoyed it. Yes I did do that.

I should try that at ClojureX.

The main thing to remember though….

I’ve always been in control of a number of aspects:

  • Whether I wanted to do the talk. Never forced, I could say no. And I say no a lot of times during the course of a year.
  • I control the content, no employer has told me what to talk about. (and sometimes the content is secondary, it’s airtime for the brand on stage).
  • I was me. And that comes with heckling, controversy and pragmatism. Not everyone is going to agree with what I say, thank goodness. I got boo’d at Big Data Belfast for saying Python wasn’t much cop to use 🙂
  • I’ve not done it to further my career, I just like to share information. “This worked for me, it might work for you”. It’s about giving more information than you receive.

If you feel the slightest bit uncomfortable about doing a talk then seriously don’t do it. There are other avenues to share information, it might be a blog, a podcast or a video. Whatever works best for you, then go for it.

In all honesty the blog and Twitter did more for my career than meetups and conferences.

Too small to #Kafka but too big to wait: Really simple streaming in #Clojure. #queues #pubsub #activemq #rabbitmq

In days gone by businesses proclaimed “we’re gonna do Hadoop Jase!”, no word of a lie, they used to phone me up and tell me so…… my response was fairly standard.

Now the world has gone streaming-based-full-on-kooki-mad because, “Jase, we need the real time, we’re all about real time now, our customers demand realtime”*

* They probably don’t but hey….

The Problem With Kafka….

Well, in reality it’s not really a problem; you just have to appreciate what Kafka was designed for: high volume message processing. It’s the Unix pipe on steroids, more steroids and a few more on top of that. When I say “high volume” I mean LinkedIn billions-of-messages high volume.

And to use Kafka properly you need machines, plural. Three beefy servers, one for each of the Kafka brokers (you need at least three for leader election if the master broker pops off for some reason), then another three for Zookeeper, because the last thing you need on earth is Zookeeper dying on you, otherwise everything dies. And really, if you’re on pager duty that’s the last thing you need.

Call me psychic, I know exactly what you’re thinking…..

Why not use Kinesis, Jase! Yer big ninny!

Well yeah, but no. There are two reasons. Firstly, I’m tight and I hate spending money, whether it be mine or other people’s. “But we got VC funding so we run $5k/month on servers” doesn’t wash with me. You’ll run out of money. My other issue is more technical.

Kinesis is fine for some things, but my heart just doesn’t settle for the whole five-consumers-per-shard malarkey. Kinesis has some nice things, but performance and consumer scaling are not two of them.

So, what to do?

Something a little more manageable, oh, and smaller.

So I put this question to the wider community (i.e. everyone) and I got one response from a Mrs Trellis of North Wales, well actually it was Bruce Durling, suggesting Factual’s durable-queue. I wasn’t aware of its existence….

The disk based bit is important, very important. If the stream dies at any point I need to know that it will pick up messages from the point it died. Kafka does it, RabbitMQ does it and the others do it, so I need this to happen.

Durable-queue pairs nicely with core.async to do its bidding, and it’s easy to put a stream or a number of streams together.

A Quick Demo Project

Add the Dependencies

First of all we need to add a few dependencies to a project.

[com.taoensso/timbre "4.10.0"]
[factual/durable-queue "0.1.5"]
[org.clojure/core.async "0.4.474"]

Adding the Required Namespaces

Let’s create a simple queue. Add the required namespaces to the ns declaration: you’ll need durable-queue and core.async, and I’ve added Taoensso Timbre as it’s my preferred way of logging.

(:require [durable-queue :refer :all]
          [clojure.core.async :refer [go-loop <! timeout]]
          [taoensso.timbre :as log])

Create a Disk Based Queue

Our actual queue is easy to create: pick a directory to write your queue to and you’re done. I see this in the same way as a Kafka topic; each file is basically your topic queue.

(def my-test-queue (queues "/tmp/" {}))

Define the Function That Does the Processing of Each Message

Think of this as your streaming API app, where the magic happens, so to speak. This is the function that processes every message taken from the queue.

(defn the-function-that-does-the-work [m]
  (log/info "Got the message! - " m))

Put a Message On the Queue

We need to be able to add messages to the queue. The key is the key of the queue (further on down, when we look at the queue loop itself, the reason why will become clear). As for the value, well, I like passing Clojure maps of information, but in reality it can be whatever you wish.

(defn send-something-to-queue [k v]
  (put! my-test-queue k v))

Taking Messages From the Queue

Now we’ve got something to put messages on the queue, we need something to take them off. Basically this takes the message from the queue with the same key as we passed in with the send-something-to-queue function. The message needs dereferencing so we get the true payload.

(defn take-from-queue [k]
  (let [msgkey (take! my-test-queue k)]
    (the-function-that-does-the-work (deref msgkey))
    (complete! msgkey)))

Also worth noting that we have to manually say that we’ve completed the work on that message with the complete! function. If this doesn’t happen then the message is still classed as queued, and if you restart the queue (if it dies, for example) then that same message will get processed again.

The Queue Loop

I’m going to run a core.async go-loop against this function. If there are messages in the queue to be processed then go and get the next one.

(defn run-queue-loop []
  (if (not= 0 (:enqueued (stats my-test-queue)))
    (take-from-queue :id)))

And finally a core async go-loop to keep everything going. A timeout of one second just to keep things sane.

(go-loop []
  (run-queue-loop)
  (<! (timeout 1000))
  (recur))

Testing the Demo

You could either run this from the REPL (it works a treat) or run it standalone, though I didn’t provide a -main function in the repo; you’re welcome to add one if you wish (there’s a minimal sketch just after the REPL output below). I’ve created two topic queues, q1 and q2, and will send a number of messages (nrange) to them, doling out randomly chosen topic destinations, mainly to prove that multiple topic queues do work.

(defn run-queue-demo [nrange]
  (map (fn [i]
         (if (= 0 (rand-nth [0 1]))
           (q1/send-something-to-queue :id {:uuid (str (java.util.UUID/randomUUID))})
           (q2/send-something-to-queue :id {:uuid (str (java.util.UUID/randomUUID))})))
       (range 1 nrange)))

When I run this in the REPL with 10 messages, I get the following output. You can see messages going to each queue and as we can see the output of the processing function we know the stream is working.

fractal.queue.demo> (run-queue-demo 10)
(true true true true true true true true true)
18-11-05 07:35:12 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "c55cc0f8-54ff-47ca-81a2-858af68f47b2"}
18-11-05 07:35:12 Jasons-MacBook-Pro.local INFO [fractal.queue.mainqueue:10] - Got the message! - {:uuid "56d83928-0019-4d58-bfc0-af2dbbf625b0"}
18-11-05 07:35:13 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "875dc578-b32c-4548-ac06-51ab9ef93d41"}
18-11-05 07:35:13 Jasons-MacBook-Pro.local INFO [fractal.queue.mainqueue:10] - Got the message! - {:uuid "0315c4e3-ba6b-4533-ac41-62293038da30"}
18-11-05 07:35:14 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "f522deb6-968c-44a1-a042-2f312f4d314b"}
18-11-05 07:35:14 Jasons-MacBook-Pro.local INFO [fractal.queue.mainqueue:10] - Got the message! - {:uuid "bc7fc7e5-aaf2-4226-871b-31f31199356c"}
18-11-05 07:35:15 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "b5a892fd-f516-4133-991b-539f8b8477be"}
18-11-05 07:35:16 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "6ae167b2-c7a3-42ec-87eb-14b629348a21"}
18-11-05 07:35:17 Jasons-MacBook-Pro.local INFO [fractal.queue.secondqueue:10] - Got the message on second queue! - {:uuid "2f97c442-31ca-495e-b1b6-b7019917d504"}
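And for anyone wanting to run the demo standalone rather than from the REPL, a minimal -main might look something like this. It isn’t in the repo, and the sleep is just a crude way of keeping the JVM (and the go-loops) alive long enough to drain the queues.

(defn -main [& args]
  ;; realise the lazy sequence so the messages are actually enqueued
  (doall (run-queue-demo 10))
  ;; crude: keep the process alive while the go-loops drain the queues
  (Thread/sleep 20000))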

Another nice feature of durable-queue is that you can pull up stats on the queue. The num-slabs value is the number of files the queue uses.

fractal.queue.demo> (durable-queue/stats q1/my-test-queue)
{"id" {:num-slabs 1, :num-active-slabs 1, :enqueued 3, :retried 0, :completed 3, :in-progress 0}}

Conclusion

A basic topic-based queue system, a bit like Kafka but without all the overheads. From a Clojure perspective it’s a great use of core.async, and looking at the throughput figures on Factual’s GitHub page it’s a system I’m happy using until I’m at a point where I really do need Kafka (and all those machines).

There’s a Github repo of the demo code here.

Update

So, I was half asleep when I wrote this, it’s Factual not Fractal. Thanks to Mrs Trellis for the headsup…..

Enabling JMX Metrics in Kafka – #kafka #jmx #monitoring #streaming #data

Something that doesn’t get talked about much when you’re starting out with Kafka is monitoring. Most initial thoughts go into topic log timeouts and maximum message sizes.

JMX is your friend

Kafka emits a lot of JMX information, which gives us a very good indication of the internal measurements that are happening. To tap into this gold mine we need to apply some very basic settings (well, it’s one setting).

Ladies and Gentlemen, start your brokers.

Let’s get this show on the road, it’s very easy.

First of all let’s start up Zookeeper.

$ $KAFKA_HOME/bin/zookeeper-server-start.sh config/zookeeper.properties

Open another terminal window or tab. Before you start your server, export the JMX_PORT environment variable and use a free port; I’m going to use 9999.

$ export JMX_PORT=9999

Once this is set you can start the Kafka server.

$ bin/kafka-server-start.sh config/server.properties

The last thing to do is test that our JMX setup is working properly. I’m just using the plain old, but handy all the same, jconsole from the Java Development Kit.

$ jconsole localhost:9999

You will see the JConsole window appear; if you click on the MBeans tab you’ll see all the Kafka JMX metrics. One you should really be keeping an eye on is kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions, but I’ll talk more on that another time.
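If you’d rather poll that metric from code than stare at JConsole, a rough Clojure sketch over the standard javax.management API could look like the following. The host, port and the “Value” attribute name are assumptions based on the setup above, not anything Kafka-specific I’m promising.

;; assumes these imports in the ns:
;;   (:import [javax.management.remote JMXServiceURL JMXConnectorFactory]
;;            [javax.management ObjectName])
(defn under-replicated-partitions [host port]
  (let [url  (JMXServiceURL. (str "service:jmx:rmi:///jndi/rmi://" host ":" port "/jmxrmi"))
        conn (JMXConnectorFactory/connect url)]
    (try
      ;; read the UnderReplicatedPartitions gauge from the broker's MBean server
      (.getAttribute (.getMBeanServerConnection conn)
                     (ObjectName. "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions")
                     "Value")
      (finally (.close conn)))))

;; (under-replicated-partitions "localhost" 9999) ;; => 0 on a healthy single broker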

More Kafka Posts

Topic level settings you can’t ignore.

Monitoring Consumer Offsets.

Quick Recipe for Kafka Streams in Clojure.

Talking #Kafka and #DeepLearning at #BigDataBelfast

A very quick heads-up: I’ll be talking at Big Data Belfast on Thursday 18th October. The talk is a walkthrough of a complete system that creates an automated learning system using Kafka and DeepLearning4J.

You don’t need any programming knowledge, I’ll be explaining everything in English. The idea is to show what a full system is like from data acquisition through to predictions.

Not everything in the AI and Machine Learning space is a basic TensorFlow program written in Python with a very limited set of data….. 😉

 

Belfast we need to talk about Norwegian Airlines – #travel #aviation #airports #northernireland

The Northern Ireland press lost its plop again….

The last 24 hours have been along the lines of “OMG! Belfast to New York flight ends at the end of October”…… but hey.

And the bet I made with myself at Routes Conference came to pass. Simply because when you have a lot of route experts in the room, well you go talk to them in the breaks, that’s when the conversations happen. I really should have put money on it.

It was going to happen…..

…..deep down we all knew it was going to happen because….. well simple supply and demand. There’s little demand so there was no point Norwegian stepping in to save a Belfast to New York route. The writing was on the wall when United ceased operations. When a £9m bailout is hastily approved (although illegal) you have to ask the question as to why when “Go to Iceland, see the sights for a day and then go to New York” was a better option, especially if you like rancid shark.

With less than 20,000 passengers a year of capacity (two flights a week at 189 passengers in a 737 MAX is roughly 19,700 seats a year) and a wafer-thin profit margin, I’m surprised it flew at all. Load factors need to be in the mid-90s to make a profit, and with Norwegian’s load factor in the region of 88% there’s a good probability it was loss making before it started.

Sadly I don’t have the actual numbers in my hand….. and any argument about airline passenger duty, well I’ve covered that before. A partially pointless exercise.

As far as the actual aviation press goes (not BBC Newsline or the BelTel), this was a complete non-story; no one has covered it yet that I’ve seen.

Dear Belfast, Dublin Airport is your hub

And while this is going to be an unpopular opinion, your hub to the rest of the world is basically Dublin. When it comes to US flights it is hands down Dublin, as the immigration pre-clearance just makes life easier. And that’s what customers want, oh, and shopping; we all need the shopping, it’s emotionally programmed into us.

Belfast is not a well connected airport in the grand scheme of things if you reverse the route. Coming in from New York you have a better hub in Dublin.

Cheap fares are not always cheap either. Once you add luggage, seat options and food how much are you actually saving compared to a “traditional” airline?

A number of airlines have wanted to do Atlantic hops for years, and when Ryanair were eyeing up Aer Lingus in times gone by the aviation press thought it was for transatlantic routes; it was never on their radar. This was happening around the same time that business class airlines were doing Atlantic routes; they didn’t last long, lack of demand. Sooooo……

The sad fact for Belfast types is this, you can drive to Dublin Airport in 100 minutes or so, or you could get the Aircoach or that other thing that Translink do to get you there.

Feed into Dublin

So we’ve established there’s a feed into Dublin Airport from Belfast (your car, train or private jet if you’ve got the money). Derry is another issue, and this is where a route from City of Derry Airport into Dublin Airport is actually needed.

Now the tattered tale of Citywing’s flight into Dublin, whether it was planned or not, I’ve spoken about before. It is though something that does need to happen.

Now there’s nothing to stop that flight nipping down via Belfast, doing a pick-up and then going on to Dublin for pre-clearance into New York. The airports have to connect in a sensible way. Belfast International offering flights to the US is fine if they are self-sustaining, which as it stands they are not.

To Conclude….

Loss making airlines don’t hang about once the subsidy has been spent. And that goes for any airport, not just Belfast.

I won’t leave it all bad, here’s a list of airports that do want to land at BFS: Aberdeen International, Billund, Brussels, Cambridge, Cologne, Faro (again), London Oxford, Munich, Murcia and Shannon (LAND IN SHANNON! Hint!).

 

Startups: Dare yourself to be scrappy again. #startups #hustle #product #b2b #b2c

I’ve noticed that people get tetchy and nervous when I speak my mind.

Good.

And I, for one, make no apologies. It took binge-watching the entire collection of “Halt and Catch Fire” (you can watch it now on Amazon Prime and Netflix US) to remind me how much I love being in a scrappy startup. It was introduced to me by a friend, “I cannot believe you don’t know about this!”; he suggested I watch it immediately, and he was 100% right.

Scrappy startups are wonderful, exciting and should have you on the edge of your seat. How quick, how “get it out there” can it be done? Keep it in stealth, no one need know, not just yet anyway. Too many times startups are just announcements. I’ve been there I’m guilty of it too.

Startups that can’t be built until someone passes the money across are essentially dead on arrival in my opinion. “We’re waiting on POC funding before we build” basically says, we have no one who can, or is willing to, code. Be scrappy, buy a book and get on with it.

I love the scrappy, hurried and out there as quick as you can ideas. I miss the “WTF let’s try this!”. I miss the white board sessions in Santa Clara back in 1999, I miss the 11pm curry that would get us through the milestone…. though I’m not sure what my doctor would say now.

There are 24 hours in a day, use them (and reserve eight of them for sleep). Too many cold-pizza-and-warm-beer networking sessions keep you away from building; there’s no point attending until you’ve built.

HaCF left me in tears at the end for various reasons. The one thing it did impress on me the most: I love being in a scrappy startup that just doesn’t care what the outcome is, we just tried. And revolving around it all is relationships; it’s not about the tech, it’s about how the tech brings the people together, whether that’s users, investors or the team itself. “We’re building the thing to get to the thing….”

The current state of the startup scene is all too safe, it’s all too samey: it’s about incubators and accelerators, the questionable storytellers and the ideas folk and their blue-sky funding. It’s about accounting firms suddenly having startup areas. It’s not about what they have; your startup is about what you have and what’s in your heart. My thoughts here are hardly new, I wrote about it four years ago in “Startups: The Passion and the Paradox“.

And it all starts off with an idea and being scrappy.

Here’s to being scrappy, it’s nice to be back.

I can tell more about your company by how you offer #bacon – #startups #business #meetings

Within a minute I can tell how a meeting and a long-term business relationship are going to go, especially where bacon is concerned.

Can I get you a drink?

Tea, coffee and water are a given. Everyone needs at least one of these things to function. So as a device for meeting and business success prediction it’s fairly weak.

Bacon however changes all that, there’s currency involved and money has been spent.

The Meeting

Proposals were put forward by management, multiple phone calls on whether the company could perform such an operation. All minds put at rest and a date for a pitch set. As an employee I get the call, can I fly over to England to help out on the pitch from a technical viewpoint. Flights booked, info got and meet at the hotel for lunch, then an afternoon planning for the pitch the following morning.

It’s not a small client; this is a household name. And on the morning, driving up to the offices, you get a sense of scale. Five of us arrive in reception, are told to wait, and then assistance arrives. Walking through oak-panelled corridors you get a sense of the money sloshing about in the industry they’re the leader in.

The first thing I and another tech colleague spot is a large platter of bacon baps (the word sandwich gives the sense of white bread slices; it is not this at all, they are baps). “We will do well here” was the general feeling on seeing at least fifty fresh bacon breakfast treats, just as well as breakfast had been skipped for a final pitch run-through.

The team were assigned to the far side of the table, maximum distance from the bacon platter. “It’ll be okay, they’ll be offered around”, we thought, as more staff filed in and sat down nearer the bacon. As people sat down they all passed the platter, picked up a bap and tucked in; the unwritten rules were in play. And while the tea and coffee were poured out the meeting started, and at that point I knew it was going to lead to problems.

Three Hours Later

The pitch started, finished, and a long drawn-out question and answer session continued. We’d had our one cup of tea and the bacon mountain had hardly moved. If a client can’t offer you a bacon bap and extend an arm of confidence, trust or bacon, then I will question the long-term plan of the client.

As the meeting concluded there were a lot of handshakes (15 client representatives; you can work out the combinations for a team of five, that’s 75 handshakes), nods of heads, small talk and a large platter of untouched, cold, destined-for-the-bin bacon baps.

My colleague and I gave the platter a final look, and as we walked out of the reception area onto the street I said, “This is not going to go how we want it to go.”

I was 100% on the money. Ropey specifications, holding on to information, internal politics like I’d never witnessed before, manipulation of third parties – it very nearly killed the supplier I worked for, and other suppliers as well (some of them household names too).

It doesn’t have to be bacon

For me those first meetings tell me everything, and I know it’s been documented a thousand times over. I’ve never seen a bacon bap platter since, so my focus will go on something else. I think Cloudera were right though: “Data is the new bacon”. Bacon taught me an awful lot about decision processes, meeting psychology and staff placement in a meeting room. It’s like wedding planning but with more bacon…..