Time For A Change? #algorithms #ml #ai #business #data



My first WordPress blog post was back in October 2008; my first blog post anywhere goes back way before then, to 2000. Twenty-one years of knowledge, insight and, let’s be honest, nonsense.

While most folk know me for the big data thing, whether that’s Hadoop, Spark, Kafka or Pulsar, what I want to do now is focus on the fun stuff again, the side projects that make me tick. So it’s really about data and insight. As my day job is Kafka, Hadoop and Pulsar with the brilliant company Digitalis, I’ll keep those kinds of posts for them.

With lockdowns, dodgy information, increased anxiety and a complete lack of sleep, the last couple of months have only reminded me that data is sexy, and that the climb towards a truly algorithmic business is worth looking into in my spare time.

So, if you want stuff on algorithms, data and artificial intelligence in retail, it’ll be here. Kafka and the lark, that will be with my employer. Time for tea I think……


Tales from the Frontline of Kafka – #kafka #confluent #podcast



Listen now >>>>> https://developer.confluent.io/podcast/tales-from-the-frontline-of-apache-kafka-devops-ft-jason-bell

The podcast has been up a while now but I’ve been busy. It was definitely fun to do. There’s a lot of opinion especially about how teams, developers and others embrace Kafka.

We also confirmed that Dave Klein is not omnipresent, I was convinced that he was.

Kafka Is A Team Sport Redux – #kafka #apachekafka #streamingdata #data #confluentkafka

The feedback from Saturday’s blog post Kafka Is A Team Sport was very positive. I still felt I hadn’t quite got my point across though, well not clearly enough. Then it hit me.

The Three Functions

I see the current ecosystem as three distinct parts. Development, DevOps Engineering and Data Engineering. They all have different functions and, in my eyes, operate differently.

Excuse the Apple Pencil, they don’t allow sharp objects in here….. and it was early and I’ve only had one cup of tea.


The application development area covers the Client APIs, whether that’s Java, Go, Python, Clojure, PHP, C++ or anything else I can think of. The main thing is that there is a development task that needs doing. It might be a producer, a consumer, a streaming job or even a KSQL query.

I honestly don’t need to care how Kafka works at this point, I just want to send a message or read a message. And if I can’t program in any of those languages I have the REST Proxy to help me too. HTTP still has its place in the world, just be careful with the security.
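As a sketch of what the REST Proxy route looks like, here’s the shape of a produce request body in the Confluent REST Proxy’s v2 JSON embedded format. The record value is made up for illustration.

```python
import json

# The shape of a produce request body for the Confluent REST Proxy
# (v2 JSON embedded format). The record value here is made up.
payload = {
    "records": [
        {"value": {"name": "hello from the REST proxy"}}
    ]
}
body = json.dumps(payload)
print(body)
```

You’d POST that body to http://localhost:8082/topics/&lt;topic&gt; (8082 being the proxy’s default port) with Content-Type: application/vnd.kafka.json.v2+json, no Kafka client library needed at all.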


Jobs that require setting up things like Kafka Connect and Replicator/Mirror Maker are more tooling jobs. When I say tooling I really mean “sorting out configuration” and sending it over REST (especially with Connect). Do I need to know how to program? No, not really.
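That “sorting out configuration and sending it over REST” looks roughly like this; a minimal sketch using the stock FileStreamSource connector that ships with Kafka, where the connector name, file path and topic are all made up.

```python
import json

# A minimal connector configuration of the kind submitted to Kafka Connect's
# REST API. The connector name, file path and topic are illustrative.
connector = {
    "name": "file-source-example",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",
        "topic": "rtest2",
    },
}
print(json.dumps(connector, indent=2))
```

POSTing that JSON to http://localhost:8083/connectors (8083 being Connect’s default REST port) is what registers the connector; no programming beyond the config itself.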


Brokers, latency, capacity planning and (oh no!) Zookeeper are admin functions, or DevOps if you will. I’d never ask a software developer to get into the minute detail of the server.properties file or look at volume throughput. In the same way I’d never ask them to tune a PostgreSQL database (in fact I wouldn’t touch that either, I’ve no idea).


Kafka Is A Team Sport – #kafka #apachekafka #streamingdata #data #confluentkafka


, , , ,

One of the nice things about being an early adopter is watching things grow. Another is observing how that adoption happens from organisation to organisation. The questions I get tend to be varied and in the oddest of locations, doing an impromptu Q&A at London City Airport is still a highlight of data related daftness.

The last few weeks have been all about Kafka, well it is my job, and it’s also my sideline hobby. Giving talks is always fun for me. Over the last couple of weeks I’ve presented at a couple of meetups and also done a full podcast interview with Tim Berglund for the Confluent Streaming Audio podcast. It finished off my presentations for 2020 perfectly.

Something that came out of all three events though was the amount of knowledge someone needs about the ecosystem.

Developer, DevOps, Support or Something Else?

The question is how do we come to Kafka in the first place? For me it was, “Can you look after this for me please?” at work a few years ago. Did I know how it worked? I knew some concepts of streaming from using ActiveMQ and RabbitMQ, but how Kafka actually worked, no, not really.

What I was being asked to do was make sure that the cluster didn’t die and cause issues for the customer. That’s different, that’s a support function.

The opportunity though was there to learn how it works, and that’s where I spent my time. I spent time coding up basic producers and consumers to see how it all worked. At that time the ecosystem wasn’t really there, it was basic. Producers wrote to the broker(s) and consumers read from them. That was pretty much it.

Does a developer need to know about producer/consumer throughput? No, not really. There’s a huge difference between needing to know and wanting to know. Most developers I know want to get the job done.

It’s All About The Organisation

Startups tend to be scrappy, they build stuff and download frameworks and just get on with things. So there’s a good chance that the knowledge is combined into a small pool of people.

When it comes to more “traditional” businesses and the employee size goes up to hundreds or thousands, then that’s where things become interesting. Silos begin to take hold (and this is no bad thing, it all depends on the organisation) and the knowledge share is limited. Many times there’s a dividing wall between a developer and the framework in question.

A developer may be asked to write an application to “write messages to Kafka” but after that there’s no real need to know how Kafka works. Take some sample code, alter it to your needs and off it goes for testing, QA and deployment.

Does a developer need to know KSQL internals? No. Kafka Connect’s partition/thread balancing act? No, not really. In my eyes the developer doesn’t explicitly need to know the framework just merely know the accepted requirements for the application.

If I were a betting man and I asked a developer if they knew about the ProducerInterceptor class, I’m wagering I’d get blank looks. I personally think it’s pretty important but that’s merely my opinion.

Who Needs to Know What. Really?

Kafka becomes a team sport. Developers write applications, so I’d expect them to know how the Client APIs work and how to subscribe and send messages to the REST Proxy (if it’s active).

A working knowledge of Schema Registry? Possibly not. Some clusters quite merrily run on raw strings and JSON payloads. If you want to take things seriously and use Avro then you start getting into schemas.

Kafka Connect can be an art form in itself and I wouldn’t expect a developer to be involved unless they were writing a custom connector. Deployment of connectors is more than likely a DevOps-like function, and after that it’s a monitoring task. And lastly, does a developer need to know broker SSL and how brokers really work? To get an application working, not so much. If anything it should be pointing at certificates in properties.
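“Pointing at certificates in properties” means something like the client config below. These are standard Kafka client SSL settings, but the paths and password are placeholders, and a real setup (mutual TLS, say) will need keystore entries too.

```properties
# Minimal client-side SSL settings - paths and password are placeholders.
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
```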

So, as a developer do I need to know everything about Kafka?



Tesco Clubcard: #customerloyalty is dead! It’s all about membership! #clubcard #tesco #loyalty #data #datamining #algorithms



(First things first, four months since my last post…. apologies, just a lot of things going on.)

The “rona” has done a lot of things to retail; it’s also done an awful lot of things to the economy. The UK furlough scheme reduced incomes and made customers very very very price sensitive. And today is the last day of the original furlough scheme as it stands; the next few months will sadly be even more worrying for some.

For Tesco there was also pressure from the likes of Lidl and Aldi on certain produce; fresh fruit and vegetable prices dropped in a wave of trying to keep up with the new kids in town. Consumers obviously liked it and certainly needed it.

All the while during the Covid pandemic, customer loyalty was going through its own problems.

For me, having worked in customer loyalty data mining, recommendation systems and other loyalty card related work (yes, I can bore you to tears with this stuff; oddly enough it’s still one of the most requested topics I’m asked to talk about), the loyalty shift was quick and dramatic. But we first need to see where we were before the lockdowns began.

Four Christmases a Year

This was the mantra of DunnHumby/Tesco since the Clubcard was introduced in 1995. While the world was generally losing its knickers over algorithms, Tesco have been calmly doing it for 25 years and we gave them permission to do it. Each quarter was when you got your Clubcard statement through with the vouchers that reflected your points accrual.

If you bought specific items that Tesco wanted you to buy, then you got more points. It was a trade off: if you wanted money off and better deals then you had to toe the line. And oh boy we did. It was a behavioural science playbook and it was executed perfectly most of the time.

It was a task and a half to get these coupons to households, a full security operation too with secured vehicles and secondary dummy vehicles and so on. Your loyalty was guarded better than some bad’uns who were locked up. Why? Because coupons have a monetary value, albeit very small, it’s still currency and it’s accountable.

The reason for the quarterly coupon drop? That’s how long it took to mine the data back in the day. When you’re dealing with a firehose of basket data it takes a bit of time to get through it all. Back in 1995 only 10% of baskets were measured, which gave out some really wobbly recommendations and money off coupons to customers….. so once the scale was there, they did everything they could.

And I LOVED that concept, I still do. I’ve been down a customer loyalty data rabbit hole ever since. For it is my happy place…..

Why do retailers invest in loyalty programmes? The key goal is to increase customer loyalty. All customers can be placed at some point in a 3-D cube, and a customer’s location in the cube suggests actions suitable to earn his/her lifelong loyalty. The three axes are Contribution (profitability today), Commitment (future value: the likelihood of remaining a customer, and the ‘headroom’ left to grow) and Championing (being an ambassador for the brand). – Scoring Points (2003), Humby et al.

While the cube is a simple concept, the practicalities of wading through the data efficiently and the computational nature of each customer (which segments get which offers) are not. It takes time to figure out who should get what, and it costs money (computing power and so on), so the payoff for the retailer has to be worth it.
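To make the cube concrete, here’s a toy sketch: three per-customer scores, one per axis, with a made-up threshold splitting each axis into high and low. The scoring scale and threshold are my invention for illustration, not DunnHumby’s method.

```python
# Toy illustration of the loyalty cube: each customer gets three scores
# (contribution = profitability today, commitment = likelihood of staying,
# championing = how much they advocate the brand). Scale and threshold
# are invented.
def cube_position(contribution, commitment, championing, threshold=0.5):
    """Place a customer in one of the cube's eight corners."""
    return tuple(
        "high" if score >= threshold else "low"
        for score in (contribution, commitment, championing)
    )

# A profitable, loyal advocate lands in the champion corner.
print(cube_position(0.9, 0.8, 0.7))   # ('high', 'high', 'high')
```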

I Wish It Could Be Christmas Everyday

(Everything from here is basically opinion, if I’m wrong then please tell me).

Once lockdown restrictions started to ease and we weren’t kettled through the supermarket one way system (and the joy of going around in circles to get back to where the vinegar was because you walked past it), the Clubcard sprouted a new set of wings. Discounts in store.


“Clubcard Prices”, or one price for the “regular” customer and a discounted price for Clubcard holders. This shifts the whole concept of what the Clubcard is and what it actually stood for, which was loyalty. Now it stands for, in my eyes, membership. Just by holding the card now means that you are entitled to discounts on certain items when passed through the checkout. Well that’s how it could be perceived but more on that in a moment.

Now it’s no secret that Tesco need to save money and have some serious competition from the likes of Aldi and Lidl. Store visits would have gone down, getting delivery slots during Covid was increasingly difficult depending on where you lived, and choice was drastically reduced. The perceived value of holding the card then starts to decrease. Add to that the cost to Tesco of processing Clubcard data; between data centres, software licensing costs and salaries it’s an expensive operation to maintain.

Is this now meaningless to the customer?

So, discounting at source, no longer caring what you’re really buying, that’s the future of the Clubcard. Well that’s partly true, there’s no emphasis on buying certain items or attempting to boost your points balance, the nudge unit has stepped back. As there’s little in the way of physical coupons (unless it’s a promotion with another brand) then there’s little going on in the nudge department either. Remember the Clubcard was about changing behaviour, getting you to buy items you may not have considered.

Linking point of sale (POS) data to your basket and card number (and social/economic data by postcode), that continues as it’s a good earner for Tesco on the supplier side of the equation. Those reports are gold dust still for brands. For the customer though, the emphasis on eagerly waiting for coupons to arrive every three months, well that’s gone now. The Clubcard serves as a means to get perceived discounts and, in my opinion, little else.

And no, I think it has nothing to do with GDPR. We agreed to a contract, points and behaviour for money off and offers. We ticked the boxes and have to be responsible for our actions and the consequences thereon.

So, What’s the Endgame?

Bums on seats…. so to speak. It’s about attention.

From where I’m sat, Tesco are in a mad race to get more customers to sign up to the Clubcard, and the enticement is staring you in the face in-store: discounts. And some of them are quite compelling, especially if you like your spirits.


Do you want to save 35.9% on a bottle of whiskey (yeah I know, the e in whiskey… sue me 🙂 )? You’d be a fool not to be tempted by such a deal and sign up to the Clubcard to get it, especially with a Covid Christmas approaching at velocity.

Now, as far as I’m concerned, nothing is ever straightforward, working on the theory of “no such thing as a free lunch”. Where there’s a discount on one side there’s a price increase on the other. And when you have 45,000 line items you can easily add 1 or 2p per line item and preserve the bottom line.
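A quick back-of-envelope on that claim. The 45,000 line items figure is from above, but the per-line sales volume is entirely invented for illustration:

```python
# Back-of-envelope: a quiet 1p rise across many line items can fund a
# headline discount. The weekly volume per line is invented.
line_items = 45_000
rise_per_item_gbp = 0.01            # the quiet 1p increase
units_sold_per_item = 100           # hypothetical weekly volume per line

extra_revenue = line_items * rise_per_item_gbp * units_sold_per_item
print(f"£{extra_revenue:,.2f} per week")   # £45,000.00 per week
```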

There’s always a trade off and I think Tesco have done the right thing here. My only thought is that this may really signal the end of what the Clubcard really stood for, and that was loyalty.


Selling API Access – Three Steps to Customer Success – #api #programming #software #business #data

An Application Programming Interface (API) is a method for developers and website owners to connect their product to another product. For example, think about all the tools that use Twitter, Instagram and Facebook to get a customer’s data; developers will more than likely be using the respective API to connect and access the data.

I’ve used APIs in one way or another over my career, and I’ve created APIs that have enabled businesses to access my product data. The majority of the time this access comes for free; it’s a win-win for the developer and the source product owner. Sometimes, however, APIs are accessible for a fee.

Fees for APIs come in various forms but the common ones are:

  • Cost per call (e.g. $0.001/per API call)
  • Monthly fee ($25 per month for a number of calls; this may be tiered, with higher prices depending on the volume of access you need).
  • Credits (various API calls cost a certain number of credits; credits are bought in advance on either a monthly or pay-as-you-go basis).
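The three models above can be sketched as simple cost functions. Every rate, allowance and credit value here is invented for illustration:

```python
# Sketches of the three common API pricing models. All rates are invented.
def per_call_cost(calls, rate=0.001):
    """Flat cost per call, e.g. $0.001 per call."""
    return calls * rate

def monthly_cost(calls, included=10_000, base_fee=25.0, overage_rate=0.005):
    """Flat monthly fee covering an allowance, with per-call overage."""
    overage = max(0, calls - included)
    return base_fee + overage * overage_rate

def credit_cost(calls, credits_per_call=2, price_per_credit=0.01):
    """Each call burns credits bought in advance."""
    return calls * credits_per_call * price_per_credit

calls = 12_000
print(f"{per_call_cost(calls):.2f}")   # 12.00
print(f"{monthly_cost(calls):.2f}")    # 35.00
print(f"{credit_cost(calls):.2f}")     # 240.00
```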

This last month I’ve been working on an idea; it required a specific API and it was going to cost me money. So, hunting around for data sources, I found one that would fit the bill; the price was great and looked like it was discounted. So, hitting the PayPal subscribe button, I paid my dues and got to work……

I got the data I was looking for and had used less than 1% of my monthly quota. Come the end of the first period I wondered when the next payment was going out. I knew I’d get a PayPal email so I wasn’t in a huge hurry to find out. What I read took me by surprise: I’d been charged nearly twenty-five times the price I was expecting…. surely a mistake. I went back to the website, and then I saw the small print. The excellent first month price was the trial rate, after which I’d have had to give a number of days’ notice to cancel the trial.

I’m not here to knock the company, I’m not even going to name them. Ultimately I’m responsible for my actions here. It did however get me thinking: if it were my company, what would I do when charging for API access? Three things come to mind and I’d like to share them with you.

Confirm The Terms With a Checkbox

If your service depends on PayPal subscriptions, or Stripe and other payment providers, then make sure your customer is crystal clear about what they are about to sign up to. Terms and conditions of the transaction should be upfront and clear: the trial cost and then the ongoing costs. Be very clear about trial rates, what will happen and the future charges involved. Don’t worry if this becomes another step in the process; your customer will thank you for it now rather than sending angry, ranty emails after the trial is over.

Send a Reminder

Give your customer time to cancel but don’t expect them to remember they signed up. Customer success revolves around good communication. An email ten days prior to the end of the trial period and the increase in the fee is only good manners, no one likes to get caught out. At this point you can re-connect with your customer, remind them they are signed up and the trial period is coming to an end. Be clear that the price is going up and if they wish to cancel then this is the time to do it so they are not charged by surprise.

This ends up being a two way thing, the customer might have questions, they might have feedback. The positives of a good reminder are huge here, there is so much opportunity to create better customer service, a better product and a better experience.

Measure the Usage

Most API access comes in tiered usage billed on a monthly basis. The first 1,000 calls might be free, the next 10,000 will cost an amount, and the cost increases for each usage band a customer uses. Sometimes a customer can sign up for access, go all in for the highest tier and hardly use it.

Once again, email wins here. There’s an opportunity to connect with the customer again: “We see you’ve only used 1% of your access allowance” prior to the renewal time or next subscription payment. This kind of calculation shouldn’t be a difficult one; with the amount of tools out there on the market now, even a spreadsheet will do. You can create a lot of value by either encouraging the user to use the API more, or saying that you will charge a lower tier amount as it looks like they will never get to that level.
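The check really is spreadsheet-simple. A sketch, with an invented 10% threshold for deciding who gets the reminder email:

```python
# Flag customers who are paying for far more than they use, so a reminder
# can go out before renewal. The 10% threshold is invented.
def usage_alert(calls_used, calls_allowed, threshold=0.10):
    fraction = calls_used / calls_allowed
    if fraction < threshold:
        return f"Used {fraction:.0%} of allowance - suggest a lower tier"
    return None   # usage is healthy, no email needed

print(usage_alert(100, 10_000))   # Used 1% of allowance - suggest a lower tier
```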

The customer touch points here are a big deal, it means they know that you care about being fair with their business, you want to help them succeed and in turn there’s a good probability they will recommend your service to other people.


Regardless of how your service works, creating multiple touch points for customer success will only do your company good in the long run. The customer will feel valued and there’s an increased chance of more trade. No one likes nasty surprises, whether it’s the customer’s fault or not.

Customer Loyalty – Starting a stamp card: Part 1. #customerloyalty #stampcard #coffee #tea #loyalty #retail #cafe #restaurants

Sometimes the simplest solutions work the best and the stamp card is the easiest way to establish a form of repeat business. The idea is simple, have a card with spaces for nine or ten transactions, where stamps or a pen signature can be applied. Once the card is complete the customer then receives a free item. The most common use is in cafes and coffee bars, the card rewards a free coffee (or other item) once the card is complete.

The stamp card offers a cost-effective way of getting repeat business. The cost to you is small, the cost price of the free item. Some planning must be done in order for this system to work properly. Firstly, what item will the customer have to buy in order to get the card stamped? Is it a single item, like a coffee, or anything over £10 for the card to be stamped? With this system a single item or related group of items, like a coffee or a hot drink for example, works best. If you over complicate the rules, then it becomes confusing for both staff and customer.

Designing the Card

Keep the card design simple, the best size to use is the same as a business card, these are then easily carried by the customer in their purse or wallet. Make sure that one side of the card has your branding so it’s easily identifiable.

On the other side of the card is where the stamping will happen. As you can see in the photo of my stamp cards, they are basic in design. They might use a coffee cup as the thing to be stamped, or it could just be a simple square or circle space.

Use a business card printing service to get the cards made; this might be your local office printing supplier or an online business stationery printing service like Vistaprint or Moo.com.

The Stamp

There are a few options: you can use a pen (ballpoint or marker, for example) to mark each transaction on the card. The key thing is to remain consistent. The alternative is to use a rubber stamp for marking the cards. These can be acquired from office stationery outlets or online. If branding is important to you then you could get a custom-made rubber stamp made specifically for your stamp card.

What Can I Learn?

The basic data gained from the stamp card is customer spend. Beyond that there’s little else, but it does act as a good gauge. If you are selling a cup of coffee at £1.50 for example, and it costs 50 pence to produce, then you know a completed card is giving you a profit of £8.50.

One completed stamp card = (9 × £1.50) − (9 × £0.50) − £0.50* = £8.50

* The cost of the item you are rewarding the customer with also has to be factored in.
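The same arithmetic as a small function, using the example numbers above (nine paid stamps, £1.50 price, £0.50 unit cost, £0.50 reward cost):

```python
# Profit from one completed stamp card: nine paid cups at £1.50 each,
# costing £0.50 each to make, minus the £0.50 cost of the free cup.
def card_profit(stamps=9, price=1.50, unit_cost=0.50, reward_cost=0.50):
    return stamps * (price - unit_cost) - reward_cost

print(f"£{card_profit():.2f}")   # £8.50
```

Swapping in your own prices and card length gives a quick gauge of whether the scheme pays for itself.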

Retaining the completed cards for the month will give you an idea of the overall performance of the cards. It’s worth keeping a spreadsheet of the completed cards redeemed to see how they are performing.

Are There Risks?

Forgeries in stamp cards can be a problem. It’s easy for anyone to copy a ballpoint pen signature or initials. The rubber stamps are slightly better but these can be easily bought from the supplier, it’s better to design your own so it’s unique to the brand. McDonald’s uses stickers for their coffee loyalty card.

The café chain Caffè Nero’s stamp card was the subject of many a website that showed you how to elaborately print the required stamps to complete the card. Most of these sites are gone now and Caffè Nero moved to a digital solution while keeping the traditional stamp card.

In the next part I’ll talk about taking the humble stamp card digital.


Dealing With Imposter Syndrome on Panels – #impostersyndrome #beltech2020 #ai #machinelearning #conferences

Thirty-two years into this industry and this was possibly the first time that imposter syndrome didn’t hit me five minutes before the start of the panel.

If you are doing a talk and it’s you and you alone, then that’s okay. You’ve put the work in, got the slides sorted, rehearsed(!) and when you stand on the platform or stage then you are in control (most of the time). A panel though is different, yes you’ve been invited because of one of many factors: you were pushed by your employer, you know what you’re talking about or you’re an idiot. While the invites I get are based on the second notion, I really do think it’s about the third.

The AI Explainability panel at Beltech 2020 was enjoyable. First of all I didn’t have to hike to Belfast to do it. Secondly while I’ve routinely heckled Andrew Bolster and Austin Tanney many-a-time it was great to share a panel with them.

Now then, back to imposter syndrome. Here’s why: on this panel there are two PhDs, a Professor, a Doctor and a bloke with some ropey GCSE grades who eventually figured out how to string a sentence together. Normally I’d be worried as hell but this time I was fine…… I didn’t play the idiot this time, I chose not to, but nor did I go for the gobby know-it-all from the trenches either; I held back. I’m relaxed, I’m with good people, this should be okay. And it was. This was more like a poker hand: read each response and act accordingly. Do I lean in with a slightly aggressive response to the Target Baby Story? No, but I’ll happily give you a winning hand knowing the story deeply.

If you know your subject, you can do a panel. Simple as that. Qualifications don’t matter. Nor did the fact I’d written a book on machine learning…..




Extending Topic Retention in Kafka – #kafka #apachekafka #confluent

There’s a part in my internal body clock that worries about Kafka messages, especially production Kafka messages, especially LOSING Kafka messages…. Even when I know that the retention policies work perfectly well and do as they are told I still wake up and worry. If you maintain a Kafka cluster then you’ll understand. When it comes to messages you will do anything to make sure they don’t vanish.

So just to confirm my assumptions and reduce the usage of Nytol, let’s try it out.

Kafka Topic Retention

Message retention is based on time, message size or both. I don’t know the internals of other companies’ cluster configurations, but time is widely used. Log retention is based on either hours, minutes or milliseconds.

In terms of priority to be actioned by the cluster, milliseconds will win, always. You can set all three but the smallest time unit takes precedence:

retention.ms takes precedence over retention.minutes, which takes precedence over retention.hours.

Where possible I advise you use retention.ms and have proper control.
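As a sketch of that precedence rule (an illustration of the documented behaviour, not broker code; the 168-hour fallback matches Kafka’s log.retention.hours default):

```python
# Mimics Kafka's retention precedence: retention.ms beats retention.minutes,
# which beats retention.hours. Illustration only, not broker code.
def effective_retention_ms(ms=None, minutes=None, hours=None):
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    if hours is not None:
        return hours * 60 * 60 * 1000
    return 168 * 60 * 60 * 1000   # broker default: 168 hours (7 days)

print(effective_retention_ms(ms=180_000, hours=168))   # 180000: ms wins
```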

A Prototype Example

Here’s what I’m going to do.
  • Create a topic with a retention time of 3 minutes.
  • Send a message to the topic with an obvious time in the payload.
  • Alter the topic configuration and add another 30 minutes of retention time.
  • Have a cup of tea.
  • Consume the message after the original three minute period and see if it’s still there.
  • Celebrate with another cup of tea.

Create a Topic

Nothing out of the ordinary here, I’m using a standalone Kafka instance so there’s only one partition and one replica. The interesting part is adding the config at the end. I’m setting the topic retention time to three minutes (3 x 60 x 1000 = 180000).
$ bin/kafka-topics --zookeeper localhost:2181 --create --topic rtest2 --partitions 1 --replication-factor 1 --config retention.ms=180000

Send a Message

Once again, standard tools win here. Just a plain text message being sent to the topic. I typed in the JSON, there’s nothing fancy here.

$ bin/kafka-console-producer --broker-list localhost:9092 --topic rtest2
>{"name":"This is Jase, this was sent at 16:15"}

The message is now in the topic log and will be deleted just after 16:18. But I’m now going to extend the retention period to preserve that message a little longer.

Alter the Topic Retention

With the kafka-configs command you can inspect any of the topic configs, and you can alter them too. So I’m going to alter retention.ms and set it to 30 minutes (30 * 60 * 1000 = 1,800,000).

$ bin/kafka-configs --alter --zookeeper localhost:2181 --add-config retention.ms=1800000 --entity-type topics --entity-name rtest2

Completed Updating config for entity: topic 'rtest2'.

Have a Cup of Tea

If everything were to go horribly wrong then it’s going to be about now. So a tea is in order.

Check the Topic by Consuming Messages

Running the consumer from the earliest offset should bring back the original message.

$ bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages

Okay that’s worked perfectly well (as expected), let’s try it again because I’m basically paranoid when it comes to these things. I’ll add the date this time for added confirmation.

$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri  3 Apr 16:24:18 BST 2020
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages

Looking good. And I’m going to do it again because I want to make sure…..

$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri  3 Apr 16:24:50 BST 2020
{"name":"This is Jase, this was sent at 16:15"}

Celebrate Again

The kettle is on. Time for another tea.