Kafka Is A Team Sport Redux – #kafka #apachekafka #streamingdata #data #confluentkafka

The feedback from Saturday’s blog post Kafka Is A Team Sport was very positive. I still felt I hadn’t quite got my point across though, or at least not clearly enough. Then it hit me.

The Three Functions

I see the current ecosystem as three distinct parts: Development, DevOps Engineering and Data Engineering. They all have different functions and, in my eyes, operate differently.

Excuse the Apple Pencil, they don’t allow sharp objects in here….. and it was early and I’ve only had one cup of tea.

Clients

The application development area: the Client APIs, whether that’s Java, Go, Python, Clojure, PHP, C++ or anything else I can think of. The main thing is that there’s a development task that needs doing. It might be a producer, a consumer, a streaming job or even a KSQL query.

I honestly don’t need to care how Kafka works at this point, I just want to send a message or read a message. And if I can’t program in any of those languages, I have the REST Proxy to help me too. HTTP still has its place in the world, just be careful with the security.
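
To give a flavour of that, here’s a minimal sketch of producing over HTTP. I’m assuming a Confluent REST Proxy listening on localhost:8082 and a topic called test-topic, both purely illustrative:

$ curl -X POST http://localhost:8082/topics/test-topic \
    -H "Content-Type: application/vnd.kafka.json.v2+json" \
    -d '{"records":[{"value":{"name":"hello over HTTP"}}]}'

No client library, no language debate, just an HTTP POST.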

Tooling

Jobs that require setting things up, like Kafka Connect and Replicator/MirrorMaker, are more tooling jobs. When I say tooling I really mean “sorting out configuration” and sending it over REST (especially with Connect). Do I need to know how to program? No, not really.
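
To show what I mean, here’s a hedged sketch of registering a connector over REST. I’m assuming a Connect worker on localhost:8083 and the FileStreamSource connector that ships with Kafka; the connector name, file and topic are made up for illustration:

$ curl -X POST http://localhost:8083/connectors \
    -H "Content-Type: application/json" \
    -d '{"name": "file-source-example",
         "config": {
           "connector.class": "FileStreamSource",
           "tasks.max": "1",
           "file": "/tmp/input.txt",
           "topic": "file-topic"}}'

That’s the whole job: sort out the JSON, POST it, then keep an eye on it.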

Backend

Brokers, latency, capacity planning and (oh no!) ZooKeeper are admin functions, or DevOps if you will. I’d never ask a software developer to get into the minute detail of the server.properties file or look at volume throughput, in the same way I’d never ask them to tune a PostgreSQL database (in fact I wouldn’t touch that either, I’ve no idea).
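
For a flavour of the minute detail I mean, here are a few representative server.properties entries (the values are illustrative, not recommendations):

log.retention.hours=168
num.network.threads=3
num.io.threads=8
log.dirs=/var/lib/kafka/data
zookeeper.connect=localhost:2181

Sensible values depend entirely on the hardware and the workload, which is exactly why this is an admin concern rather than a developer one.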

 

Kafka Is A Team Sport – #kafka #apachekafka #streamingdata #data #confluentkafka


One of the nice things about being an early adopter is watching things grow. Another is observing how that adoption happens from organisation to organisation. The questions I get tend to be varied and asked in the oddest of locations; doing an impromptu Q&A at London City Airport is still a highlight of data-related daftness.

The last few weeks have been all about Kafka, well it is my job, and it’s also my sideline hobby. Giving talks is always fun for me. Over the last couple of weeks I’ve presented at a couple of meetups and also done a full podcast interview with Tim Berglund for the Confluent Streaming Audio podcast. It finished off my presentations for 2020 perfectly.

Something that came out of all three events though was the amount of knowledge someone needs about the ecosystem.

Developer, DevOps, Support or Something Else?

The question is how do we come to Kafka in the first place? For me it was, “Can you look after this for me please?” at work a few years ago. Did I know how it worked? I knew some streaming concepts from using ActiveMQ and RabbitMQ, but how Kafka actually worked? No, not really.

What I was being asked to do was make sure that the cluster didn’t die and cause issues for the customer. That’s different, that’s a support function.

The opportunity though was there to learn how it works, and that’s where I spent my time. I spent time coding up basic producers and consumers to see how it all worked. At that time the ecosystem wasn’t really there, it was basic. Producers wrote to the broker(s) and consumers read from them. That was pretty much it.

Does a developer need to know about producer/consumer throughput? No, not really. There’s a huge difference between needing to know and wanting to know. Most developers I know want to get the job done.

It’s All About The Organisation

Startups tend to be scrappy, they build stuff and download frameworks and just get on with things. So there’s a good chance that the knowledge is combined into a small pool of people.

When it comes to more “traditional” businesses and the employee size goes up to hundreds or thousands, then that’s where things become interesting. Silos begin to take hold (and this is no bad thing, it all depends on the organisation) and the knowledge share is limited. Many times there’s a dividing wall between a developer and the framework in question.

A developer may be asked to write an application to “write messages to Kafka” but after that there’s no real need to know how Kafka works. Take some sample code, alter it to your needs, and off it goes for testing, QA and deployment.

Does a developer need to know KSQL internals? No. Kafka Connect’s partition/thread balancing act? No, not really. In my eyes the developer doesn’t need to know the framework explicitly, merely the accepted requirements for the application.

If I were a betting man and I asked a developer if they knew about the ProducerInterceptor class, I’m wagering I’d get blank looks. I personally think it’s pretty important but that’s merely my opinion.

Who Needs to Know What. Really?

Kafka becomes a team sport. Developers write applications, so I’d expect them to know how the Client APIs work and how to subscribe and send messages via the REST Proxy (if it’s active).

A working knowledge of Schema Registry? Possibly not. Some clusters quite merrily run on raw strings and JSON payloads. If you want to take things seriously and use Avro, then you start getting into schemas.

Kafka Connect can be an art form in itself and I wouldn’t expect a developer to be involved unless they were writing a custom connector. Deployment of connectors is more than likely a DevOps-like function, and after that it’s a monitoring task. And lastly, does a developer need to know broker SSL and how brokers really work? To get an application working, not so much. If anything the application should just be pointing at certificates in its properties.
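
By way of a sketch, from the application side SSL usually boils down to a handful of client properties like these (the paths and password are obviously illustrative):

security.protocol=SSL
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=changeit

The keystores, certificates and broker listener configuration behind them belong to someone else.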

So, as a developer do I need to know everything about Kafka?

No.

 

Tesco Clubcard: #customerloyalty is dead! It’s all about membership! #clubcard #tesco #loyalty #data #datamining #algorithms


(First things first, four months since my last post…. apologies, just a lot of things going on.)

The “rona” has done a lot of things to retail, and it’s also done an awful lot of things to the economy. The UK furlough scheme reduced incomes and made customers very, very, very price sensitive. And today is the last day of the original furlough scheme as it stands; the next few months will sadly be even more worrying for some.

For Tesco there was also the pressure from the likes of Lidl and Aldi on certain produce; fresh fruit and vegetable prices dropped in a wave of trying to keep up with the new kids in town. Consumers obviously liked it and certainly needed it.

All the while during the Covid pandemic, customer loyalty was going through its own problems.

For me, having worked in customer loyalty data mining, recommendation systems and other loyalty card related work (yes, I can bore you to tears with this stuff; oddly enough it’s still one of the most requested topics I’m asked to talk about), the loyalty shift was quick and dramatic. But we first need to see where we were before the lockdowns began.

Four Christmases a Year

This was the mantra of DunnHumby/Tesco since the Clubcard was introduced in 1995. While the world was generally losing its knickers over algorithms, Tesco have been calmly doing it for 25 years, and we gave them permission to do it. Each quarter was when you got your Clubcard statement through with the vouchers that reflected your points accrual.

If you bought specific items that Tesco wanted you to buy, then you got more points. It was a trade-off: if you want money off and better deals then you have to toe the line. And oh boy we did. It was a behavioural science playbook and it was executed perfectly most of the time.

It was a task and a half to get these coupons to households, a full security operation too, with secured vehicles, secondary dummy vehicles and so on. Your loyalty was guarded better than some bad’uns who were locked up. Why? Because coupons have a monetary value; albeit very small, it’s still currency and it’s accountable.

The reason for the quarterly coupon drop? Well, that’s how long it took to mine the data back in the day. When you’re dealing with a firehose of basket data, it takes a bit of time to get through it all. Back in 1995 only 10% of the baskets were measured, which gave out some really wobbly recommendations and money-off coupons to customers….. so once the scale was there, they did everything they could.

And I LOVED that concept, I still do. I’ve been down a customer loyalty data rabbit hole ever since. For it is my happy place…..

Why do retailers invest in loyalty programmes? The key goal is to increase customer loyalty. All customers can be placed at some point in a 3-D cube, and a customer’s location in the cube suggests actions suitable to earn his/her lifelong loyalty. The three dimensions are Contribution (profitability today), Commitment (future value – the likelihood of remaining a customer, the ‘headroom’) and Championing (being an ambassador). – Scoring Points (2003), Humby et al.

While the cube is a simple concept, the practicalities of wading through the data efficiently and working out each customer’s position (which segments get which offers) are not. It takes time to figure out who should get what, and it costs money (computing power and so on), so the payoff for the retailer has to be worth it.

I Wish It Could Be Christmas Everyday

(Everything from here is basically opinion; if I’m wrong then please tell me.)

Once lockdown restrictions started to ease and we weren’t kettled through the supermarket one-way system (and the joy of going around in circles to get back to where the vinegar was because you walked past it), the Clubcard sprouted a new set of wings: discounts in store.


“Clubcard Prices”: one price for the “regular” customer and a discounted price for Clubcard holders. This shifts the whole concept of what the Clubcard is and what it actually stood for, which was loyalty. Now it stands for, in my eyes, membership. Just holding the card means that you are entitled to discounts on certain items when it’s passed through the checkout. Well, that’s how it could be perceived, but more on that in a moment.

Now it’s no secret that Tesco need to save money and have some serious competition from the likes of Aldi and Lidl. Store visits would have gone down, getting delivery slots during Covid was increasingly difficult depending on where you lived, and choice was drastically reduced. The perceived value of holding the card then starts to decrease. Add to that the cost Tesco incur processing Clubcard data: between data centres, software licensing and salaries, it’s an expensive operation to maintain.

Is this now meaningless to the customer?

So, discounting at source, no longer caring what you’re really buying, that’s the future of the Clubcard. Well, that’s partly true; there’s no emphasis on buying certain items or attempting to boost your points balance, the nudge unit has stepped back. As there’s little in the way of physical coupons (unless it’s a promotion with another brand), there’s little going on in the nudge department either. Remember the Clubcard was about changing behaviour, getting you to buy items you may not have considered.

Linking point of sale (POS) data to your basket and card number (and social/economic data by postcode) continues, as it’s a good earner for Tesco on the supplier side of the equation. Those reports are still gold dust for brands. For the customer though, the emphasis on eagerly waiting for coupons to arrive every three months, well, that’s gone now. The Clubcard serves as a means to get perceived discounts and, in my opinion, little else.

And no, I think it has nothing to do with GDPR. We agreed to a contract, points and behaviour for money off and offers. We ticked the boxes and have to be responsible for our actions and the consequences thereon.

So, What’s the Endgame?

Bums on seats…. so to speak. It’s about attention.

From where I’m sat, Tesco are in a mad race to get more customers to sign up to the Clubcard; the enticement is staring you in the face in-store: discounts. And some of them are quite compelling, especially if you like your spirits.


Do you want to save 35.9% on a bottle of whiskey (yeah I know, the e in whiskey… sue me 🙂 )? You’d be a fool not to be tempted by such a deal and sign up to the Clubcard to get it, especially with a Covid Christmas approaching at velocity.

Now, as far as I’m concerned, nothing is ever straightforward, working on the theory of “no such thing as a free lunch”. Where there’s a discount on one side there’s a price increase on the other. And when you have 45,000 line items you can easily add 1 or 2p per line item and preserve the bottom line.

There’s always a trade-off, and I think Tesco have done the right thing here. My only thought is that this may really signal the end of what the Clubcard really stood for, and that was loyalty.

 

Selling API Access – Three Steps to Customer Success – #api #programming #software #business #data

An Application Programming Interface (API) is a method for developers and website owners to connect their product to another product. For example, think about all the tools that use Twitter, Instagram and Facebook to get a customer’s data; developers will more than likely be using the respective API to connect and access the data.

I’ve used APIs in one way or another over my career, and I’ve created APIs that have enabled businesses to access my product data. The majority of the time this access comes for free, it’s a win-win for the developer and the source product owner. Sometimes, however, APIs are accessible for a fee.

Fees for APIs come in various forms but the common ones are:

  • Cost per call (e.g. $0.001/per API call)
  • Monthly fee ($25 per month for a number of calls; this may be tiered to higher prices depending on the volume of calls you make).
  • Credits (Various API calls have a certain amount of credits per call, credits are bought in advance on either a monthly or pay-as-you-go basis).

This last month I’ve been working on an idea; it required a specific API and it was going to cost me money to use it. Hunting around for data sources I found one that would fit the bill; the price was great and looked like it was discounted. So, hitting the PayPal subscribe button, I paid my dues and got to work……

I got the data I was looking for and had used less than 1% of my monthly quota. Come the end of the first period I wondered when the next payment was going out. I knew I’d get a PayPal email so I wasn’t in a huge hurry to find out. What I read took me by surprise: I’d been charged nearly twenty-five times the price I was expecting…. surely a mistake. I went back to the website, and then I saw the small print. The excellent first-month price was the trial rate, after which I’d have had to give a number of days’ notice to cancel the trial.

I’m not here to knock the company, I’m not even going to name them. Ultimately I’m responsible for my actions here. It did however get me thinking: if it were my company, what would I do when charging for API access? Three things come to mind and I’d like to share them with you.

Confirm The Terms With a Checkbox

If your service depends on PayPal subscriptions, or Stripe and other payment providers, then make sure your customer is crystal clear about what they are about to sign up to. Make the terms and conditions of the transaction upfront and clear: the trial cost and then the ongoing costs. Be very clear about trial rates, what will happen and the future charges involved. Don’t worry if this becomes another step in the process; your customer will thank you for it now rather than send angry, rant-like emails after the trial is over.

Send a Reminder

Give your customer time to cancel but don’t expect them to remember they signed up. Customer success revolves around good communication. An email ten days prior to the end of the trial period, flagging the increase in the fee, is only good manners; no one likes to get caught out. At this point you can re-connect with your customer, remind them they are signed up and that the trial period is coming to an end. Be clear that the price is going up and that if they wish to cancel then this is the time to do it, so they are not charged by surprise.

This ends up being a two-way thing: the customer might have questions, they might have feedback. The positives of a good reminder are huge here; there is so much opportunity to create better customer service, a better product and a better experience.

Measure the Usage

Most API access comes in tiered usage billed on a monthly basis. The first 1,000 calls might be free, the next 10,000 will cost an amount, and the cost increases for each usage band a customer uses. Sometimes a customer can sign up for access, go all in for the highest tier and hardly use it.

Once again, email wins here. Now there’s an opportunity to connect with the customer again: “We see you’ve only used 1% of your access allowance” prior to the renewal time or next subscription payment. This kind of calculation shouldn’t be a difficult one; with the amount of tools out there on the market now, even a spreadsheet will do. You can create a lot of value by either encouraging the user to use the API more, or saying that you will charge a lower tier amount as it looks like they will never get to that level.

The customer touch points here are a big deal; they mean the customer knows you care about being fair with their business and want to help them succeed, and in turn there’s a good probability they will recommend your service to other people.

Conclusions

Regardless of how your service works, creating multiple touch points for customer success will only do your company good in the long run. The customer will feel valued and there’s an increased chance of more trade. No one likes nasty surprises, whether it’s the customer’s fault or not.

Customer Loyalty – Starting a stamp card: Part 1. #customerloyalty #stampcard #coffee #tea #loyalty #retail #cafe #restaurants

Sometimes the simplest solutions work the best and the stamp card is the easiest way to establish a form of repeat business. The idea is simple, have a card with spaces for nine or ten transactions, where stamps or a pen signature can be applied. Once the card is complete the customer then receives a free item. The most common use is in cafes and coffee bars, the card rewards a free coffee (or other item) once the card is complete.

The stamp card offers a cost-effective way of getting repeat business. The cost to you is small: the cost price of the free item. Some planning must be done in order for this system to work properly. Firstly, what does the customer have to buy in order to get the card stamped? Is it a single item, like a coffee, or is it any purchase over £10? With this system a single item or related group of items, like a coffee or any hot drink for example, works best. If you overcomplicate the rules, then it becomes confusing for both staff and customer.

Designing the Card

Keep the card design simple. The best size to use is the same as a business card; these are easily carried by the customer in their purse or wallet. Make sure that one side of the card has your branding so it’s easily identifiable.

On the other side of the card is where the stamping will happen. As you can see in the photo of my stamp cards, they are basic in design. They might use a coffee cup as the thing to be stamped, or it could just be a simple square or circle space.

Use a business card printing service to get the cards made; this might be your local office printing supplier or an online business stationery printing service like Vistaprint or Moo.com.

The Stamp

You have a few options. You can use a pen (ballpoint or marker, for example) to mark each transaction on the card; the key thing is to remain consistent. The alternative is to use a rubber stamp for marking the cards. These can be acquired from office stationery outlets or online. If branding is important to you then you could get a custom-made rubber stamp made specifically for your stamp card.

What Can I Learn?

The basic data gained from the stamp card is customer spend. Beyond that there’s little else, but it does act as a good gauge. If you are selling a cup of coffee at £1.50, for example, and it costs 50 pence to produce, then you know a completed card is giving you a profit of £8.50.

One completed stamp card = (9 x £1.50) – (9 x £0.50) – £0.50* = £13.50 – £4.50 – £0.50 = £8.50

* The cost of the item you are rewarding the customer with also has to be factored in.

Retaining the completed cards for the month will give you an idea of the overall performance of the cards. It’s worth keeping a spreadsheet of the completed cards redeemed to see how they are performing.

Are There Risks?

Forgeries in stamp cards can be a problem. It’s easy for anyone to copy a ballpoint pen signature or initials. Rubber stamps are slightly better, but these can be easily bought from the supplier; it’s better to design your own so it’s unique to the brand. McDonald’s uses stickers for their coffee loyalty card.

The café chain Caffè Nero’s stamp card was the subject of many a website that showed you how to elaborately print the required stamps to complete the card. Most of these sites are gone now, and Caffè Nero moved to a digital solution while keeping the traditional stamp card.

In the next part I’ll talk about taking the humble stamp card digital.

 

Dealing With Imposter Syndrome on Panels – #impostersyndrome #beltech2020 #ai #machinelearning #conferences

Thirty-two years into this industry and this was possibly the first time that imposter syndrome didn’t hit me five minutes before the start of the panel.

If you are doing a talk and it’s you and you alone, then that’s okay. You’ve put the work in, got the slides sorted, rehearsed(!), and when you stand on the platform or stage then you are in control (most of the time). A panel though is different. Yes, you’ve been invited because of one of many factors: you were pushed by your employer, you know what you’re talking about, or you’re an idiot. While the invites I get are based on the second notion, I really do think it’s about the third.

The AI Explainability panel at Beltech 2020 was enjoyable. First of all I didn’t have to hike to Belfast to do it. Secondly while I’ve routinely heckled Andrew Bolster and Austin Tanney many-a-time it was great to share a panel with them.

Now then, back to imposter syndrome. Here’s why: on this panel there are two PhDs, a Professor, a Doctor, and a bloke with some ropey GCSE grades who eventually figured out how to string a sentence together. Normally I’d be worried as hell but this time I was fine…… I didn’t play the idiot this time, I chose not to, but nor did I go for the gobby know-it-all from the trenches either; I held back. I’m relaxed, I’m with good people, this should be okay. And it was. This was more like a poker hand: read each response and act accordingly. Do I lean in with a slightly aggressive response to the Target Baby Story? No, but I’ll happily give you a winning hand, knowing the story deeply.

If you know your subject, you can do a panel. Simple as that. Qualifications don’t matter. Nor did the fact I’d written a book on machine learning…..


Extending Topic Retention in Kafka – #kafka #apachekafka #confluent

There’s a part of my internal body clock that worries about Kafka messages, especially production Kafka messages, especially LOSING Kafka messages…. Even when I know that the retention policies work perfectly well and do as they are told, I still wake up and worry. If you maintain a Kafka cluster then you’ll understand. When it comes to messages you will do anything to make sure they don’t vanish.

So just to confirm my assumptions and reduce the usage of Nytol, let’s try it out.

Kafka Topic Retention

Message retention is based on time, message size or both. I don’t know the internals of other companies’ cluster configurations, but time is widely used. Log retention is based on either hours, minutes or milliseconds.

In terms of priority to be actioned by the cluster, milliseconds will win, always. You can set all three, but the config with the smallest time unit will be used.

retention.ms takes precedence over retention.minutes, which takes precedence over retention.hours.

Where possible I advise you to use retention.ms and have proper control.

A Prototype Example

Here’s what I’m going to do.
  • Create a topic with a retention time of 3 minutes.
  • Send a message to the topic with an obvious time in the payload.
  • Alter the topic configuration and add another 30 minutes of retention time.
  • Have a cup of tea.
  • Consume the message after the original three minute period and see if it’s still there.
  • Celebrate with another cup of tea.

Create a Topic

Nothing out of the ordinary here, I’m using a standalone Kafka instance so there’s only one partition and one replica. The interesting part is adding the config at the end. I’m setting the topic retention time to three minutes (3 x 60 x 1000 = 180000).
$ bin/kafka-topics --zookeeper localhost:2181 --create --topic rtest2 --partitions 1 --replication-factor 1 --config retention.ms=180000

Send a Message

Once again, standard tools win here. Just a plain text message being sent to the topic. I typed in the JSON, there’s nothing fancy here.

$ bin/kafka-console-producer --broker-list localhost:9092 --topic rtest2
>{"name":"This is Jase, this was sent at 16:15"}

The message is now in the topic log and will be deleted just after 16:18. But I’m now going to extend the retention period to preserve that message a little longer.

Alter the Topic Retention

With the kafka-configs command you can inspect any of the topic configs, and you can alter them too. So I’m going to alter retention.ms and set it to 30 minutes (30 x 60 x 1000 = 1,800,000).

$ bin/kafka-configs --alter --zookeeper localhost:2181 --add-config retention.ms=1800000 --entity-type topics --entity-name rtest2

Completed Updating config for entity: topic 'rtest2'.
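
Just to double-check the change took, the same tool can describe the topic’s config overrides:

$ bin/kafka-configs --describe --zookeeper localhost:2181 --entity-type topics --entity-name rtest2

That should report back the overrides for rtest2, including retention.ms=1800000.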

Have a Cup of Tea

If everything were to go horribly wrong then it’s going to be about now. So a tea is in order.

Check the Topic by Consuming Messages

Running the consumer from the earliest offset should bring back the original message.

$ bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages

Okay, that’s worked perfectly well (as expected). Let’s try it again, because I’m basically paranoid when it comes to these things. I’ll add the date this time for added confirmation.

$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri  3 Apr 16:24:18 BST 2020
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages

Looking good. And I’m going to do it again because I want to make sure…..

$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri  3 Apr 16:24:50 BST 2020
{"name":"This is Jase, this was sent at 16:15"}

Celebrate Again

The kettle is on. Time for another tea.

Three Things Retailers Can Do Right Now – #COVID-19 #retail #ecommerce #startups #fashion #restaurants #fastfood #bricksandmortar

Retailers are feeling the impact of COVID-19 as the landscape of how people move, interact and generally get on with day-to-day life changes. The ones I’ve been talking to have seen drastic falls in footfall and takings. As you can imagine, they are naturally worried.

For the first time ever I managed to enter and exit a McDonald’s Drive-thru without another vehicle near me.

So the next few months are going to be a bit all over the place for everyone; for retailers though, here are a few ideas that might help.

Make Stock Available Online

There are still a lot of bricks and mortar retailers who do not sell online. The assumption by shoppers is that most stores have some form of online store; the reality is usually different.

If you just want to set up a shop then the likes of Shopify, Squarespace and Wix will get you started quickly. If you want to align your point of sale system with an ecommerce site then Airpos do this: the stock in store is the same stock online.

If you are heavily into Instagram then consider switching to a professional account and using a Facebook store to sell on Instagram.

Sell Gift Vouchers

Once again, not everyone does this; if not, then now’s a good time to start. Even better, make them available online or via email. There is an army of shoppers who still want to support local businesses during this enforced downtime. They want to stay loyal to the brands they love, and it’s not always the big names; the small retailers matter just as much.

As vouchers have a shelf life, you can sell now and let customers redeem when it’s safe to do so.

Social Social Social

Every retailer has a sweet spot of an audience on social media. For some it’s Instagram, for others Facebook does them well. Twitter is still the best broadcast media in my opinion.

If you’re still open for business, then tell the world you’re still open for business; it’s just that the rules on how you interact have changed a bit. Technology brings retailer-to-customer interaction closer together. If you’ve not had the chance to practise and harness it, well, now’s your opportunity.

The piece about Glossier in the current issue of Wired UK Edition put it perfectly.

Also, if you need help then don’t be afraid to ask for it, once again Twitter’s the perfect place. Right now, we have to help each other.

Be safe but also proactive.

The National Trust and the Travelling Salesperson – #TSP #Data #NationalTrust @NationalTrustNI #R #RLang

I’ve not had to change much of my routine when it comes to self-isolation, I work from home anyway. Saying that, with all my speaking engagements cancelled (and rightly so), my brain needed something else to do…..

The Travelling Salesperson problem has fascinated me for years but I’ve never had the time to really sit down with it.

Welcome to TSP!

The Travelling Salesperson problem is this.

“Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?”

What we have here is a graph problem: each location is a node and the routes connecting the nodes are the edges.

There are a number of algorithms that have been devised over the years to calculate the optimum shortest route; it’s the sheer number of permutations in the problem that makes me interested. For now the thing to keep in mind is this….

(n-1)!/2

More on this in a moment; right now I need some locations to test all this out. If I pick a high number it’ll take forever to calculate a route. I’m going to use R, the Google Maps API and the TSP R library to see if I can plot a route through some nodes. I need a decent dataset; the one I have in mind is massive, so let’s take a subset instead…

The National Trust NI Locations.

The National Trust NI Sites

There’s a nice number of National Trust properties in Northern Ireland. I could have chosen pubs but that’s not what I really do, so as a card-carrying member I will do my bit to support the cause.

Here’s my list of locations, I’ve pulled the address data from a combination of the National Trust website and Google Maps:

ADDRESSES = c(
"Carrick-a-Rede, 119a Whitepark Road, Ballintoy, County Antrim, BT54 6LS",
"The Crown Bar, 46 Great Victoria Street, Belfast, County Antrim, BT2 7BA",
"Divis and Black Mountain, Divis Road, Hannahstown, near Belfast, County Antrim, BT17 0NG",
"Dunseverick Castle, Causeway Road, Bushmills",
"Fair Head, Fairhead Road, Ballyvoy, Ballycastle,BT54 6RD",
"Giant's Causeway, 44 Causeway Road, Bushmills, County Antrim, BT57 8SU",
"Patterson's Spade Mill, 751 Antrim Rd, Templepatrick, Newtownabbey, Ballyclare BT39 0AP",
"Ardress House, 64 Ardress Road, Annaghmore, Portadown, County Armagh, BT62 1SQ",
"Coney Island, Lough Neagh, Dungannon, BT71 6PA",
"Derrymore House, Bessbrook, Newry BT35 7EF",
"Castle Ward, Strangford, Downpatrick BT30 7BA",
"Mount Stewart, Portaferry Rd, Newtownards BT22 2AD",
"Murlough Nature Reserve, Keel Point, Dundrum, Newcastle BT33 0NQ",
"Rowallane Garden, Crossgar Rd, Saintfield, Ballynahinch BT24 7LH",
"Castle Coole, Castlecoole Rd, Enniskillen BT74 6JY",
"Crom Estate, Newtownbutler, Enniskillen BT92 8AJ",
"Springhill, 20 Springhill Rd, Moneymore, Magherafelt BT45 7NQ",
"Downhill Estate, Mussenden Rd, Castlerock, Coleraine BT51 4RP",
"Hezlett House, 107 Sea Rd, Castlerock, Coleraine BT51 4TW",
"Portstewart Strand, 118 Strand Rd, Portstewart BT55 7PG",
"Gray's Printing Press, 49 Main St, Strabane BT82 8AU",
"Wellbrook Beetling Mill, 20 Wellbrook Rd, Corkill Rd, Cookstown BT80 9RY"
)

The ! means factorial: the product of all the whole numbers up to that number, e.g. 1 x 2 x 3…..

3! is 6, for example: 1 x 2 x 3.

7! is 5,040: 1 x 2 x 3 x 4 x 5 x 6 x 7.

With 21 Trust properties in NI, it’s going to be a large number. The equation though has a few more bits to it: I’m already at my starting point so I can take that one away, and I’m only going in one direction so I divide the answer by 2.

How many permutations are there to start at each point and figure out every possible route?

(21-1)!/2 = 1,216,451,004,088,320,000

Or....
20 x 19 x 18 x 17 x 16 x 15 x 14 x 13 x 12 
    x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1
divided by 2 as we're only going one way.....
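
If you fancy checking the arithmetic, here’s a quick sketch in R. I’m assuming the gmp package for exact big-integer arithmetic, because 20! is already too big for a plain double to hold exactly:

library(gmp)            # exact big-integer arithmetic
factorialZ(20) %/% 2    # (21-1)!/2
# 1216451004088320000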

We could be here for a while with a quintillion routes to calculate. That’s the brute force method; using the other algorithms will make life much easier. Even more so, I’m not going to waste my time writing new code when someone has already done it.

Andrew Collier (@datawookie) already has the code done for us…. thanks Andrew. His marvellous code will figure out all the positions in the map, work out the optimum route and then generate a Google Map for me. Brilliant!
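
I won’t reproduce Andrew’s code here, but as a rough sketch, solving it with the TSP package looks something like this. I’m assuming the ADDRESSES have already been geocoded into a pairwise distance matrix called dists (via the Google Maps Distance Matrix API or similar):

library(TSP)
# build a TSP object from the pairwise distances
tsp <- TSP(as.dist(dists), labels = ADDRESSES)
# search for a short tour (a heuristic, not guaranteed optimal)
tour <- solve_TSP(tsp, method = "nearest_insertion")
labels(tour)       # the visiting order
tour_length(tour)  # total length of the tour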

In the time it took for Andrew’s TSP code to work out a route, I’d made a cup of tea. That’s good going and this is what was waiting for me when I got back…..

Lovely isn’t it?

So What’s the Route?

So the shortest route is 417.5 miles altogether, with a total driving time of just over 11 hours. The algorithm started me off at Dunseverick Castle and suggested this shortest route:

Dunseverick Castle > Giant’s Causeway > Portstewart Strand > Hezlett House > Downhill Estate > Gray’s Printing Press > Castle Coole > Crom Estate > Wellbrook Beetling Mill > Springhill > Coney Island > Ardress House > Derrymore House > Murlough Nature Reserve > Castle Ward > Rowallane Garden > Mount Stewart > Crown Bar > Divis and Black Mountain > Patterson’s Spade Mill > Fair Head > Carrick-a-Rede

Interesting that the shortest route also means you can put that fear of heights well to the back of your mind until you are on the final leg of the journey (I’ve still not gone over that bridge).

Now all I need to know is which properties have second-hand bookshops in them.

With the Northern Ireland locations figured out, and that quintillion number firmly lodged in my head as it’s a rather large number, it merely brings me on to the next question in my mind.

How Many Permutations to Visit ALL the National Trust Properties?

The things that keep my mind occupied. We know the brute force calculation for the NI properties is (21-1)!/2; there are 1500 National Trust properties in the UK. I’m really curious now how that looks…… how many permutations are there?

Back to that factorial again. (1500-1)!/2

160399926559325828672233003119379326061160268022420460271028531372101
917296670318637640768617355779506228213365397811950646840303748727220
900904939604644438105426542700999480497779296815631318368290559599231
084345664901559430781330877755152176202495350252487889761078688818307
591803733611460289091231259625586667371351061600814514024732016213923
2331686047840739961706485632803190198184986301818944059...

That’s only the first 400 digits of it, there are 4112 digits in the actual number. Even Wolfram Alpha won’t tell me the full answer in one go so I’ll have to pad it out…. If I fill in the remaining numbers with zeroes (no way I’m going to guess) this is the sort of number we’re dealing with.

Ready?

16039992655932582867223300311937932606116026802242046027102853137210191729667031
86376407686173557795062282133653978119506468403037487272209009049396046444381054
26542700999480497779296815631318368290559599231084345664901559430781330877755152
17620249535025248788976107868881830759180373361146028909123125962558666737135106
16008145140247320162139232331686047840739961706485632803190198184986301818944059
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000

Sorry, I’m not typing the commas in for this one….. you’ll just have to imagine. As an educated guess, it would take an estimated 2 CPU-years to figure out the optimum route using the Concorde TSP solver.

 

Machine Learning Hands On 2nd edition now available. #machinelearning #ai #java #kafka #clojure #weka #dl4j #spark #r

Towards the end of February the author copies of the second edition of my book, Machine Learning: Hands-On for Developers and Technical Professionals, landed on my desk. I’m very happy with the way it’s turned out.

I’ve spent the last few months resting, aside from the day job at Digitalis.io. The plan was to gradually ease myself back in at Strata Data Conference in London; sadly this is cancelled due to COVID-19 concerns, only compounded by FlyBe going into administration.

I am, however, doing some speaking over March and April: DevOps Belfast on 24th March, and I’ll be on the AI Panel at Beltech 2020 as well.