There’s a part of my internal body clock that worries about Kafka messages, especially production Kafka messages, especially LOSING Kafka messages. Even when I know that the retention policies work perfectly well and do as they are told, I still wake up and worry. If you maintain a Kafka cluster then you’ll understand. When it comes to messages, you will do anything to make sure they don’t vanish.

So just to confirm my assumptions and reduce the usage of Nytol, let’s try it out.

Kafka Topic Retention

Message retention is based on time, the size of the log, or both of those things. I don’t know the internals of other companies’ cluster configurations, but time is widely used. Time-based retention can be set in hours, minutes or milliseconds.

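In terms of which setting the cluster acts on, milliseconds will win, always. You can set all three, but the smallest unit that has been set will be used.

At the broker level, log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours. At the topic level the only time-based setting is retention.ms, and it overrides the broker default for that topic.

Where possible I advise you to use retention.ms and have proper control.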
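
The precedence rule can be sketched as a small shell function. This is an illustration only, not how the broker implements it: effective_retention_ms is a made-up name, and it takes the three broker-level values (log.retention.ms, log.retention.minutes, log.retention.hours) as arguments, with an empty string meaning "not set".

```shell
# Illustration only: pick the effective retention in milliseconds,
# preferring the finest unit that has been set.
# Pass an empty string for any setting that is not configured.
effective_retention_ms() {
  ms=$1
  minutes=$2
  hours=$3
  if [ -n "$ms" ]; then
    echo "$ms"
  elif [ -n "$minutes" ]; then
    echo $(( minutes * 60 * 1000 ))
  else
    # Kafka's default for log.retention.hours is 168 (seven days).
    echo $(( ${hours:-168} * 60 * 60 * 1000 ))
  fi
}

effective_retention_ms 180000 10 168   # prints 180000 (milliseconds win)
effective_retention_ms "" 10 168       # prints 600000 (minutes beat hours)
effective_retention_ms "" "" 168       # prints 604800000
```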

A Prototype Example

Here’s what I’m going to do.
  • Create a topic with a retention time of 3 minutes.
  • Send a message to the topic with an obvious time in the payload.
  • Alter the topic configuration and raise the retention time to 30 minutes.
  • Have a cup of tea.
  • Consume the message after the original three minute period and see if it’s still there.
  • Celebrate with another cup of tea.

Create a Topic

Nothing out of the ordinary here. I’m using a standalone Kafka instance, so there’s only one partition and one replica. The interesting part is adding the config at the end. I’m setting the topic retention time to three minutes (3 x 60 x 1000 = 180000).

$ bin/kafka-topics --zookeeper localhost:2181 --create --topic rtest2 --partitions 1 --replication-factor 1 --config retention.ms=180000
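
If you’d rather not do the multiplication in your head, a tiny helper can do it. A minimal sketch: minutes_to_ms is a hypothetical function name of my own, not part of the Kafka tooling.

```shell
# Hypothetical helper: convert whole minutes to milliseconds for retention.ms.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

minutes_to_ms 3    # prints 180000
minutes_to_ms 30   # prints 1800000
```

With that defined, the config could be passed as --config retention.ms=$(minutes_to_ms 3).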

Send a Message

Once again, standard tools win here. It’s just a plain text message being sent to the topic. I typed in the JSON by hand, so there’s nothing fancy here.

$ bin/kafka-console-producer --broker-list localhost:9092 --topic rtest2
>{"name":"This is Jase, this was sent at 16:15"}

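The message is now in the topic log and becomes eligible for deletion just after 16:18. The broker removes expired log segments on a periodic check, so the actual delete can happen a little later than that. But I’m now going to extend the retention period to preserve that message a little longer.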
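
The clock arithmetic is simple enough to sketch in shell. This assumes an HH:MM send time and a retention that is a whole number of minutes; expiry_hhmm is a hypothetical helper name, and it only gives the earliest time the message becomes eligible for deletion, not when the broker actually removes it.

```shell
# Hypothetical helper: add a retention period (in ms) to an HH:MM send time.
expiry_hhmm() {
  sent=$1
  retention_ms=$2
  h=${sent%%:*}
  m=${sent##*:}
  # Strip one leading zero so values like 08 are not read as invalid octal.
  h=${h#0}
  m=${m#0}
  total=$(( ${h:-0} * 60 + ${m:-0} + retention_ms / 60000 ))
  # Wrap around midnight and print as HH:MM.
  printf '%02d:%02d\n' $(( (total / 60) % 24 )) $(( total % 60 ))
}

expiry_hhmm 16:15 180000    # prints 16:18 (the original three minutes)
expiry_hhmm 16:15 1800000   # prints 16:45 (with a 30 minute retention)
```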

Alter the Topic Retention

With the kafka-configs command you can inspect any of the topic configs, and you can alter them too. So I’m going to alter retention.ms and set it to 30 minutes (30 x 60 x 1000 = 1800000).

$ bin/kafka-configs --alter --zookeeper localhost:2181 --add-config retention.ms=1800000 --entity-type topics --entity-name rtest2

Completed Updating config for entity: topic 'rtest2'.
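
To double check that the new value has been stored, the same tool can describe the topic. A minimal sketch, assuming the same kafka-configs script and --zookeeper option used above (newer Kafka releases take --bootstrap-server instead). The function name describe_topic_config and the KAFKA_CONFIGS and ZOOKEEPER variables are my own inventions for illustration, not Kafka settings.

```shell
# Hypothetical wrapper around the kafka-configs describe call.
# KAFKA_CONFIGS (path to the script) and ZOOKEEPER (host:port) are assumed
# environment variables with defaults matching the commands in this post.
describe_topic_config() {
  "${KAFKA_CONFIGS:-bin/kafka-configs}" \
    --zookeeper "${ZOOKEEPER:-localhost:2181}" \
    --describe --entity-type topics --entity-name "$1"
}

# Usage, against a running instance:
#   describe_topic_config rtest2
```

The output should list retention.ms for the topic with the new value.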

Have a Cup of Tea

If everything were to go horribly wrong then it’s going to be about now. So a tea is in order.

Check the Topic by Consuming Messages

Running the consumer from the earliest offset should bring back the original message.

$ bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages

Okay, that’s worked perfectly well (as expected). Let’s try it again, because I’m basically paranoid when it comes to these things. I’ll add the date this time for added confirmation.

$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri  3 Apr 16:24:18 BST 2020
{"name":"This is Jase, this was sent at 16:15"}
Processed a total of 1 messages

Looking good. And I’m going to do it again because I want to make sure...

$ date ; bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic rtest2 --from-beginning
Fri  3 Apr 16:24:50 BST 2020
{"name":"This is Jase, this was sent at 16:15"}

Celebrate Again

The kettle is on. Time for another tea.