My first WordPress blog post was back in October 2008, but my first blog post goes back way before then, to 2000. Twenty-one years of knowledge, insight and, let’s be honest, nonsense.
While most folk know me for the big data thing, whether that’s Hadoop, Spark, Kafka or Pulsar, what I want to do now is focus on the fun stuff again: the side projects that make me tick. So it’s really about data and insight. As my day job is Kafka, Hadoop and Pulsar with the brilliant company Digitalis, I’ll keep those kinds of posts for them.
With lockdowns, dodgy information, increased anxiety and a complete lack of sleep, well, the last couple of months have only reminded me that data is sexy, and that the move towards a truly algorithmic business is worth looking into in my spare time.
So, if you want stuff on algorithms, data and artificial intelligence in retail, it’ll be here. Kafka and all that lark will be with my employer. Time for tea, I think…
One of the nice things about being an early adopter is watching things grow. Another is observing how that adoption happens from organisation to organisation. The questions I get tend to be varied and come in the oddest of locations; doing an impromptu Q&A at London City Airport is still a highlight of data-related daftness.
The last few weeks have been all about Kafka. Well, it is my job, and it’s also my sideline hobby. Giving talks is always fun for me. In that time I’ve presented at a couple of meetups and also done a full podcast interview with Tim Berglund for the Confluent Streaming Audio podcast. It finished off my presentations for 2020 perfectly.
Something that came out of all three events, though, was the question of how much knowledge someone needs about the ecosystem.
Developer, DevOps, Support or Something Else?
The question is: how do we come to Kafka in the first place? For me it was, “Can you look after this for me please?” at work a few years ago. Did I know how it worked? I knew some concepts of streaming from using ActiveMQ and RabbitMQ, but how Kafka actually worked? No, not really.
What I was being asked to do was make sure that the cluster didn’t die and cause issues for the customer. That’s different; that’s a support function.
The opportunity was there, though, to learn how it works, and that’s where I spent my time. I coded up basic producers and consumers to see how it all fitted together. At that time the ecosystem wasn’t really there; it was basic. Producers wrote to the broker(s) and consumers read from them. That was pretty much it.
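For anyone retracing those first steps, a basic producer and consumer need surprisingly few settings. The sketch below builds them with plain `java.util.Properties`, assuming string keys and values and a broker address you pass in; the class and method names are hypothetical. The keys are standard Kafka client configuration names, and in a real application the results would be handed to `KafkaProducer` and `KafkaConsumer` from the `kafka-clients` library, which is left out so the sketch compiles on its own.

```java
import java.util.Properties;

// Hypothetical helper: minimal settings for a first String producer and consumer.
// With kafka-clients on the classpath these would be passed to
//   new KafkaProducer<String, String>(props) and new KafkaConsumer<String, String>(props).
final class BasicClientConfig {

    private static final String STRING_SER = "org.apache.kafka.common.serialization.StringSerializer";
    private static final String STRING_DESER = "org.apache.kafka.common.serialization.StringDeserializer";

    static Properties producer(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", STRING_SER);
        props.setProperty("value.serializer", STRING_SER);
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        return props;
    }

    static Properties consumer(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Consumers sharing a group.id split the topic's partitions between them.
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer", STRING_DESER);
        props.setProperty("value.deserializer", STRING_DESER);
        // With no committed offset for this group yet, start from the oldest message.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }
}
```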
Does a developer need to know about producer/consumer throughput? No, not really. There’s a huge difference between needing to know and wanting to know. Most developers I know want to get the job done.
It’s All About The Organisation
Startups tend to be scrappy: they build stuff, download frameworks and just get on with things. So there’s a good chance that the knowledge is concentrated in a small pool of people.
When it comes to more “traditional” businesses, where headcount runs into the hundreds or thousands, that’s where things become interesting. Silos begin to take hold (and this is no bad thing; it all depends on the organisation) and knowledge sharing is limited. Often there’s a dividing wall between a developer and the framework in question.
A developer may be asked to write an application to “write messages to Kafka”, but after that there’s no real need to know how Kafka works. Take some sample code, adapt it to your needs and off it goes for testing, QA and deployment.
Does a developer need to know KSQL internals? No. Kafka Connect’s partition/thread balancing act? No, not really. In my eyes, the developer doesn’t explicitly need to know the framework; they just need to know the accepted requirements for the application.
If I were a betting man and asked a developer whether they knew about the ProducerInterceptor class, I’d wager I’d get blank looks. I personally think it’s pretty important, but that’s merely my opinion.
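For context, `ProducerInterceptor` (in `org.apache.kafka.clients.producer`) lets you hook into every record a producer sends: `onSend` is called before the key and value are serialized and a partition is assigned, and `onAcknowledgement` is called when the write is acknowledged or the send fails. Interceptors are switched on purely through configuration, via the `interceptor.classes` producer property, which takes a list of class names. The sketch below shows only that registration step with plain `java.util.Properties`; the helper name and the `com.example.*` classes are hypothetical stand-ins for your own `ProducerInterceptor` implementations.

```java
import java.util.Properties;

// Sketch: enabling producer interceptors through configuration alone.
// "interceptor.classes" is the standard producer setting; the class names passed in
// are hypothetical and would each need to implement ProducerInterceptor.
final class InterceptorConfig {

    static Properties withInterceptors(Properties base, String... interceptorClasses) {
        // Copy so the caller's base settings are left untouched.
        Properties props = new Properties();
        props.putAll(base);
        // Interceptors are chained in list order: each receives the record returned by the previous one.
        props.setProperty("interceptor.classes", String.join(",", interceptorClasses));
        return props;
    }
}
```

Because it is just a property, an interceptor (for auditing or timing, say) can be added to an existing application without touching its send logic.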
Who Needs to Know What. Really?
Kafka becomes a team sport. Developers write applications, so I’d expect them to know how the client APIs work and how to subscribe and send messages through the REST Proxy (if it’s in use).
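On the REST Proxy route, producing is an HTTP call rather than a client library call. This sketch assumes Confluent REST Proxy’s v2 API, where records are POSTed to `/topics/{topic}` as a JSON `records` array; the proxy URL (8082 is its usual default port), topic name and class name are assumptions. It only builds the request with Java 11’s `java.net.http`, so it can be inspected without a running proxy; sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) a produce request for Confluent REST Proxy's v2 API.
// Proxy URL and topic are assumptions supplied by the caller.
final class RestProxyProduce {

    // One record with a string key and a JSON value. The key is inserted verbatim,
    // so it must not contain quotes; a real application would use a JSON library.
    static String produceBody(String key, String jsonValue) {
        return "{\"records\":[{\"key\":\"" + key + "\",\"value\":" + jsonValue + "}]}";
    }

    static HttpRequest buildProduceRequest(String proxyUrl, String topic, String key, String jsonValue) {
        return HttpRequest.newBuilder(URI.create(proxyUrl + "/topics/" + topic))
                // Embedded format "json", API version v2.
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .header("Accept", "application/vnd.kafka.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(produceBody(key, jsonValue)))
                .build();
    }
}
```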
A working knowledge of Schema Registry? Possibly not. Some clusters quite merrily run on raw strings and JSON payloads. If you want to take things seriously and use Avro, then you start getting into schemas.
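From the application’s side, moving from raw strings or JSON to Avro mostly shows up as a different value serializer plus the address of Schema Registry. The sketch assumes Confluent’s `io.confluent.kafka.serializers.KafkaAvroSerializer` (shipped separately from `kafka-clients`) and a registry URL you pass in; the class and method names are hypothetical, and only the configuration is built here.

```java
import java.util.Properties;

// Sketch: the same producer configured for plain strings/JSON versus Avro with Schema Registry.
// Serializer class names and "schema.registry.url" are real settings; the helpers are hypothetical.
final class SerializerChoice {

    private static final String STRING_SER = "org.apache.kafka.common.serialization.StringSerializer";

    // Raw strings or JSON text: the broker only ever sees bytes, no schema is involved.
    static Properties plain(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", STRING_SER);
        props.setProperty("value.serializer", STRING_SER);
        return props;
    }

    // Avro: by default the serializer registers the value schema with Schema Registry
    // on first use and writes the schema ID into each message.
    static Properties avro(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = plain(bootstrapServers);
        props.setProperty("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.setProperty("schema.registry.url", schemaRegistryUrl);
        return props;
    }
}
```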
Kafka Connect can be an art form in itself, and I wouldn’t expect a developer to be involved unless they were writing a custom connector. Deploying connectors is more than likely a DevOps-style function, and after that it’s a monitoring task. And lastly, does a developer need to know about broker SSL and how brokers really work? To get an application working, not so much. If anything, the application just needs to point at certificates in its properties.
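On the client side, “pointing at certificates in properties” looks roughly like this. The configuration names are the standard Kafka client SSL settings; the paths and passwords are placeholders, the helper names are hypothetical, and the keystore entries are only needed if the brokers require client authentication (mutual TLS).

```java
import java.util.Properties;

// Sketch: client-side SSL settings. Paths and passwords are placeholders supplied by the caller.
final class SslClientConfig {

    // Encrypt traffic and trust the brokers' certificates via a truststore.
    static Properties ssl(String bootstrapServers, String truststorePath, String truststorePassword) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.truststore.location", truststorePath);
        props.setProperty("ssl.truststore.password", truststorePassword);
        return props;
    }

    // Only when brokers demand client certificates: add the client's own keystore.
    static Properties mutualTls(Properties sslProps, String keystorePath, String keystorePassword, String keyPassword) {
        Properties props = new Properties();
        props.putAll(sslProps);
        props.setProperty("ssl.keystore.location", keystorePath);
        props.setProperty("ssl.keystore.password", keystorePassword);
        props.setProperty("ssl.key.password", keyPassword);
        return props;
    }
}
```

Everything the developer touches here is a file path or a secret; how the brokers’ listeners and certificates are set up stays with whoever runs the cluster.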
So, as a developer do I need to know everything about Kafka?