The feedback from Saturday’s blog post Kafka Is A Team Sport was very positive. I still felt I hadn’t quite got my point across though, well not clearly enough. Then it hit me.
The Three Functions
I see the current ecosystem as three distinct parts. Development, DevOps Engineering and Data Engineering. They all have different functions and, in my eyes, operate differently.
Excuse the Apple Pencil, they don’t allow sharp objects in here….. and it was early and I’ve only had one cup of tea.
The application development area. The Client APIs whether that’s Java, Go, Python, Clojure, PHP, C++ or anything else I can think of. The main thing is that there is a development task that needs writing. It might be a producer, a consumer, a streaming job or even a KSQL query.
I honestly don’t need to care how Kafka works at this point, I just want to send a message or read a message. And if I can’t program in any of those languages, I have the REST Proxy to help me too. HTTP still has its place in the world, just be careful with the security.
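To show how little Kafka internals matter at this level, here’s a sketch of what producing over HTTP looks like. The host, port and topic are made-up placeholders, and this uses the Confluent REST Proxy v2 embedded-JSON format:

```python
import json

# Hypothetical REST Proxy endpoint -- adjust host/port and topic for your setup.
REST_PROXY = "http://rest-proxy.example.com:8082"
TOPIC = "orders"

def build_produce_request(records):
    """Build the URL, headers and JSON body for a REST Proxy v2 produce call."""
    url = f"{REST_PROXY}/topics/{TOPIC}"
    headers = {
        # v2 uses an embedded-format content type, not plain application/json
        "Content-Type": "application/vnd.kafka.json.v2+json",
    }
    body = json.dumps({"records": [{"value": r} for r in records]})
    return url, headers, body

url, headers, body = build_produce_request([{"order_id": 1, "amount": 9.99}])
# POST that url with those headers and body using any HTTP client you like
```

No client library, no broker knowledge, just an HTTP POST.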
Jobs that require setting up for things like Kafka Connect and Replicator/Mirror Maker are more tooling jobs. When I say tooling I really mean “sorting out configuration” and sending it over REST (especially with Connect). Do I need to know how to program? No, not really.
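“Sorting out configuration and sending it over REST” really is most of the job with Connect. As a sketch, here’s the shape of a connector config for the FileStreamSource example connector that ships with Kafka; the worker URL and file paths are placeholders I’ve made up:

```python
import json

# Hypothetical Connect worker -- placeholder, not a real endpoint.
CONNECT_URL = "http://connect.example.com:8083"

connector = {
    "name": "file-source-demo",
    "config": {
        # FileStreamSource is the simple example connector bundled with Kafka
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",
        "topic": "file-lines",
    },
}

# The "tooling" part: POST this JSON to {CONNECT_URL}/connectors
payload = json.dumps(connector)
```

Configuration in, REST call out. No programming required, as the post says.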
Brokers, latency, capacity planning and (oh no!) Zookeeper are admin functions, or DevOps if you will. I’d never ask a software developer to get into the minute detail of the server.properties file or look at volume throughput. In the same way I’d never ask them to tune a PostgreSQL database (in fact I wouldn’t touch that either, I’ve no idea).
One of the nice things about being an early adopter is watching things grow. Another is observing how that adoption happens from organisation to organisation. The questions I get tend to be varied and in the oddest of locations, doing an impromptu Q&A at London City Airport is still a highlight of data related daftness.
The last few weeks have been all about Kafka, well it is my job, and it’s also my sideline hobby. Giving talks is always fun for me. Over the last couple of weeks I’ve presented at a couple of meetups and also done a full podcast interview with Tim Berglund for the Confluent Streaming Audio podcast. It finished off my presentations for 2020 perfectly.
Something that came out of all three events though was the amount of knowledge someone needs about the ecosystem.
Developer, Devops, Support or Something Else?
The question is how do we come to Kafka in the first place? For me it was, “Can you look after this for me please?” at work a few years ago. Did I know how it worked? I knew some streaming concepts from using ActiveMQ and RabbitMQ, but how Kafka actually worked? No, not really.
What I was being asked to do was make sure that the cluster didn’t die and cause issues for the customer. That’s different, that’s a support function.
The opportunity, though, was there to learn how it works and that’s where I spent my time. So I spent time coding up basic producers and consumers to see how it all worked. At that time the ecosystem wasn’t really there, it was basic. Producers wrote to the broker(s) and consumers read from them. That was pretty much it.
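That early mental model fits in a few lines: a topic is an append-only log, producers append, and each consumer reads forward from its own offset. A toy illustration of the idea (nothing to do with the real client APIs):

```python
class ToyTopic:
    """A toy append-only log: the core of the producer/consumer model."""

    def __init__(self):
        self.log = []

    def produce(self, message):
        self.log.append(message)          # producers only ever append

    def consume(self, offset):
        # consumers read forward from their own offset; the log is never mutated
        messages = self.log[offset:]
        return messages, offset + len(messages)

topic = ToyTopic()
topic.produce("first")
topic.produce("second")

msgs, next_offset = topic.consume(0)      # a consumer starting from the beginning
# msgs == ["first", "second"], next_offset == 2
late, _ = topic.consume(1)                # another consumer further along
# late == ["second"]
```

Everything else, partitions, replication, consumer groups, is layered on top of that basic shape.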
Does a developer need to know about producer/consumer throughput? No, not really. There’s a huge difference between needing to know and wanting to know. Most developers I know want to get the job done.
It’s All About The Organisation
Startups tend to be scrappy, they build stuff and download frameworks and just get on with things. So there’s a good chance that the knowledge is combined into a small pool of people.
When it comes to more “traditional” businesses and the employee size goes up to hundreds or thousands, then that’s where things become interesting. Silos begin to take hold (and this is no bad thing, it all depends on the organisation) and the knowledge share is limited. Many times there’s a dividing wall between a developer and the framework in question.
A developer may be asked to write an application to “write messages to Kafka” but after that there’s no real need to know how Kafka works. Take some sample, alter it to your means and off it goes for testing, QA and deployment.
Does a developer need to know KSQL internals? No. Kafka Connect’s partition/thread balancing act? No, not really. In my eyes the developer doesn’t explicitly need to know the framework, just the accepted requirements for the application.
If I were a betting man and I asked a developer if they knew about the ProducerInterceptor class, I’m wagering I’d get blank looks. I personally think it’s pretty important but that’s merely my opinion.
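For anyone now getting the blank look: ProducerInterceptor is part of the Java client and gives you two hooks, one before a record is serialised and one when the broker acknowledges it. Here’s a rough Python paraphrase of the idea (the real API is Java; the names and header below are made up for illustration):

```python
class HeaderStampingInterceptor:
    """Python sketch of the ProducerInterceptor idea from the Java client:
    on_send runs before a record goes out, on_acknowledgement after the
    broker replies. Records here are plain dicts, purely for illustration."""

    def __init__(self):
        self.acked = 0

    def on_send(self, record):
        # e.g. stamp every outgoing record with a tracing header
        record.setdefault("headers", {})["traced-by"] = "interceptor-demo"
        return record

    def on_acknowledgement(self, metadata, error):
        # count successful acks; a real interceptor might emit metrics here
        if error is None:
            self.acked += 1

interceptor = HeaderStampingInterceptor()
record = interceptor.on_send({"topic": "orders", "value": "hello"})
interceptor.on_acknowledgement({"partition": 0, "offset": 42}, None)
```

Useful for tracing and metrics without touching application code, which is why I rate it, even if most developers never meet it.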
Who Needs to Know What, Really?
Kafka becomes a team sport. Developers write applications, so I’d expect them to know how the Client APIs work and how to subscribe and send messages via the REST Proxy (if it’s active).
A working knowledge of Schema Registry? Possibly not. Some clusters quite merrily run on raw strings and JSON payloads. If you want to take things seriously and use Avro then you start getting into schemas.
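The jump from raw JSON to Avro mostly means writing the schema down. A minimal example record schema (the names and fields are made up; with Schema Registry this would typically be registered under a subject like `orders-value`):

```python
import json

# A made-up Avro record schema for illustration. Avro schemas are themselves
# JSON documents, which is why a plain dict is all we need to build one.
order_schema = json.dumps({
    "type": "record",
    "name": "Order",
    "namespace": "com.example",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "amount",   "type": "double"},
        # a union with "null" makes the field optional
        {"name": "note",     "type": ["null", "string"], "default": None},
    ],
})
```

Once a schema like this is registered, producers and consumers agree on the shape of every message, which is the whole point of taking things seriously.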
Kafka Connect can be an art form in itself and I wouldn’t expect a developer to be involved unless they were writing a custom connector. Deployment of connectors is more than likely a DevOps-like function, and after that it’s a monitoring task. And lastly, does a developer need to know broker SSL and how brokers really work? To get an application working, not so much. If anything it should just be pointing at certificates in properties.
So, as a developer do I need to know everything about Kafka?