Introduction to Kafka @ ALT.NET
On Tuesday night I gave a presentation on Apache Kafka at the Sydney ALT.NET meetup.
What is Kafka?
Kafka is publish-subscribe messaging rethought as a distributed commit log. Originally developed at LinkedIn in 2011, it has since been adopted by big companies such as Twitter, Netflix and Microsoft to provide high-throughput, low-latency messaging.
The most common use cases for Kafka include:
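To make the "commit log" idea a little more concrete, here is a toy in-memory sketch. It is not how Kafka is implemented (there are no partitions, replication or persistence), and the `CommitLog` and `Consumer` names are made up for illustration. The point it tries to show is that records are only ever appended, and each consumer keeps its own offset, so several consumers can read the same log independently.

```python
class CommitLog:
    """Toy append-only log (hypothetical; stands in for a single partition)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Toy consumer that tracks its own position in the log."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Fetch the next batch and advance this consumer's offset."""
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch
```

Because reading never removes anything from the log, a slow consumer and a fast consumer can sit at different offsets without affecting each other, which is roughly what lets a log serve both publish-subscribe and replay scenarios.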
- Activity tracking
- Log aggregation
- Stream processing
- Event sourcing
- Commit logs
How do I use it?
There are a bunch of libraries available for a lot of languages. Unfortunately, the .NET ones are quite immature. Recently, a few of the boffins at Microsoft open-sourced CSharpClient-for-Kafka, the library they’ve used internally. This library targets version 0.8, but should soon support 0.9. It is fast and, with some work, will be quite reliable.
However, it does not support the new consumer protocol that was released with v0.9. If you want something that does, you’ll have to go for ah-/RDKafka-DotNet. This is a wrapper around librdkafka, the C library that’s the most commonly used client outside of Java-land. The downside is that, due to interop, it’s not quite as performant as the native .NET TCP communication of the MS library. In my tests, publishing with CSharpClient-for-Kafka took ~0.1 ms per message/batch and consuming took ~20 ms. RDKafka-DotNet took ~2 ms and ~150 ms respectively with similar parameters.
Also, if you think that TCP is too hardcore, you can give the Confluent REST Proxy or Kafka Pixy a go… These are services that let you publish to and consume from Kafka via HTTP APIs. Not the most performant option, but definitely easier to debug…
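As a rough idea of what publishing over HTTP looks like, here is a minimal sketch against the Confluent REST Proxy using only the Python standard library. The base URL, the topic name and the helper names are assumptions for illustration, and the content type shown is the v2 JSON format; older proxy versions use a `v1` content type instead, so check the docs for the version you run.

```python
import json
import urllib.request

# Assumed content type for the REST Proxy's JSON embedded format (v2 API).
JSON_V2 = "application/vnd.kafka.json.v2+json"


def build_produce_request(base_url, topic, values):
    """Build (but do not send) a produce request carrying JSON values."""
    body = json.dumps({"records": [{"value": v} for v in values]})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/topics/{topic}",
        data=body.encode("utf-8"),
        headers={"Content-Type": JSON_V2},
        method="POST",
    )


def produce(base_url, topic, values):
    """Send the request and return the proxy's decoded JSON response."""
    request = build_produce_request(base_url, topic, values)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a proxy running locally, something like `produce("http://localhost:8082", "page-views", [{"user": "alice"}])` would post a single record; the host, port and topic here are placeholders. Being plain HTTP, the same request is easy to replay with curl or inspect in a proxy, which is where the "easier to debug" part comes from.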
If you’re looking for samples of using these clients, you can check out a couple of my repositories:
- kafka-basic – a wrapper for CSharpClient-for-Kafka that provides simple producer/consumer abstractions
- rdkafka-tests – a test console for RDKafka-DotNet
Where do I learn more?
The Apache Kafka site has a lot of great information to get you up and running. Confluent is the commercial entity that provides supported distributions; its founders were originally responsible for the development of Kafka at LinkedIn.
My presentation from Tuesday night is available on SlideShare, with a few funky animations to help visualise message delivery and rebalances.