This blog compares JMS 1.1-based messaging brokers with Apache Kafka with respect to typical messaging use cases.
Traditional messaging – Quick look
If you have used messaging systems in the past, specifically brokers that comply with the JMS 1.1 specification (ActiveMQ, for example) or that offer comparable queue and pub-sub semantics (RabbitMQ, which is natively an AMQP broker), you will recall that they have two broad mechanisms of communicating through messaging – Queues and Topics. Let us run through a quick refresher on what each of these mechanisms offers.
- Point-to-Point messaging – Queues provide point-to-point messaging, so a message sent by a producer/sender is typically meant for a specific consumer/receiver.
  - Only once consumption – Even if there are multiple consumers (instances of the same consumer running in a cluster, or consumers of different types), only one of them can read a given message. Once a message is read, it is taken off the queue, so multiple consumers/receivers cannot consume the same message.
  - Persistence – If there are no consumers to read messages from a queue, the messages remain in the queue until some consumer reads them. However, once a message has been read by one consumer, it is no longer available to any other consumer.
- Publish-Subscribe – Topics provide pub-sub (or broadcast) messaging.
  - Multiple Consumers – Consumers subscribe to a topic and producers publish to it. There can be multiple consumers subscribed to the same topic, in which case every consumer gets to read every message published to the topic.
  - No persistence – Only consumers that are subscribed when a message is published get to read it. A consumer that subscribes later does not get the old messages; it can read only what is published from then on. (A durable subscription keeps messages for a subscriber that is temporarily disconnected, but only from the moment that subscription was created.)
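To make the two delivery models above concrete, here is a minimal in-memory sketch. It is not JMS code and does not talk to any broker; the `Queue` and `Topic` classes and the message names are hypothetical and exist only to illustrate the semantics (non-durable subscriptions are assumed for the topic).

```python
from collections import deque


class Queue:
    """Point-to-point: each message waits until exactly one consumer takes it."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Taking a message removes it, so no other consumer can read it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Non-durable pub-sub: only current subscribers get each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        inbox = []
        self._subscribers.append(inbox)
        return inbox

    def publish(self, message):
        # Delivered to whoever is subscribed right now; nothing is stored.
        for inbox in self._subscribers:
            inbox.append(message)


queue = Queue()
queue.send("order-1")        # no consumer yet, so the message waits
first = queue.receive()      # one consumer reads it: "order-1"
second = queue.receive()     # another consumer finds nothing: None

topic = Topic()
early_a = topic.subscribe()
early_b = topic.subscribe()
topic.publish("price-1")
late = topic.subscribe()     # subscribes after "price-1" was published
topic.publish("price-2")
# early_a and early_b hold both messages; late holds only "price-2".
```

The queue keeps a message until someone reads it but hands it to only one reader, while the topic reaches every reader but keeps nothing for latecomers. The limitations discussed next follow from that split.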
Limitations or Constraints
Traditional messaging brokers that support these mechanisms have been good at what they set out to do, and have served us well all this while. However, as you may have observed, each of the mechanisms described above has its own distinct characteristics, and the behavior of the two cannot be combined. This sometimes surfaces as a limitation as use cases keep changing.
For example, Queues give us guaranteed at-least-once processing, because a message stays in the queue until some consumer reads it. However, they do not let multiple consumers read the same message. Suppose our application needs to send messages to several other systems, with one or more of those systems processing a given message based on its type while the others ignore it, and with guaranteed at-least-once processing by each interested system even if it is down when the message is sent. We would not be able to achieve this using traditional messaging systems. One may argue that it can be achieved through message selectors, but it would still be somewhat limited.
As for pub-sub messaging, if we would like multiple consumer systems to consume our messages (broadcast semantics) irrespective of when the messages were published, traditional messaging systems do not fully support this, because they deliver messages only to currently subscribed consumers. Any consumer that subscribes in the future cannot read the messages sent before its subscription.
Similarly, one cannot scale consumers in the pub-sub model. Each subscriber is a unique subscriber, so if we run multiple instances of a particular consumer, all of them receive every message, which is not what we want when the goal is to scale that consumer.
Furthermore, the important case of reprocessing an already consumed set of messages is simply not supported by either of the above messaging mechanisms.
What is Apache Kafka, and how does it help
Imagine all of these use cases, and many more, being served by a highly resilient, highly available, fault-tolerant distributed messaging system. That is what describes Kafka, apart from the very high-throughput, low-latency messaging that Kafka is widely known for.
So, let's try to understand Kafka from this usage standpoint.
To put it in simple terms, Apache Kafka is a distributed pub-sub messaging system, based on the central idea of having persistent Topics. Topics have producers and consumers. There can be multiple producers and multiple consumers for every topic. All of this sounds very similar to the pub-sub model within traditional messaging systems, which is kind of true. Or is it? Well, here come the differences.
1. Grouped Consumers – Consumers can be clubbed into groups. Each message gets delivered to only one consumer within a group. You could have multiple groups (of consumers) consume from the same topic, and each group gets every message.
2. Message Persistence – Messages are persistent and the retention is configurable. Kafka can be configured to retain messages based on the total space they occupy, or on time, or using another, more advanced strategy called log compaction.
3. Read versus Write time insensitivity – Consumers can read from Topics at any point in time. As long as the messages have not yet been deleted (based on the persistence/retention policy), even a consumer started just now (think of it as subscribing just now) is able to read the messages published before it started.
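The second and third points can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions, not how Kafka is implemented: the `TopicLog` class and its parameters are hypothetical, size is approximated by character count, and records are dropped one at a time, whereas Kafka removes whole log segments. In Kafka itself, the corresponding topic-level settings are `retention.ms`, `retention.bytes` and `cleanup.policy`.

```python
class TopicLog:
    """Append-only log; records stay until a retention rule removes them."""

    def __init__(self, retention_seconds=None, retention_bytes=None):
        self.retention_seconds = retention_seconds
        self.retention_bytes = retention_bytes
        self._records = []       # (offset, timestamp, value) tuples
        self._next_offset = 0

    def append(self, value, now):
        self._records.append((self._next_offset, now, value))
        self._next_offset += 1

    def _size(self):
        # Character count stands in for bytes in this sketch.
        return sum(len(value) for _, _, value in self._records)

    def enforce_retention(self, now):
        if self.retention_seconds is not None:
            cutoff = now - self.retention_seconds
            self._records = [r for r in self._records if r[1] >= cutoff]
        if self.retention_bytes is not None:
            while self._records and self._size() > self.retention_bytes:
                self._records.pop(0)   # drop the oldest record first

    def read_from(self, offset):
        return [(o, v) for o, _, v in self._records if o >= offset]


# Time-based retention: keep records for 60 (simulated) seconds.
log = TopicLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)
log.append("c", now=50)

# A consumer that starts only now still sees everything that is retained.
late_reader_view = log.read_from(0)   # offsets 0, 1 and 2

log.enforce_retention(now=70)         # cutoff is t=10, so "a" is dropped
after_cleanup = log.read_from(0)      # offsets 1 and 2 remain

# Size-based retention: keep at most 2 characters in total.
small = TopicLog(retention_bytes=2)
for value in ("x", "y", "z"):
    small.append(value, now=0)
small.enforce_retention(now=0)
size_limited = small.read_from(0)     # the oldest record "x" is gone
```

Reading does not remove anything from the log. Only the retention rule does, which is why the time of reading and the time of writing are decoupled.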
The grouped consumers capability, combined with inherent persistence/retention, gives us the best of both worlds by combining the concepts of Queues and Topics from traditional messaging systems. One or more consumers belonging to the same group can read from a topic, in effect creating traditional Queue-like behavior. Having multiple such groups lets us achieve the multiple-consumers-on-a-Queue scenario, with the additional benefit that it does not matter whether a consuming application (i.e. a group of consumers) was available or subscribed when the messages were published, or only subscribes later.
Likewise, the consumer group feature also lets us use Kafka in a traditional pub-sub model, while supporting scaling/clustering of specific consumers as well.
Again, none of the above requires any specific configuration at the topic or the publisher level. The behavior is customized on the consumer side, by clubbing specific consumers into specific groups, thus detaching message publishing and message retention from message consumption.
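A rough sketch of the group behavior described above, again as an in-memory model rather than real client code. The `Topic` and `ConsumerGroup` classes and the group and member names are hypothetical, and the round-robin hand-out is a simplification: Kafka spreads work within a group by assigning topic partitions to the group's members. In a real consumer, group membership is just the `group.id` setting.

```python
class Topic:
    """A persistent topic: publishing only appends, it knows no consumers."""

    def __init__(self):
        self.log = []

    def publish(self, message):
        self.log.append(message)


class ConsumerGroup:
    """Each group tracks its own position; members share the work."""

    def __init__(self, topic, members):
        self.topic = topic
        self.members = list(members)
        self.offset = 0
        self.received = {member: [] for member in self.members}

    def poll(self):
        # Hand every not-yet-consumed message to exactly one member.
        while self.offset < len(self.topic.log):
            member = self.members[self.offset % len(self.members)]
            self.received[member].append(self.topic.log[self.offset])
            self.offset += 1


topic = Topic()
for i in range(4):
    topic.publish(f"m{i}")

# Two applications read the same topic; only "billing" is scaled out.
billing = ConsumerGroup(topic, ["billing-1", "billing-2"])
audit = ConsumerGroup(topic, ["audit-1"])
billing.poll()
audit.poll()

# billing-1 got m0 and m2, billing-2 got m1 and m3 (queue-like sharing);
# audit-1 got all four messages (pub-sub across groups).
```

Note that `Topic.publish` is identical whether one group or ten are reading. Queue-like or broadcast behavior is decided purely by how consumers are grouped.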
Similarly, configurable message persistence and read-versus-write time insensitivity let us read the messages from the same Kafka topic repeatedly, as long as they have not been deleted based on the retention settings. This enables a lot of crucial reprocessing use cases.
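Reprocessing boils down to moving a consumer's read position backwards. The sketch below models that with a plain list and a hypothetical `Consumer` class; it is an assumption-laden illustration, not client code. Real Kafka clients expose the same idea through seek-style calls (for example `seek` and `seekToBeginning` on the Java consumer).

```python
class Consumer:
    """Reads a retained log from a position it controls itself."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        # Return everything from the current position and advance past it.
        batch = self.log[self.position:]
        self.position = len(self.log)
        return batch

    def seek(self, offset):
        # Rewinding is only a position change; the log is untouched.
        self.position = offset


log = ["debit 10", "credit 5", "debit 7"]   # messages still under retention
consumer = Consumer(log)

first_pass = consumer.poll()     # all three messages
nothing_new = consumer.poll()    # empty: already caught up
consumer.seek(0)                 # e.g. after fixing a bug in the processing
second_pass = consumer.poll()    # the same three messages again
```

Because consuming never deletes anything, a replay like this does not affect other consumers or groups reading the same topic.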
I hope I have generated some interest in Apache Kafka as a messaging system. I realize I have talked about a lot of features without diving into details, but believe me, I have just scratched the surface. There is much more to talk about: the details of the features described above, some more features, and the architecture. As you may have guessed by now, a single blog would be insufficient to cover all of this. My intent here was to start with a blog that compares Kafka with traditional messaging systems with respect to the use cases we come across while working with messaging systems, and hopefully to follow it up with a series of blogs that dig deeper into the details of Apache Kafka.
Looking forward to reading your comments, feedback, thoughts, suggestions if any!
Happy learning! Hope to see you again with another one in this series.