Subscriptions: Multiple Groups of Consumers on a Pulsar Topic
I’ve been asked a few times about whether Apache Pulsar supports multiple groups of consumers (or consumer groups) on a topic. In other words, if multiple applications can consume messages from a single topic. The answer is definitely yes. In fact, Pulsar is very good at this.
In this post, I will show you how that works. I will be using our managed Apache Pulsar service to demonstrate (which you can try for free). But you can do exactly the same things if you are running your own Apache Pulsar cluster or standalone instance.
The first thing to understand is that Pulsar manages consumers on topics using subscriptions. A subscription keeps track of the messages in the topic that are to be delivered to one or more consumers. When a consumer connects to a topic in Pulsar, it specifies the subscription it is using. The subscription keeps track of the position in the topic from which the consumer is reading. As the consumer reads and acknowledges a message, its read position in the topic is automatically updated.
Multiple Subscriptions on a Topic
A Pulsar topic can support many subscriptions at the same time. This means that multiple consumers (or groups of consumers) can be reading messages from the topic simultaneously. So, you can have multiple applications consuming from a topic, each getting their own copy of the message and each having a different read position.
For example, if a publisher publishes a message to a topic that has three subscriptions, one for each consuming application, then that message will be sent to each subscription. For every message published to that topic, three copies of that message are consumed. This is how Pulsar provides message fan-out to multiple applications.
Multiple Consumers in Subscription
To fan-out messages to multiple applications, you just need to add subscriptions to your Pulsar topic. Within each subscription, there can be one or more consumers. Pulsar supports four different types of subscriptions which vary based on how the messages are sent to the consumers of that subscription.
With an exclusive subscription, one and only one consumer is allowed to use the subscription at any given time. If another consumer attempts to use the subscription, it will be rejected. This is useful if there is a strict need for the messages in the topic to be consumed in order by a specific application.
With a failover subscription, multiple consumers can use the subscription. But only one is allowed to be active or consuming at a time. This type of subscription makes sense if your application runs in an active-standby configuration. If the active fails, the standby is able to pick up where the active left off. Both applications connect to the subscription but only one consumes messages from the topic.
A shared subscription allows multiple consumers to consume from the topic at the same time. The messages are sent round-robin to each connected consumer. This subscription allows you to implement the classic “competing consumers” pattern of a work queue. If your application can be horizontally scaled, you can use the shared subscription to distribute the work to multiple instances of the application.
Example: Multiple Applications Consuming From Topic
Enough of the theoretical talk, let’s see this in action. We are going to publish messages to a topic that are consumed by two different applications. One application needs to process the message in strict order, so it is going to use an exclusive subscription. If the application goes down, it will rely on Pulsar to persist the messages and deliver the messages to it when it recovers.
The second application is consuming from the same topic, but it is horizontally scalable. The order of the messages is not important, but the processing time for each message is long, so multiple copies of the application are needed to keep up with the publishing rate. This application is going to use a shared subscription.
First, we create a Pulsar topic called “subscription-demo”, which looks like this in the Kesque user interface:
As you can see, there are no subscriptions on the topic yet. We will connect a Python client to the topic using an exclusive subscription which will automatically create a subscription. Here is the code we are going to use:
Note that in line 12 we are specifying the consumer_type as Exclusive when subscribing, but this is the default, so it can be omitted if you are using an exclusive subscription.
When that client connects, it looks like this:
The topic now has one subscription with one connected consumer. The subscription type is Exclusive.
Now, we are going to create a shared subscription using the following Python client code. It’s the same as previous code except we changed the consumer_type to Shared on line 12 (and the name of the subscription on line 11):
Since this application is horizontally scalable, we are going to run two copies of the Python code in different terminals. This simulates two independent copies of the application:
Now, the topic has a second subscription of the Shared type with two connected consumers.
At this point, we have a total of three consumers on this topic. One is using an exclusive subscription, so that consumer will receive a copy of each message published to the topic. The other 2 consumers are using a shared subscription, so between the two of them they will receive a copy of each message published to the topic. Pulsar will deliver the messages in a round-robin fashion to the consumers of the shared subscription.
Now let’s publish six messages to the topic using this Python code:
This is what we see on the consumer connected to the exclusive subscription:
As expected, this consumer receives all six published messages.
Now, let’s look at the first consumer on the shared subscription:
This consumer gets all the even-numbered messages. This makes sense since the messages will be evenly distributed between the two connected consumers. As expected, the other consumer on the shared subscription gets all the odd-numbered messages:
To recap, we published six messages to the topic. Those six messages were received by two different applications using different subscriptions. One application is using an exclusive subscription, so it received all six messages. The other application is using a shared subscription with two consumers. It also received six messages, but they were evenly distributed between the two consumers with each consumer getting three messages each.
Now, let’s disconnect all the consumers and publish six messages again. Pulsar will store those messages persistently for when the consumers come back online. This is what that looks like:
As you can see, Pulsar has stored those six messages in the message backlog for each of the subscriptions. We can view the messages in exclusive subscription backlog using the Peek function:
If we were to peek at the shared subscription backlog we would see the same set of messages.
Now, let’s connect the exclusive consumer. That consumer receives its copy of the six messages, but Pulsar holds the other six copies of the messages in the backlog until one or more of the shared consumers connects to the subscription:
As you can see, Pulsar provides a simple way to connect multiple groups of consumers to a topic using subscriptions. With subscriptions you can fan-out messages published to a topic to multiple consuming applications. Because Pulsar supports different types of subscriptions, different applications can connect to a single topic and consume messages in different ways. All of this is handled automatically by the Pulsar broker and client library. There is no need for special code in your application.
Pulsar subscriptions are powerful and give you a great deal of flexibility for consuming from topics. They also support advanced features such as rewind and skip. Look for a future post on these advanced features of Pulsar subscriptions to continue your learning.
(Editor's note: DataStax acquired Kesque in January 2021.)