Technology | November 22, 2021

Exploring Apache Pulsar Use Cases

Apache Pulsar is a cloud-native, distributed messaging and event streaming platform. Developed at Yahoo in 2013, open-sourced in 2016, and later donated to the Apache Software Foundation, it now handles hundreds of billions of events every day. But how is Pulsar being used? What are its use cases and applications?

Well, in the beginning, Yahoo used Pulsar as its centralized messaging platform to connect its key products to data. Those applications included the photo-storage app Flickr, as well as Yahoo Mail and Yahoo Finance. Since Pulsar was open-sourced, many modern organizations have found it indispensable to their data stacks. In fact, it's been adopted by hundreds of companies that need to efficiently move data, including Yahoo! Japan, Tencent, Comcast, and Overstock.

Pulsar is a unified messaging and streaming platform that can handle all your pub-sub, queueing, streaming, and stream processing needs in one place. It has numerous advantages over competing solutions like Apache Kafka, including removing the pain and operational costs of having to deploy, manage, maintain, and integrate several systems with similar functions. 

Let’s explore each of these areas and how Pulsar can help.

Publish-subscribe patterns

Publish-subscribe messaging, or pub-sub, is a software design pattern that is one of Pulsar’s core capabilities. Consumers subscribe to topics of interest. When publishers send messages to topics, all the subscribers receive them. Pub-sub decouples message publishers from message subscribers, enabling them to operate independently, without knowledge of what the other side is doing.

Unlike with synchronous, direct communication, publishers don’t have to wait for subscribers to receive messages. Instead, pub-sub systems include a broker component where publishers send messages. The broker then processes the messages and routes them to the relevant topics. From there, the broker’s configuration determines whether messages are pushed out to subscribers or whether the subscribers pull the messages down from the broker. Subscribers receive topic messages in the same order they arrived at the broker. Pulsar stores each message and, by default, only removes it once every subscription on the topic has acknowledged that it was successfully received and processed.
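To make the broker's role more concrete, here is a minimal in-memory sketch of the flow described above. It is a toy model, not the Pulsar client API: `ToyBroker`, its method names, and the `orders` topic are all made up for illustration. It assumes a message is dropped only once every subscription has acknowledged it.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub-sub broker (hypothetical, not the Pulsar API)."""

    def __init__(self):
        # topic -> list of stored payloads; removed messages become None
        self.log = defaultdict(list)
        # topic -> {subscription name: set of message ids not yet acknowledged}
        self.pending = defaultdict(dict)

    def subscribe(self, topic, subscription):
        # In this toy, a subscription only sees messages published after it exists.
        self.pending[topic].setdefault(subscription, set())

    def publish(self, topic, payload):
        # The publisher returns right away; it never waits for subscribers.
        msg_id = len(self.log[topic])
        self.log[topic].append(payload)
        for unacked in self.pending[topic].values():
            unacked.add(msg_id)
        return msg_id

    def receive(self, topic, subscription):
        # Unacknowledged messages, in the order they reached the broker.
        ids = sorted(self.pending[topic][subscription])
        return [(i, self.log[topic][i]) for i in ids]

    def ack(self, topic, subscription, msg_id):
        self.pending[topic][subscription].discard(msg_id)
        # Drop the message once no subscription is still waiting on it.
        if not any(msg_id in unacked for unacked in self.pending[topic].values()):
            self.log[topic][msg_id] = None

    def stored(self, topic):
        return [m for m in self.log[topic] if m is not None]


broker = ToyBroker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "shipping")
broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

billing_view = broker.receive("orders", "billing")  # both messages, in order
broker.ack("orders", "billing", 0)
after_one_ack = broker.stored("orders")    # order-1 kept: shipping has not acked
broker.ack("orders", "shipping", 0)
after_both_acks = broker.stored("orders")  # order-1 is gone now
```

Both subscriptions see every message, and the publisher never blocks on either of them. That is the decoupling the pattern is meant to provide.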

Pulsar builds on pub-sub’s asynchronous communication between publishers and subscribers, which brings several benefits for product development.

  • Modular: By keeping message publishing, processing, and retrieval separate, pub-sub systems allow developers to work on specific parts of the system in isolation. You no longer need to worry about building shared knowledge into the components. Component isolation also makes monitoring, testing, and debugging easier.
  • Language agnostic: Pub-sub is programming language agnostic, enabling each part of the system to be built using the most appropriate language, and for components to be integrated more easily.
  • Flexible and elastic: Pub-sub systems are incredibly flexible. Any component can be a publisher, subscriber, or both. And, as long as they can connect to the message broker, they can be located anywhere. New processes can be added, modified, replaced, or removed at any time. Loose coupling also allows the system’s logic to remain the same no matter how many publishers and subscribers are active.
  • Dynamic scalability: Pub-sub messaging systems are designed to handle sudden increases or decreases in usage. They seamlessly adjust to unpredictable changes and can scale to reliably handle millions of messages, with no hard limits on users, all while keeping latency low and throughput high. With scaling concerns off the table, the development team can focus on improvements to the application or service.

Queueing

Unlike pub-sub, queueing uses a point-to-point approach, delivering each message on a topic to only one consumer. Messages that are not successfully processed stay in the queue until they can be delivered. Because messages go to whichever consumer is available first, there is no guarantee they will be processed in order. Once the consumer acknowledges receipt of a message, it is deleted from the queue. This provides a safeguard against the message being delivered to an unintended consumer.
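The queueing behavior described above can be sketched in a few lines. This is a toy in-memory model, not the Pulsar client API. `ToySharedQueue`, the worker names, and the round-robin delivery policy are assumptions made for illustration.

```python
from collections import deque


class ToySharedQueue:
    """In-memory model of competing consumers (hypothetical, not the Pulsar API)."""

    def __init__(self, consumers):
        self.consumers = list(consumers)
        self.backlog = deque()  # messages waiting for delivery
        self.in_flight = {}     # message -> consumer currently holding it
        self._next = 0          # round-robin cursor

    def send(self, message):
        self.backlog.append(message)

    def deliver(self):
        """Hand the oldest waiting message to exactly one consumer."""
        if not self.backlog:
            return None
        message = self.backlog.popleft()
        consumer = self.consumers[self._next % len(self.consumers)]
        self._next += 1
        self.in_flight[message] = consumer
        return consumer, message

    def ack(self, message):
        # Acknowledged messages are removed for good.
        del self.in_flight[message]

    def nack(self, message):
        # Failed messages go back to the backlog for redelivery.
        del self.in_flight[message]
        self.backlog.append(message)


queue = ToySharedQueue(["worker-a", "worker-b"])
for payment in ["pay-1", "pay-2", "pay-3"]:
    queue.send(payment)

first = queue.deliver()   # worker-a gets pay-1
second = queue.deliver()  # worker-b gets pay-2
queue.nack("pay-1")       # worker-a failed, so pay-1 is queued again
queue.ack("pay-2")
third = queue.deliver()   # worker-a gets pay-3
fourth = queue.deliver()  # worker-b gets pay-1, now after pay-3
```

Each message reaches one worker at a time. The redelivery of `pay-1` after `pay-3` shows why this pattern gives up ordering.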

Queueing is a good choice when you need to be absolutely sure each message is received and processed by one consumer, and the order is not important. One of the most common applications of queueing is e-commerce retailers using it to process payments, transactions, and billing statements.

Queueing is a complex messaging pattern that Pulsar handles well with its shared subscription type, removing another heavy task from the shoulders of the development team. Here are some reasons Pulsar excels at queueing:

  • Persistent storage: A queueing messaging system will simply fall apart without trusted, dependable storage. Pulsar guarantees message delivery with persistent storage. Messages are reliably stored until their delivery is acknowledged. Pulsar uses Apache BookKeeper for both queueing and pub-sub storage. BookKeeper is a distributed, horizontally scalable log storage database. This is where all unacknowledged messages are stored.
  • Throughput: High message throughput is another key requirement of any worthwhile message queueing system. With Pulsar, consumers are set up with a receiver queue to process as many messages at a time as appropriate for their use case. The size of the receiver queue is configurable, so consumers can make adjustments on their path to maximum processing throughput. 
  • Load balancing: Pulsar automatically load balances topic messages across consumers. This can also be customized.

Streaming

Often used for real-time analytics, streaming is a great choice for situations where you need to continuously analyze data about a sequence of events, and the order in which they occurred matters. For example, it could be used with IoT sensors to track things like temperature, humidity, or pressure levels. In those cases, all data points would need to be retrieved, aggregated, and processed by the same consumer to correctly calculate changes or moving averages.
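As a small illustration of why one consumer needs to see every reading in order, here is a sketch of a moving average over sensor values. The function name, the window size of 3, and the sample temperatures are made up for the example.

```python
from collections import deque


def moving_averages(readings, window=3):
    """Yield the mean of the last `window` readings after each new reading."""
    recent = deque(maxlen=window)  # the oldest value falls out automatically
    for value in readings:
        recent.append(value)
        yield sum(recent) / len(recent)


temperatures = [20.0, 22.0, 24.0, 26.0]
averages = list(moving_averages(temperatures))
# averages == [20.0, 21.0, 22.0, 24.0]
```

If the same readings arrived in a different order, or were split between two consumers, the intermediate averages would change.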

For these reasons, along with IoT sensor analysis, streaming is often used to gain behavior insights, detect fraud, track financial trades, and evaluate website engagement. The insights provided by streaming are often used for predictions and to make data-driven decisions. With these types of high-profile applications, it’s safe to say streaming is used by just about every modern enterprise.

Streaming functionality is built into Pulsar. Its exclusive subscription tackles streaming use cases by ensuring all events are sent to a particular consumer in the exact order they occurred.

Stream processing

Stream data is continuously pouring in. To turn that raw material into something valuable, such as the insights necessary to make key decisions, it needs to be ingested, processed, and transformed. And it all has to happen quickly. Otherwise, you’ll likely miss opportunities you’ll never get back.

Stream data often has the most value right when it arrives—just think about the examples given in the previous section. Stream processing allows you to make computations and update results on this data in motion, right when it’s produced or received. You won’t be slowed down by a need to first store it in a database before it can be queried. Multiple data streams can be processed simultaneously, and the computations from one stream can be fed into other data streams for further processing.

Most meaningful stream processing is stateful. Contextual information about the state of previous events needs to be stored and used for future computations. For example, sensor measurements from a previous period would be required to calculate the change with the latest reading. This makes stream processing throughput lower than that of plain messaging, because messaging does not need information from previous messages to process the current one.
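Here is a minimal sketch of what "stateful" means in practice, using the sensor example above. The names `make_change_tracker` and `on_reading`, the sensor ids, and the values are all illustrative. The state lives in a plain dictionary rather than in any particular stream processor.

```python
def make_change_tracker():
    """Return a handler that reports each sensor's change since its last reading."""
    last_seen = {}  # state carried between events: sensor id -> previous value

    def on_reading(sensor_id, value):
        previous = last_seen.get(sensor_id)
        last_seen[sensor_id] = value
        # The first reading for a sensor has nothing to compare against.
        return None if previous is None else value - previous

    return on_reading


track = make_change_tracker()
changes = [
    track("sensor-1", 10),
    track("sensor-2", 50),
    track("sensor-1", 13),
    track("sensor-2", 45),
]
# changes == [None, None, 3, -5]
```

Every event triggers a state lookup and a state update. A stateless message handler skips that extra work.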

For meaningful results and insights, developers working on real-time analytics or event-driven applications need a way to transform incredibly large volumes of data, and they need processed data that is always up to date. Stream processing fits that bill.

Stream processing with Pulsar Functions

Pulsar has a reliable, easy-to-use stream processor built right in. Called Pulsar Functions, it can handle the vast majority of use cases, including event-based services, filtering, real-time aggregation, routing, enrichment, and simple ETL (extract, transform, and load) operations. Once a Pulsar function is written, its logic can be applied to messages from topics as they are consumed. Then, if needed, a simple API can be used to apply the computation results to other topics.
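To give a feel for how small a function can be, here is a sketch in the "language-native" Python style described in the Pulsar documentation, where a module simply defines a `process` function. The clean-up logic is made up for illustration. The input and output topics are chosen at deployment time and are not shown here.

```python
def process(input):
    # Called once per incoming message; the return value is the result
    # that gets published to the function's output topic.
    return input.strip().upper()


# The function is plain Python, so it can be exercised locally
# before it is deployed anywhere.
cleaned = process("  sensor ok ")
# cleaned == "SENSOR OK"
```

Because the logic has no dependency on the messaging layer, it can be unit tested like any other Python function.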

In most cases, Pulsar Functions relieves the operational burden of setting up and deploying a separate stream processing engine (SPE) like Apache Heron or Apache Storm. Having everything in one place is less costly and makes monitoring, troubleshooting, and maintenance easier. Of course, an external SPE could always be integrated into Pulsar’s infrastructure if needed for more complex use cases.

Pulsar Functions make it easy for developers to become productive quickly. Popular languages like Java, Python, and Go are supported, so you won’t need to learn new APIs to write processing functions and deploy them to Pulsar clusters. Pulsar also helps developers by including state management. Previous computational results are saved, and the functions you write can read and update this state.
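The state management idea can be sketched with a word counter. In a real deployment, Pulsar passes a context object to the function. Here a local stand-in, `FakeContext`, plays that role so the example runs without a cluster. It assumes the context exposes `incr_counter` and `get_counter`, as in the Python SDK's word-count example, so check the SDK reference before relying on those names. A deployed function would also extend the SDK's `Function` base class instead of being a plain class.

```python
class FakeContext:
    """Local stand-in (hypothetical) for the context Pulsar hands to a function."""

    def __init__(self):
        self._counters = {}

    def incr_counter(self, key, amount):
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        return self._counters.get(key, 0)


class WordCount:
    """Counts words across messages; the counts live in the context, not the class."""

    def process(self, input, context):
        for word in input.split():
            context.incr_counter(word, 1)


context = FakeContext()
counter = WordCount()
counter.process("pulsar streams pulsar", context)
counter.process("pulsar functions", context)
pulsar_count = context.get_counter("pulsar")
# pulsar_count == 3
```

The function itself holds no data between calls. Everything it remembers goes through the context, so the platform can manage that state on the function's behalf.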

Pulsar: One solution for your messaging and streaming needs

Pulsar consolidates messaging and streaming and excels at both. While it would be a leading contender for any pub-sub, queueing, or streaming need, its appeal is magnified if you have use cases that currently require multiple platforms. If that’s the case, it might be time to consider Pulsar and take that operational and administrative weight off your shoulders.

Learn how Astra Streaming can help you harness Apache Pulsar for digital experiences, edge computing and IoT, operational ML, and real-time analytics.
