Copyright Protection at Scale: Pex's Journey with DataStax Astra Streaming

Pex is a digital rights technology company that enables fair and transparent use of copyrighted works across the internet. With its industry-leading content identification technology, Pex helps copyright owners find and track their works in user-generated content on platforms such as YouTube, Facebook, Instagram, Twitter, and TikTok. The company has spent the past nine years building a scalable and cost-effective system that uses data and message queuing systems to process vast amounts of metadata.

Pex

Products & Services: DataStax Astra Streaming
Industry: Technology
Location: Los Angeles, CA
  • Processes between 50,000 and 100,000 messages per second through Astra Streaming, helping copyright owners ensure fair and transparent usage of their works across various platforms.
  • Scales up and down with the workload, allowing Pex to process data without concerns about capacity limitations.
  • Delivers the very high uptime Pex requires, since its entire system relies on Astra Streaming for processing.

The Challenges:

Pex faced several challenges in building and operating its system at scale. One of the main challenges was to develop and operate solutions in a cost-effective manner. Leveraging the cloud was a logical choice for Pex, considering the need to optimize costs and experiment with different product development paths. The key to their success was building their system around a persistent message queue that enabled efficient processing of data on compute resources.

According to David Southwell, Director of Infrastructure Engineering at Pex, “Upon understanding the foundations of leveraging ephemeral computing and message queuing systems, we envisioned how far Pex could push the system and how many different types of workloads and products they could develop. The ability to mold these workloads to the paradigm of executing at scale at a low cost opened up numerous possibilities for the company.”

The Solution:

Pex evaluated different message queues and streaming systems over the years. They started with RabbitMQ and then switched to Google Pub/Sub, which could scale with their growing workload. When Pex transitioned to Microsoft Azure, they considered native solutions but ultimately chose Apache Pulsar™ due to its unique capability to define functions that run against each message processed by the system. They initially evaluated and ran Pulsar on their own but later discovered Kesque, a managed service for Pulsar, which they continued using after DataStax acquired the service and introduced Astra Streaming.
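
To make the idea of a function that runs against each message concrete, here is a minimal sketch written in the style of Pulsar's language-native Python functions, where a plain function receives one message payload and returns the value to publish. It is an illustration only, not Pex's code: the JSON payload and the "title" field are assumptions made for the example, and it assumes the payload arrives as a string.

```python
import json


def process(input):
    """Handle one message: normalize a hypothetical metadata record.

    Assumptions (not from the case study): the payload is a JSON string
    and it may carry a "title" field that should be trimmed and lowercased.
    """
    record = json.loads(input)
    record["title"] = record.get("title", "").strip().lower()
    # The returned string is what would be published downstream.
    return json.dumps(record, sort_keys=True)
```

Because the function takes and returns plain strings, it can be exercised locally without a broker, for example by calling process('{"title": "  My Song "}') and inspecting the returned JSON.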

DataStax Astra Streaming is an advanced, fully managed messaging and event streaming service built on Pulsar. It enables companies to stream real-time data at scale, delivering applications with massive throughput, low latency, and elastic scalability from any cloud, anywhere in the world.

Astra Streaming is a foundational building block of Pex's infrastructure and is critical to its operations. “The uptime and reliability of Astra Streaming are of utmost importance, as any downtime impacts our data processing. Pex has found Astra Streaming to be a robust solution, and the accessibility of the DataStax team, their expertise, and their involvement in the open-source community are significant benefits for Pex. Collaborative interactions, particularly with subject matter experts at DataStax, have been highly valued,” Southwell says.

The Results:

Pex's system auto-scales based on the volume of messages processed, which fluctuates depending on the efficiency of their content-finding systems. On average, Pex processes between 50,000 and 100,000 messages per second through Astra Streaming. These messages vary in size, ranging from a few bytes to several megabytes. Pex worked closely with the DataStax team to optimize the client-side library and server-side configurations to handle such a diverse workload. Astra Streaming efficiently scales up and down based on the workload, allowing Pex to process data without concerns about capacity limitations.

Astra Streaming and the DataStax team have helped Pex address its challenges and enable its scalable and cost-effective system to grow. By leveraging the power of Astra Streaming and the cloud, Pex has efficiently processed vast amounts of media and metadata, and provided copyright owners with fair and transparent usage of their works across various platforms.

“Astra Streaming has proven to be a robust and reliable solution, allowing Pex to handle data at scale without concerns about capacity limitations,” says Southwell.

With Astra Streaming as a foundational building block, Pex is well-positioned to continue pushing the boundaries of its system and exploring new possibilities in the digital rights industry.