Company • December 11, 2018

Introducing the DataStax Apache Kafka® Connector

Chris Splinter, Product Management

UPDATED Dec. 18, 2019: As part of our ongoing support of the Cassandra community, DataStax has made the Apache Kafka® Connector freely available for open source Cassandra users. Learn more here!

Built by the team that authors the DataStax Drivers for Apache Cassandra®, the DataStax Apache Kafka Connector applies the best practices for ingesting data into DataStax Enterprise (DSE) while delivering enterprise-grade resiliency and security.

Modern architectures are made up of a diverse landscape of technologies, each serving its purpose within the data ecosystem. Apache Kafka fits naturally as a distributed queue for event-driven architectures, serving as a buffer layer to transport the messages to the database and surrounding technologies.

There is no better solution in the market to complement Apache Kafka than DSE. As an operational data layer and hybrid cloud database, DSE delivers a multi-model persistent data store that never goes down and scales horizontally to deliver real-time access that is needed to serve enriched, personalized applications.

Automatic Ingest from Kafka to DSE

The DataStax Apache Kafka Connector is the bridge that allows data to move seamlessly from Apache Kafka to DSE in event-driven architectures. The connector is known in the Kafka Connect framework as a sink, and its key features are market-leading performance, flexibility, security, and visibility. All of this is offered with DataStax Enterprise and Apache Cassandra at no additional cost.

As mentioned, the DataStax Apache Kafka Connector is built by the experts who develop and maintain Apache Cassandra’s drivers. Without going into the weeds, the connector uses the same techniques as the DataStax Bulk Loader, which proved to outperform all other bulk loading solutions for Cassandra.

Flexibility

The design of this sink considers the varying data structures that are found in Apache Kafka, and the selective mapping functionality in the connector allows the user to specify the Kafka fields that should be written to DSE columns. This allows for a single connector instance to read from multiple Apache Kafka topics and write to many DSE tables, thereby removing the burden of managing several connector instances. Whether the Apache Kafka data is in Avro, JSON, or string format, the DataStax Apache Kafka Connector extends advanced parsing to account for the wide range of data inputs.
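The selective mapping idea can be illustrated with a minimal Python sketch. This is not the connector's implementation or its configuration syntax: the `column=path` spec format, the function names, and the sample record are all hypothetical, chosen only to show how one Kafka record might be projected onto two differently shaped tables.

```python
from typing import Any, Dict


def parse_mapping(spec: str) -> Dict[str, str]:
    """Parse a hypothetical 'col=path, col=path' spec into {column: path}."""
    mapping: Dict[str, str] = {}
    for pair in spec.split(","):
        column, _, path = pair.partition("=")
        mapping[column.strip()] = path.strip()
    return mapping


def resolve(record: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'value.name' through nested dicts."""
    current: Any = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # a missing field becomes a null column value
        current = current[part]
    return current


def map_record(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build one table row by picking only the mapped fields."""
    return {column: resolve(record, path) for column, path in mapping.items()}


# Hypothetical record: one topic feeding two tables with different columns.
record = {"key": "u42", "value": {"name": "Ada", "city": "London", "age": 36}}
users_by_id = parse_mapping("id=key, name=value.name, age=value.age")
users_by_city = parse_mapping("city=value.city, id=key")

row_a = map_record(record, users_by_id)
row_b = map_record(record, users_by_city)
print(row_a)
print(row_b)
```

Each mapping selects only the fields its table needs, so fields that a table does not use (such as `age` in the second mapping) are simply ignored.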

Security

One of the core value propositions of DSE is its enterprise-grade security. With built-in SSL, LDAP/Active Directory, and Kerberos integration, DSE contains the tools needed to meet strict compliance regulations for the connection from client to server. These security features are also included in the DataStax Apache Kafka Connector, ensuring that the connection between the connector and the data store is secure.

Visibility

With regard to visibility and error handling, we know that in complex distributed environments, things are bound to hit points of failure. The engineering team at DataStax took special care to account for these error scenarios, and all of the intelligence of the DataStax Drivers is applied in the DataStax Apache Kafka Connector. Additionally, the connector includes metrics that give the operator visibility into the failure rate and latency as messages pass from Kafka to DSE.
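To make the failure-rate and latency indicators concrete, here is a small Python sketch of the bookkeeping involved. It is an illustration only: the `SinkMetrics` class and its fields are hypothetical and do not correspond to the metric names the connector actually exposes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SinkMetrics:
    """Hypothetical counters for records, failures, and write latency."""
    records: int = 0
    failures: int = 0
    latencies_ms: List[float] = field(default_factory=list)

    def observe(self, write: Callable[[], None]) -> None:
        """Run one write, timing it and counting it whether or not it fails."""
        start = time.perf_counter()
        try:
            write()
        except Exception:
            self.failures += 1
        finally:
            self.records += 1
            self.latencies_ms.append((time.perf_counter() - start) * 1000.0)

    @property
    def failure_rate(self) -> float:
        return self.failures / self.records if self.records else 0.0


def ok_write() -> None:
    pass  # stands in for a successful write


def bad_write() -> None:
    raise RuntimeError("simulated write timeout")


metrics = SinkMetrics()
for write in (ok_write, ok_write, bad_write, ok_write):
    metrics.observe(write)
print(metrics.records, metrics.failures, metrics.failure_rate)
```

An operator would typically watch the failure rate and a latency percentile over time rather than individual values.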

Available Now

We are excited to release this connector and improve the interoperability of DSE in the data ecosystem for DSE versions 5.0 and above. Stay tuned for coming blogs that will detail advanced usage of this sink, visit our documentation and examples for more information, and download the new connector today to try out in your own environment.

Learn about the DataStax Apache Kafka Connector in this short course.

Details of Connector Functionality Below

FEATURES	DATASTAX	DESCRIPTION
Fully supported by DataStax		DataStax fully supports and provides expert services for the connector
Consume Kafka Primitive data format		Connector accepts Kafka record data that is in primitive type form
Consume Kafka JSON data format		Connector accepts Kafka record data that is in valid JSON form
Consume Kafka Avro data format		Connector accepts Kafka record data that is in valid Avro form
Pluggable Connect converters		Connector works with StringConverter, JsonConverter, AvroConverter, ByteArrayConverter, and Numeric Converters, as well as custom data converters. Note that the producer of the data must use the same Converter as the connector.
Provides JMX metrics		Connector exposes JMX metrics for record/failure count and latency recordings
Runs within Connect Worker		Connector is deployed in the Kafka Connect framework
At least once guarantee		Connector stores the offset in Kafka and will pick up where it left off if restarted. This minimizes the additional work, but there are situations where writes to DSE will be retried if many records are in a single failed batch. The connector ensures that no records are missed.
Standalone mode support		Connector is deployed in Kafka Connect framework and works in standalone mode (meant for dev/test)
Distributed mode / HA support		Connector is deployed in Kafka Connect framework and works in distributed mode (meant for production)
Flexible Kafka topic => DSE table mapping		Connector extends flexible mapping functionality to control the specific fields that are pulled from Kafka and written to DSE
Single Kafka topic => multiple DSE tables		Connector enables common denormalization patterns for DSE by allowing a single topic to be written to many DSE tables
Connector throttling + parallelism		Connector has built-in throttling to limit the max concurrent requests that can be sent by a single connector instance. Parallelism is delivered through the integration with the Kafka Connect distributed framework and asynchronous connector internals.
Flexible date/time/timestamp formats		Connector accounts for the common case in which separate teams write to the same Kafka deployment and may use varying formats for date/time fields
Configurable consistency level		Connector allows configuring DSE consistency level on a per topic-table basis
Row-level TTL		Connector allows configuring DSE row-level TTL on a per topic-table basis
Deletes		Connector allows configuring DSE deletes on a per topic-table basis
Handling of nulls		Connector allows configuring DSE null handling on a per topic-table basis
Error handling		Connector has built-in error handling for various failure scenarios. These scenarios include bad mappings and DSE write issues.
Offset management		Connector leverages the Kafka Connect framework to manage offsets by storing the offset in Kafka
Connector => DSE SSL		Connector allows configuring connection to DSE with SSL
Connector => DSE username/password		Connector allows configuring connection to DSE with username/password
Connector => DSE LDAP/Active Directory		Connector allows configuring connection to DSE with LDAP/Active Directory
Connector => DSE Kerberos		Connector allows configuring connection to DSE with Kerberos
Configurable DSE write timeout		Connector allows configuring write timeout to DSE
Connector => DSE compression		Connector allows configuring connection to DSE with compression strategies
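
The at-least-once guarantee described in the table can be sketched in Python as follows. This is a simplified simulation, not the connector's code: `FlakyTable`, `run_sink`, and the in-memory log are hypothetical stand-ins. It assumes the offset is advanced only after every write in a batch succeeds, and it relies on the fact that Cassandra writes are upserts, so replaying a record with the same primary key is harmless.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, primary key, value)


class FlakyTable:
    """Stand-in for a table with upsert semantics that fails once."""

    def __init__(self, fail_at_offset: int) -> None:
        self.rows: Dict[str, str] = {}
        self.write_attempts = 0
        self._fail_at = fail_at_offset
        self._failed = False

    def write(self, record: Record) -> None:
        offset, key, value = record
        self.write_attempts += 1
        if offset == self._fail_at and not self._failed:
            self._failed = True
            raise IOError("simulated write failure")
        self.rows[key] = value  # upsert: rewriting the same key is harmless


def run_sink(log: List[Record], table: FlakyTable, batch_size: int) -> int:
    committed = 0  # next offset to read; stands in for the offset kept in Kafka
    while committed < len(log):
        batch = log[committed:committed + batch_size]
        try:
            for record in batch:
                table.write(record)
        except IOError:
            continue  # offset not advanced, so the whole batch is retried
        committed += len(batch)  # commit only after every write succeeded
    return committed


log = [(0, "a", "1"), (1, "b", "2"), (2, "c", "3"), (3, "d", "4")]
table = FlakyTable(fail_at_offset=1)
final_offset = run_sink(log, table, batch_size=2)
print(final_offset, table.rows, table.write_attempts)
```

In this run the first batch is written twice because its second record fails once, so six write attempts produce four rows. No record is missed, and the duplicate write of key `a` leaves the table unchanged.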
