Highly Available and Scalable Distributed Database

Apache Cassandra™ is a distributed database that delivers the high availability, performance, and linear scalability today’s most demanding applications require. It offers operational simplicity and effortless replication across cloud service providers, data centers, and geographies, and it can handle petabytes of information and thousands of concurrent operations per second across hybrid cloud environments.
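As a rough illustration of what replication across data centers looks like in practice, the sketch below builds the CQL statement that defines a keyspace replicated to several data centers, and computes how many replicas a QUORUM request needs. The helper names (`quorum`, `create_keyspace_cql`), the keyspace name `shop`, and the data center names `us_east` and `eu_west` are hypothetical and chosen only for this example; the sketch only builds a string and does not connect to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond to satisfy a QUORUM request."""
    return replication_factor // 2 + 1


def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to the number of replicas
    kept in that data center. All names here are illustrative.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items()]
    options = ", ".join(parts)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{options}}};"
    )


if __name__ == "__main__":
    # Hypothetical two-data-center layout with three replicas in each.
    print(create_keyspace_cql("shop", {"us_east": 3, "eu_west": 3}))
    # With three replicas, a quorum is two, so one replica can be down.
    print(quorum(3))
```

With three replicas per data center, a quorum is two replicas, so reads and writes at that consistency level can still succeed while one replica is unavailable.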

DataStax Commitment to Open Source Cassandra

From its inception, Cassandra has been the premier distributed database on the market, and here at DataStax, we remain committed to continuing that legacy. DataStax offers production-certified Cassandra distributions plus 24x7x365 expert support to ensure all Cassandra users can make the most of this powerful database.

#1 OSS Committer

DataStax has contributed a majority of the open-source Cassandra code commits and we are one of the driving forces behind Apache Cassandra 4.0.

Apache Cassandra Experts

DataStax solutions are developed and updated from the open-source Cassandra project, and the DataStax team has been an integral part of the Cassandra project since its inception.

Open Source Leadership

DataStax provides open-source leadership in other database-related projects (like Apache TinkerPop™) as part of our commitment to open source.

Introducing... DataStax Luna

DataStax Luna provides subscription-based support for open source Cassandra to help ensure you meet required service levels while providing your team with the confidence they need to deploy and run their applications.

  • Get support directly from the distributed database experts who authored a majority of the Cassandra code, drivers, and tools.
  • Choose the right level of support for your budget and your technical and business needs, then easily scale through the self-service portal.
  • Get started in minutes through a simple process that lets you begin coverage directly through our website.

"High availability is extremely important to us and our users, and that was the first thing that caught our eye with Apache Cassandra and DataStax. High availability with reliable performance is a big win for us."

Daniel Chia

Software Engineer, Coursera

DataStax Distribution of Apache Cassandra

Develop and scale your applications with confidence using DataStax Distribution of Apache Cassandra, backed by support from the Cassandra experts.

Video

CNCF, Apache Cassandra, Kubernetes, and Prometheus with Luc Perkins

In this episode, Adron Hall speaks with Luc Perkins about his work at the CNCF, Kubernetes, where projects are heading, and what projects they're working on. Adron also speaks with Luc about docs, projects he's been seeing that are really interesting, skeleton code for projects, and lots of other topics.

Highlights:

  • 0:14 Introducing Luc Perkins.
  • 0:58 Talking CNCF, Kubernetes, containers, and the cloud native paradigm.
  • 1:27 Introduction to the presentation at hand on Prometheus and its status as a graduated project within the CNCF.
  • 1:50 What Luc has been working on that he's found interesting.
  • 2:16 Developer Advocate, a somewhat nebulous title: presentations, documentation, conference organizing, graphic design, and sometimes (yes, indeed) contributing code!
  • 3:03 The kind of help open source teams ask for most.
  • 3:50 Skaffold and related technology, and why you might want to check it out too.
  • 5:48 Adron describes why we need to get back to writing code and stop fiddling with Kubernetes, then discusses with Luc how Brendan Burns, Kelsey Hightower, and others in the community have said the same and are pushing for solutions.
  • 6:38 Luc weighs in on where and why the developer story is still somewhat lacking.
  • 7:10 More details on Skaffold, the YAML, and the respective service.
  • 9:38 Which of the container technologies Luc uses these days.
  • 10:28 Luc talks about one interesting CNCF project in the sandbox: Harbor, a kind of Docker Hub.
  • 12:10 What will happen in the next couple of years? How can we develop more reliably with questionable connections, disparate and problematic dependency repositories, and the like? For example, what about that recent NPM issue? Luc and Adron discuss.
  • 13:40 The direction the CNCF has been moving in to simplify and enhance the local development story.
  • 15:32 Luc describes a secret about the meetup, and more on the Kubernetes developer story.
  • 17:55 Stepping into the topic of a complex distributed system (Cassandra) running inside another complex distributed system (Kubernetes).
  • 19:58 KubeCon is coming up in Seattle in December. A few extra details, and a mention of the Shanghai KubeCon!

Video

T-Mobile Runs on DataStax for Their Apache Cassandra™ Enterprise Support

Josh Turner, Principal Engineer Distributed Data Systems, SDE at T-Mobile, shares his thoughts on DataStax Accelerate, how working with Apache Cassandra has benefited his organization, and more.

Video

5 Steps to an Awesome Apache Cassandra™ Data Model

"As more customers adopt digital channels over traditional brick and mortar stores and call center channels, there is a need to handle high traffic volume fluctuations and to scale dynamically in a cost-efficient way. This is a common challenge faced across the industry. Verizon solved this challenge for its wireless OSS/BSS stack using DSE Cassandra and Spark. DSE enabled Verizon to transform its OSS/BSS platform from a legacy mainframe-based monolithic stack to a highly scalable cloud-enabled stack. This new platform enables a better customer experience through high availability and scalability. Verizon was the only major wireless provider that was able to handle the Apple iPhone pre-order volume spike, and was available to take customer orders at 3 a.m. ET when the floodgates were opened. The DSE migration addresses the economic scalability and low latency needs of 5G and IoT. In addition to the enhanced customer experience, the DSE migration eliminated 90% of operations cost.

Migration of a large-scale application presents unique challenges that are not encountered in green field development. This presentation focuses on the challenges faced in adopting Cassandra for transactional data that is migrated from traditional RDBMS/mainframe. The presentation will cover details on the data model, data ingestion, and data reconciliation techniques that enable digital transformation."


Apache Cassandra™ Architecture

The data management needs of the average large organization have changed dramatically over the last ten years, requiring data architects, operators, designers, and developers to rethink the databases they use as their foundation. The proliferation of large-scale, globally distributed data led to the birth of Apache Cassandra™, one of the world’s most powerful and now most popular NoSQL databases. Read this white paper to learn how Cassandra was born, how it’s evolved, how it operates, and what DataStax Distribution of Apache Cassandra™ adds to the equation.

Advanced Database Capabilities

DataStax Enterprise (DSE) goes far beyond Apache Cassandra capabilities with double the horsepower, operational simplicity, and advanced security.

DSE Advanced Performance

DSE includes twice the horsepower of Apache Cassandra, delivering twice the throughput to handle twice the workloads with the same hardware. Plus DataStax Bulk Loader makes loading and unloading data a snap.

DSE NodeSync

A major challenge of Apache Cassandra is operational management. Repairing nodes to keep them synchronized is an intensely manual process that requires the right expertise. DSE NodeSync removes that pain, eliminating 90% of such manual operations, so even novice DBAs and DevOps professionals can run DSE like seasoned professionals.

DSE Advanced Security

Apache Cassandra includes only basic security such as login and password. DSE adds comprehensive, enterprise-grade security, including authentication, authorization, transparent data encryption, JDBC drivers with built-in security, and auditing by user or profile.
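To make the "login and password" baseline concrete, here is a minimal sketch of the CQL that open source Cassandra uses for password-based roles and permissions. It assumes password authentication and authorization have been enabled in the cluster's configuration; the helper names (`create_role_cql`, `grant_cql`) and the role, password, and keyspace values are hypothetical. The code only builds statement strings and does not talk to a database.

```python
def create_role_cql(role: str, password: str, login: bool = True) -> str:
    """Build a CREATE ROLE statement for a password-authenticated role."""
    # CQL string literals escape a single quote by doubling it.
    escaped = password.replace("'", "''")
    return (
        f"CREATE ROLE IF NOT EXISTS {role} "
        f"WITH PASSWORD = '{escaped}' AND LOGIN = {str(login).lower()};"
    )


def grant_cql(permission: str, keyspace: str, role: str) -> str:
    """Build a GRANT statement giving a role one permission on a keyspace."""
    return f"GRANT {permission} ON KEYSPACE {keyspace} TO {role};"


if __name__ == "__main__":
    # Hypothetical read-only reporting role on a hypothetical keyspace.
    print(create_role_cql("report_reader", "change-me"))
    print(grant_cql("SELECT", "shop", "report_reader"))
```

Statements like these cover who can log in and what they can read or modify; the encryption and auditing features listed above go beyond this baseline.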

Blog
Cassandra’s Journey — Via the Five Stages of Grief

New technologies usually need to fight their way into the hearts of the people who will end up using them. This fight is often long and hard, and Apache Cassandra didn’t have it any easier than any of the other technological developments of our time.

In 2008 the database world was a wild place. Large data infrastructures were testing the limits of relational databases, and companies like Google and Amazon were about to run out of options on how to handle their massive data volumes. At the time I was working at an education company called Hobsons, and was one of those infrastructure engineers trying to get more scale out of my tired old databases. Cassandra caught my eye as something with a great foundation in computer science that also solved many of the issues I was having. But not everyone was as convinced as I was.

If you’re not familiar with the Kübler-Ross model of grieving, also known as The Five Stages of Grief, it describes a way most people end up dealing with loss and change. Looking back, I realize now that the en-masse abandonment of relational databases in favor of something more appropriate for the new world of big data, Cassandra, very much followed this same model. Here’s how it happened from my POV in the trenches of data infrastructure.

Stage 1: Denial - The individual believes the prognosis is somehow mistaken and clings to a false, preferable reality.

In 2008, Apache Cassandra was the closing curtain on a 30-year era of database technology, so denial was an easy and obvious response to it in the early years. Of course, many of the new databases being released weren’t exactly of the highest quality. Coming from a database with years and years of production vetting, it was easy to throw some shade at the newcomers. Cassandra was in that camp. But it could do things relational databases couldn’t, like stay online when physical nodes fail or scale online by just adding more servers.

Administrators called it a toy and developers called it a fad: just some kids trying to be cool. Cassandra kept growing, though, and solving real problems. The replication story was unmatched and was catching a lot of attention. There were ways to replicate a relational database, but it was hard and didn’t work well. Data integrity required one primary database with all others being secondary or read-only, and failure modes contributed to a lot of offline pages displayed on web sites. But generally speaking, people only want to make the effort to fix things when they absolutely have to, and for now, relational databases weren’t really broken.

Stage 2: Anger - The individual recognizes that denial cannot continue and becomes frustrated.

Slowly but surely, people started to move notable use cases with real production workloads over to Cassandra. There were happy users telling incredible stories of scale and resiliency! The company names attached to these stories became less cutting edge and more mainstream, and it was becoming clear to many that this wasn’t just a fad. It was starting to make a real impact and could be coming to a project meeting soon.

I remember one of my first consulting gigs at a big-name company. I was working with the development team on some data models, and in the back of the room was a group of engineers, arms crossed, not looking happy. When I talked to them, they made it quite clear that this change was not welcome, and that “This is going to ruin the company.” They were the Oracle database administrators, and they saw this at best as a bad idea and at worst as a threat to their livelihood. In the ensuing months I experienced similar tense moments with other groups of engineers.

Stage 3: Bargaining - The individual tries to postpone the inevitable and searches for an alternate route.

Despite roadblocks and delay tactics, the needs of businesses everywhere dictated a move to high-scaling technologies like Apache Cassandra. It was solving real problems in a way no other database could, no matter how much “tuning” you did on your other solutions. This led to situations where teams started negotiating the terms of a Cassandra roll-out. One team I worked with wasn’t allowed to put Cassandra in any critical path close to customers. Ironically, when the systems in the critical path started failing, the only system that could withstand the conditions that led to their failure was the much-maligned Cassandra cluster.

Then, a new breed of database appeared that tried to capitalize on the fear of non-relational databases. It was called NewSQL and promised full ACID transactions along with Cassandra-like resiliency, but NewSQL never quite worked out when real-world failures presented themselves. That’s how infrastructure goes: It burns half-baked ideas to the ground and calls in a welcoming party for the good ideas.

Stage 4: Depression - "I'm so sad, why bother with anything?"

Cassandra started gaining traction in every corner of the tech world. As the solutions implemented to avoid this inevitability failed, fighting the future became less and less appealing. There was a massive growth period when the early adopters became late adopters, and they were talking. The relational database holdouts finally just stopped talking about it and did something else. Many decided to move to data warehousing, where they could put their amazing SQL skills to use via complex queries.

Stage 5: Acceptance - The individual embraces the inevitable future.

And then there was a moment, and nobody knows exactly when it was, when Cassandra became a mainstream database. It might have been when everywhere you looked there was yet another great use case being talked about. As the saying went, anyone doing something at scale on the Internet was probably using Cassandra.

For me, the moment I realized Cassandra had finally been accepted was when I saw large numbers of database administrators signing up for training on DataStax Academy. It was like a big shift had occurred in the day-in, day-out world of databases. Application developers were always pushing the cutting edge, but administrators had to keep those applications running until they were replaced, and their new foundation of choice was Cassandra.

When you think about it, you really see the same reaction to every new paradigm-shifting technology. The early days of the computer, the Internet, and now blockchain all faced the same fear and doubt as the early days of Cassandra. Collectively, we deny the truth, rage at inevitability, scramble for an alternative, fall into despair, and finally accept and embrace our new reality.

What comes after Cassandra is anyone’s guess, but as with people, usually the best kind of change comes little by little and goes almost completely unnoticed until it’s staring you in the face, and you say, “Wow, you’ve changed!” Here’s to the Cassandra of the past, the present, and the future.

Blog
The Four Main Challenges with Apache Cassandra™

Enterprises are increasingly flocking to open source technology because of its accessibility, theoretical cost-effectiveness, and ability to attract top talent. According to the 2018 Open Source Program Management Survey, 53% of companies say their organization has an open source software program or plans to establish one within the next year, and according to the 2016 Global Developer Report, 98% of developers use open source tools, even when they’re not supposed to.

Here at DataStax we’re HUGE Apache Cassandra fans! We based our technology on Cassandra for good reason: it’s fast, flexible, and foundational. Enterprises can form their data management strategies on it and be confident they’ll be able to scale with their growth.

That said, as with other open source tools, Cassandra does present certain challenges at the enterprise level. While these challenges are easily overcome with the right strategy and resources, we think it’s worth exploring exactly what these challenges are, the hidden costs associated with them, and why most enterprises end up needing a little extra help to tap into the full potential of Cassandra.

1. Rising maintenance costs

Open source solutions are becoming more and more popular in the enterprise because they’re easier to adopt and they eliminate licensing fees. They also eliminate the need for extensive contract negotiations, which can be stressful and time-consuming. However, while open source tools may be free to deploy, they do come with hidden ongoing maintenance costs that can have a significant impact on total cost of ownership (TCO) beyond the cost of acquiring the software.

When companies move to open source, they end up either investing in internal talent to develop and maintain the technology or depending on a network of third-party developers, especially the open source community. Contributions are voluntary and are made when a contributor has the time, not necessarily when an organization has a need. Still, companies that use open source depend on these contributions for things like maintenance, bug fixes, and new features. These dependencies introduce a lot of risk into the equation, making it more difficult for enterprises to meet service-level agreements and raising the potential for downtime and the costs associated with lost business.

2. Security, compliance, and governance risk

HIPAA, Sarbanes-Oxley, GDPR: oh my. Different industries in different countries are forced to comply with different regulations. One of the main reasons open source projects fail or run into issues is security compliance. It’s often difficult for organizations to implement global security standards to ensure compliance, particularly in hybrid cloud environments. This makes the complete adoption and use of open source software that much more challenging. Failure to comply with these regulations exposes organizations in regulated industries to significant financial and reputational risk.

While Cassandra does offer some built-in security features out of the box, like role-based authentication and authorization, these features, by themselves, can’t guarantee security for organizations that operate in heavily regulated industries.

3. Ad hoc support from multiple sources

Because Cassandra’s free, it’s easy to adopt. This ease of implementation, however, comes with its own challenges. Individual teams usually end up implementing the database on an ad hoc basis. As the deployment scales and multiplies across the organization, the need for support services increases. In many cases, organizations end up with a patchwork quilt of support and services from a variety of different sources: some in-house resources, the open source community, and third-party agencies. All of these come with varying levels of Cassandra expertise and response time. It’s not the most efficient, cost-effective, or reliable approach, to say the least.

4. Limited Apache Cassandra expertise

Cassandra boasts a robust community that offers a rich set of collective knowledge. But much of that knowledge isn’t organized in an intuitive way. Implementing and configuring Cassandra involves a significant learning curve. Most companies find that it’s very difficult and costly to hire in-house expertise because there’s a limited supply of talent. Employees usually end up educating themselves on Cassandra, using a combination of open source documentation, help from the community, and trial and error. This slows down adoption and puts an enormous administrative burden on IT.

While open source software can help organizations achieve their goals, it is not without its drawbacks. Hidden costs, security risks, a patchwork network of support services, and a lack of expertise are all reasons why organizations struggle with open source adoption. The good news is that, with the right partner, you can unlock the full power of Cassandra without any of the downsides. That’s the ticket to helping your organization realize its full potential.

eBook: The 5 Main Benefits of Apache Cassandra™

Webinar
Speed Dating with Apache Cassandra™

Microservices, security and compliance, multi-tenant data centers, cluster sizing … there’s a lot to consider when thinking about your data platform! Join us for an online meetup featuring experts from DataStax and our partner, software consulting firm Expero, to get bite-sized lightning talks covering these topics and more. We’ve curated a list of the most critical topics into this speed dating format to help you unlock the potential in your organization by maximizing the effectiveness of your data platform.
