So, you’re thinking about taking on a NoSQL project. That’s great! Or perhaps you’re ready to take one on now, which is even better! Naturally, the next question is, "How hard is this going to be?" The answer: not nearly as hard as you may think. There will be some new things to learn, but, by and large, the similarities between Apache Cassandra™ and relational databases (RDBMS) with SQL outweigh the differences.
Here are some important similarities:
- Cassandra’s data model is tabular: it still includes rows and columns.
- All the data in a column represents the same thing (height, amount, customer ID, etc.) and has the same data type.
- Each row represents one record of data, just as in an RDBMS table.
- One subtle difference is that Cassandra uses the term “keyspace” where relational models use “schema” or “database.”
- Cassandra tables are strongly typed and must be defined before data is inserted. Each table has a schema, and data is validated against it on insert, which keeps new data consistent with the data already in the table.
- Cassandra’s query language (CQL) is very similar to SQL. For example, to select a customer’s name and email address from our customers table, the format is quite familiar:
SELECT first_name, last_name, email FROM customers;
While CQL looks like SQL, not all features of SQL are present in Cassandra; joins, for example, are not supported.
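As a rough illustration of the similarities listed above, the following Python sketch defines a keyspace and a typed table, then inserts and reads a row. The keyspace name `shop`, the `customer_id` column, and the helper function names are assumptions made up for this example. The statements use the `%s` placeholder style that the DataStax Python driver accepts in `session.execute`.

```python
import uuid

# Hypothetical keyspace and table; column types are declared up front.
CREATE_KEYSPACE = (
    "CREATE KEYSPACE IF NOT EXISTS shop "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS shop.customers ("
    "customer_id uuid PRIMARY KEY, "
    "first_name text, last_name text, email text)"
)
INSERT_CUSTOMER = (
    "INSERT INTO shop.customers (customer_id, first_name, last_name, email) "
    "VALUES (%s, %s, %s, %s)"
)
SELECT_CUSTOMER = (
    "SELECT first_name, last_name, email FROM shop.customers "
    "WHERE customer_id = %s"
)


def open_session(hosts=("127.0.0.1",)):
    """Connect using the DataStax Python driver (pip install cassandra-driver)."""
    # Imported lazily so the rest of the sketch loads without the driver installed.
    from cassandra.cluster import Cluster
    return Cluster(list(hosts)).connect()


def create_schema(session):
    """Create the keyspace and the typed table if they do not exist yet."""
    session.execute(CREATE_KEYSPACE)
    session.execute(CREATE_TABLE)


def add_customer(session, first_name, last_name, email):
    """Insert one row and return its generated id."""
    customer_id = uuid.uuid4()
    session.execute(INSERT_CUSTOMER, (customer_id, first_name, last_name, email))
    return customer_id


def get_customer(session, customer_id):
    """Read the name and email address for a single customer."""
    return session.execute(SELECT_CUSTOMER, (customer_id,))
```

With a node listening on 127.0.0.1, calling `session = open_session()`, then `create_schema(session)` and `add_customer(session, "Ada", "Lovelace", "ada@example.com")` should be enough to try it out. Because `customer_id` is declared as `uuid`, an insert that supplies a value of another type for that column is expected to be rejected rather than stored.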
So, the main features in Cassandra are not that different from those of an RDBMS, but what else do you need to know?
- Data modeling in Cassandra makes judicious use of denormalization to achieve the massive scale-out benefits of the Cassandra architecture. Instead of complex third-normal-form data modeling, tables are designed so that each query can retrieve all the information it needs from a single table, much as in data warehouse modeling. This may involve storing the same data in several tables, each organized for a different query.
- Data modeling tends to start with the queries. Knowing what the access patterns will be helps optimize the data layout to efficiently support those access patterns. This is not unique to Cassandra, but is one of the main principles employed when designing the data model.
- Cassandra is a shared-nothing database, and as such, it has its own drivers for connecting to the database, submitting queries, and returning results. ODBC and JDBC do not match well with the shared-nothing nature of Cassandra, so DataStax built a set of drivers that take advantage of it. This design allows for extreme fault tolerance, multi-data center support, and straightforward linear scalability. Although these drivers are not ODBC/JDBC, they are available for several major programming languages (Java, Python, Node.js, etc.) and are designed for easy adoption by developers in those languages.
- The high-concurrency / low-latency nature of Cassandra lends itself to modern application methodologies such as event sourcing (CQRS, etc.) and microservices. Cassandra is the backbone of many cloud-native designs and implementations.
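The query-first, denormalized approach described in the list above might look like the following sketch, which keeps the same customer data in two tables, one per lookup. The table names (`customers_by_id`, `customers_by_email`), the `shop` keyspace, and the helper functions are hypothetical. `session` is assumed to be a DataStax Python driver session, although any object with a compatible `execute(query, parameters)` method works for experimenting.

```python
import uuid

# One table per access pattern: each primary key matches the WHERE clause
# of the query that the table exists to serve.
CREATE_BY_ID = (
    "CREATE TABLE IF NOT EXISTS shop.customers_by_id ("
    "customer_id uuid PRIMARY KEY, first_name text, last_name text, email text)"
)
CREATE_BY_EMAIL = (
    "CREATE TABLE IF NOT EXISTS shop.customers_by_email ("
    "email text PRIMARY KEY, customer_id uuid, first_name text, last_name text)"
)
INSERT_BY_ID = (
    "INSERT INTO shop.customers_by_id (customer_id, first_name, last_name, email) "
    "VALUES (%s, %s, %s, %s)"
)
INSERT_BY_EMAIL = (
    "INSERT INTO shop.customers_by_email (email, customer_id, first_name, last_name) "
    "VALUES (%s, %s, %s, %s)"
)
SELECT_BY_ID = (
    "SELECT first_name, last_name, email FROM shop.customers_by_id "
    "WHERE customer_id = %s"
)
SELECT_BY_EMAIL = (
    "SELECT customer_id, first_name, last_name FROM shop.customers_by_email "
    "WHERE email = %s"
)


def save_customer(session, customer_id, first_name, last_name, email):
    """Write the same record to both tables (the denormalized copies)."""
    session.execute(INSERT_BY_ID, (customer_id, first_name, last_name, email))
    session.execute(INSERT_BY_EMAIL, (email, customer_id, first_name, last_name))


def find_by_id(session, customer_id):
    """Single-table read for the 'look up by id' access pattern."""
    return session.execute(SELECT_BY_ID, (customer_id,))


def find_by_email(session, email):
    """Single-table read for the 'look up by email' access pattern."""
    return session.execute(SELECT_BY_EMAIL, (email,))
```

Each read touches exactly one table and filters on that table's primary key, which is the trade the denormalized model makes: extra writes in exchange for simple, predictable reads. The two inserts here are issued separately to keep the sketch short; grouping them in a logged batch is one common way to keep the copies in step.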
The move to NoSQL is not as daunting as it might seem. Many of the concepts are similar or analogous, and the differences are relatively straightforward. Moreover, you’re not in it alone. There are plenty of resources available, including DataStax Academy, DataStax documentation, online presentations, blogs, and webinars. Plus, don’t forget about the incredible Cassandra and DataStax communities.
Time To Modernize Your Database: Making The Move To NoSQL Is Easier Than You Think
Thu, January 16, 2020 • 2:00 PM EST