Technology | October 31, 2019

Five Steps to an Awesome Data Model

Excerpt from Data Modeling in Apache Cassandra™: Five Steps to an Awesome Data Model
Robin Schumacher

This is an excerpt from the DataStax whitepaper "Data Modeling in Apache Cassandra™," which explains how to choose the right data model for your Apache Cassandra™ application in five easy steps. The full whitepaper is available for download from DataStax.

Step 1: Build the application workflow

When building applications on relational databases, developers often start with the data model, thinking about the data items that need to be stored and how they relate to one another. With Cassandra, the opposite is recommended. The best practice is to start with the application workflow, an approach referred to as "query-first design."

Before thinking about how data will be stored, designers need to know what queries the database must support. Figure 3 presents a simplified application workflow for KillrVideo, a sample video-sharing application.


Figure 3 – Simplified application workflow

The sequence of workflow steps matters because it determines what data is available and required for each query. For example, before the application can show basic information about a user (step 2), it needs a userid. The user first logs in to the site (step 1), supplying an email address and password in exchange for that userid. A userid can also be obtained by finding a video (step 6 or 7), showing the comments for that video (step 9), and looking up details about a user who commented. Similarly, before the application can display details about a video (step 8), it needs a videoid, obtained either by selecting from a list of the latest videos (step 7) or by searching videos by tag (step 6).
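The dependency reasoning in this paragraph can be checked mechanically. The sketch below is a hypothetical Python encoding, not part of KillrVideo: each step lists the keys it requires and the keys it makes available, and `check_path` walks a sequence of steps and fails if a step needs a key that nothing earlier supplied. The `Step` type, the `check_path` helper, and the step names are assumptions for illustration; only steps 1, 2, and 6 through 9 appear because the excerpt describes no others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow step: the keys it needs and the keys it makes available."""
    number: int
    name: str
    requires: frozenset
    provides: frozenset


# Hypothetical encoding of the steps named in the text. Steps 3-5 are left
# out because the excerpt does not say what they do.
STEPS = {
    s.number: s
    for s in [
        Step(1, "log in", frozenset({"email", "password"}), frozenset({"userid"})),
        Step(2, "show basic user info", frozenset({"userid"}), frozenset()),
        Step(6, "search videos by tag", frozenset({"tag"}), frozenset({"videoid"})),
        Step(7, "show latest videos", frozenset(), frozenset({"videoid"})),
        Step(8, "show video details", frozenset({"videoid"}), frozenset()),
        Step(9, "show comments for a video", frozenset({"videoid"}), frozenset({"userid"})),
    ]
}


def check_path(path, user_inputs):
    """Walk step numbers in order and return every key known at the end.

    Raises ValueError if a step needs a key that neither the user nor an
    earlier step has supplied.
    """
    known = set(user_inputs)
    for number in path:
        step = STEPS[number]
        missing = step.requires - known
        if missing:
            raise ValueError(f"step {number} ({step.name}) needs {sorted(missing)}")
        known |= step.provides
    return known


# Logging in yields the userid that step 2 needs.
print(sorted(check_path([1, 2], {"email", "password"})))  # ['email', 'password', 'userid']
# A userid can also come from a comment: latest videos -> comments -> user info.
print(sorted(check_path([7, 9, 2], set())))  # ['userid', 'videoid']

try:
    check_path([8], set())
except ValueError as err:
    print(err)  # step 8 (show video details) needs ['videoid']
```

Encoding the workflow this way makes a missing dependency visible early: if a screen needs a value that no earlier step produces, either the workflow or the planned queries must change.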

Step 2: Model the queries required by the application

Even at the design stage, developers can think through the sequence of tasks required, mock up what each screen will look like, and decide what data will be required at each stage.

Figure 4 shows a simplified entity relationship diagram (ERD) for the KillrVideo application. The application needs to keep track of entities such as users, videos, and comments. Users can perform activities such as adding videos, rating videos, and posting comments. A user can comment on many videos, and each video can have many comments from different users, but each video has exactly one owner.
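The relationships in the ERD can be sketched as plain Python data classes. This is a hypothetical model for illustration, not the KillrVideo schema: the attribute names (`email`, `name`, `owner`, `text`) are assumptions, ratings are left out, and the helper functions `commenters` and `commented_videos` exist only to show that comments connect users and videos in both directions while each video keeps a single owner.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


# Hypothetical attributes: the excerpt names the entities and relationships,
# not the columns.
@dataclass(frozen=True)
class User:
    userid: UUID
    email: str


@dataclass(frozen=True)
class Video:
    videoid: UUID
    owner: UUID  # exactly one owning user per video (one user -> many videos)
    name: str


@dataclass(frozen=True)
class Comment:
    # A comment links one user to one video, so comments form the
    # many-to-many relationship between users and videos.
    videoid: UUID
    userid: UUID
    text: str


def commenters(video, comments):
    """All users who commented on a video."""
    return {c.userid for c in comments if c.videoid == video.videoid}


def commented_videos(user, comments):
    """All videos a user has commented on."""
    return {c.videoid for c in comments if c.userid == user.userid}


alice = User(uuid4(), "alice@example.com")
bob = User(uuid4(), "bob@example.com")
v1 = Video(uuid4(), owner=alice.userid, name="Intro to Cassandra")
v2 = Video(uuid4(), owner=bob.userid, name="Data modeling tips")
comments = [
    Comment(v1.videoid, alice.userid, "Thanks for watching"),
    Comment(v1.videoid, bob.userid, "Great video"),
    Comment(v2.videoid, alice.userid, "Very helpful"),
]

print(commenters(v1, comments) == {alice.userid, bob.userid})  # True
print(commented_videos(alice, comments) == {v1.videoid, v2.videoid})  # True
```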


Figure 4 – KillrVideo: Entity relationship diagram (ERD) for KillrVideo

It’s a good idea to iterate between the application workflow and the ERD, updating both as new data items and relationships are identified. Once developers have a clear picture of the application workflow and the key data objects, they can start identifying the queries the application needs to support. Figure 5 shows the key queries and how they relate to the data domains.
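Identifying queries up front matters in Cassandra because data is usually laid out to serve each query directly rather than joined at read time. The sketch below is an in-memory Python analogy, not Cassandra code: `QUERIES` is a hypothetical list of access patterns inferred from the workflow in this excerpt (it is not a transcription of Figure 5), and the hypothetical `CommentStore` keeps one lookup per comment query, writing each comment twice so that both reads are single keyed lookups.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """An access pattern: the value a query is given and what it returns."""
    name: str
    given: str
    returns: str


# Hypothetical query list inferred from the workflow steps in this excerpt;
# it is not a transcription of Figure 5.
QUERIES = [
    Query("find user by email", given="email", returns="userid"),
    Query("find user by id", given="userid", returns="user details"),
    Query("find comments by video", given="videoid", returns="comments"),
    Query("find comments by user", given="userid", returns="comments"),
    Query("find videos by tag", given="tag", returns="videoids"),
]


class CommentStore:
    """Query-first sketch: one lookup per comment query, kept in sync on write."""

    def __init__(self):
        self.by_video = defaultdict(list)  # serves "find comments by video"
        self.by_user = defaultdict(list)   # serves "find comments by user"

    def add_comment(self, videoid, userid, text):
        # Each comment is stored once per query that reads it, so every
        # read below is a single keyed lookup with no join or scan.
        self.by_video[videoid].append((userid, text))
        self.by_user[userid].append((videoid, text))

    def comments_by_video(self, videoid):
        return list(self.by_video.get(videoid, []))

    def comments_by_user(self, userid):
        return list(self.by_user.get(userid, []))


store = CommentStore()
store.add_comment("v1", "u1", "Great video")
store.add_comment("v1", "u2", "Thanks")
store.add_comment("v2", "u1", "Very helpful")
print(store.comments_by_video("v1"))  # [('u1', 'Great video'), ('u2', 'Thanks')]
print(store.comments_by_user("u1"))   # [('v1', 'Great video'), ('v2', 'Very helpful')]
```

The trade-off is the usual one for query-first design: writes do more work and data is duplicated, in exchange for reads that never need to search or join.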


Figure 5 – Identify the queries required to support the application workflow

Thanks for reading this excerpt from the DataStax whitepaper "Data Modeling in Apache Cassandra™." Tune in next week for another excerpt, or download the full whitepaper from DataStax.

