Apache Cassandra™ 2.0

Architecture in brief

An overview of Cassandra's structure.

Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure. Its architecture is based on the understanding that system and hardware failures can and do occur. Cassandra addresses the problem of failures by employing a peer-to-peer distributed system where all nodes are the same and data is distributed among all nodes in the cluster. Each node exchanges information across the cluster every second. A sequentially written commit log on each node captures write activity to ensure data durability. Data is also written to an in-memory structure, called a memtable, which resembles a write-back cache. Once the memtable is full, the data is written to disk in an SSTable data file. All writes are automatically partitioned and replicated throughout the cluster. Using a process called compaction, Cassandra periodically consolidates SSTables, discards tombstones (an indicator that a column was deleted), and regenerates the index in the SSTable.
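
Flushing and compaction normally run automatically, but both can also be triggered by hand with the standard nodetool utility, which is a convenient way to observe the write path. A minimal sketch; the keyspace and table names are placeholders:

    # Flush the memtables for one table to SSTables on disk
    nodetool flush my_keyspace my_table

    # Force compaction of that table's SSTables
    nodetool compact my_keyspace my_table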

Cassandra is a row-oriented database. Cassandra's architecture allows any authorized user to connect to any node in any data center and access data using the CQL language. For ease of use, CQL uses a syntax similar to SQL. From the CQL perspective, the database consists of tables. Typically, a cluster has one keyspace per application. Developers can access CQL through cqlsh as well as via drivers for application languages.
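
For example, the following cqlsh session sketches that workflow; the keyspace, table, and column names are illustrative only:

    -- Define a keyspace for the application (SimpleStrategy suits a single data center)
    CREATE KEYSPACE demo
      WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

    USE demo;

    -- Tables are defined and queried with SQL-like syntax
    CREATE TABLE users (
      user_id  uuid PRIMARY KEY,
      name     text,
      email    text
    );

    INSERT INTO users (user_id, name, email)
      VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'Alice', 'alice@example.com');

    SELECT name, email FROM users
      WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;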

Client read or write requests can go to any node in the cluster. When a client connects to a node with a request, that node serves as the coordinator for that particular client operation. The coordinator acts as a proxy between the client application and the nodes that own the data being requested. The coordinator determines which nodes in the ring should get the request based on how the cluster is configured. For more information, see Client requests.
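
How many replicas the coordinator waits for before answering is controlled by the consistency level of the request. As an illustrative cqlsh fragment, reusing the hypothetical users table from above:

    -- Ask coordinators to wait for a quorum of replicas on subsequent reads and writes
    CONSISTENCY QUORUM;

    SELECT name FROM users WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;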

Key structures

  • Cluster

    A group of nodes where you store your data. A cluster contains one or more data centers.

  • Data center

    A group of related nodes configured together within a cluster for replication and workload-segregation purposes; it is not necessarily a physical data center. Depending on the replication factor, data can be written to multiple data centers.

  • Commit log

    All data is written first to the commit log for durability. After all of its data has been flushed to SSTables, the commit log can be archived, deleted, or recycled.

  • Table

    A collection of ordered columns fetched by row. A row consists of columns and has a primary key; the first column named in the primary key is the partition key (see the example after this list).

  • SSTable

    A sorted string table (SSTable) is an immutable data file to which Cassandra periodically writes memtables. SSTables are append-only, stored sequentially on disk, and maintained for each Cassandra table.
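
To expand on the Table entry above: in CQL the first column named in the primary key becomes the partition key, and any remaining columns become clustering columns that order the data within a partition. A sketch with made-up names:

    -- sensor_id is the partition key; reported_at orders the columns within each partition
    CREATE TABLE readings (
      sensor_id    uuid,
      reported_at  timestamp,
      value        double,
      PRIMARY KEY (sensor_id, reported_at)
    );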

Key components for configuring Cassandra

  • Gossip

    A peer-to-peer communication protocol to discover and share location and state information about the other nodes in a Cassandra cluster.

    Gossip information is also persisted locally by each node so it can be used immediately when the node restarts. You may want to purge gossip history on node restart for various reasons, such as when the node's IP address has changed.

  • Partitioner

    A partitioner determines how to distribute the data across the nodes in the cluster and which node to place the first copy of data on. Basically, a partitioner is a hash function for computing the token of a partition key. Each row of data is uniquely identified by a partition key and distributed across the cluster by the value of the token.

    You must set the partitioner and assign a num_tokens value to each node. The number of tokens you assign depends on the hardware capabilities of the system. If you are not using virtual nodes (vnodes), use the initial_token setting instead (see the cassandra.yaml excerpt after this list).

  • Replication factor

    The total number of replicas across the cluster. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. You define the replication factor for each data center.

  • Replica placement strategy

    Cassandra stores copies (replicas) of data on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines which nodes to place replicas on. The first replica of data is simply the first copy; it is not unique in any sense. The NetworkTopologyStrategy is highly recommended for most deployments because it makes it much easier to expand to multiple data centers later.

    When creating a keyspace, you must define the replica placement strategy and the number of replicas you want (see the keyspace example after this list).

  • Snitch

    A snitch defines groups of machines into data centers and racks (the topology) that the replication strategy uses to place replicas.

    You must configure a snitch when you create a cluster. All snitches use a dynamic snitch layer, which monitors performance and chooses the best replica for reading; the dynamic snitch is enabled by default and recommended for most deployments. Configure dynamic snitch thresholds for each node in the cassandra.yaml configuration file.

  • The cassandra.yaml configuration file

    The main configuration file for setting the initialization properties for a cluster, caching parameters for tables, properties for tuning and resource utilization, timeout settings, client connections, backups, and security (see the excerpt after this list).

  • System keyspace table properties

    You set storage configuration attributes on a per-keyspace or per-table basis programmatically or through a client application using CQL.

    By default, a node is configured to store the data it manages in the /var/lib/cassandra directory. In a production cluster deployment, you should change the commitlog_directory to a different disk drive from the data_file_directories.
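
The cassandra.yaml excerpt below pulls together the settings mentioned above (partitioner, token assignment, snitch, and storage directories). The values shown are common defaults or placeholders, not recommendations for a particular deployment:

    # cassandra.yaml (excerpt)
    partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    num_tokens: 256            # vnodes; use initial_token instead if not using vnodes
    endpoint_snitch: GossipingPropertyFileSnitch
    commitlog_directory: /var/lib/cassandra/commitlog
    data_file_directories:
        - /var/lib/cassandra/data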
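
To illustrate the replication factor and replica placement strategy entries, a keyspace definition using NetworkTopologyStrategy might look like the following; the data center names must match those reported by the snitch, and the ones used here are placeholders:

    CREATE KEYSPACE app_data
      WITH REPLICATION = {
        'class' : 'NetworkTopologyStrategy',
        'DC1'   : 3,    -- three replicas of each row in data center DC1
        'DC2'   : 2     -- two replicas in data center DC2
      };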
