The synchronization of replica data on nodes to ensure that the data is up to date.
An off-heap structure associated with each SSTable that checks if any data for the
requested row exists in the SSTable before doing any disk I/O.
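The structure described above is a Bloom filter. The sketch below is illustrative only and is not Cassandra's implementation. It assumes a fixed bit-array size and derives its bit positions from salted SHA-256 digests, whereas Cassandra uses Murmur3 hashing. A negative answer means the key is definitely absent, so the disk read can be skipped. A positive answer can be a false positive. The names `BloomFilter` and `read_row` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical; not Cassandra's implementation)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: key is definitely absent. True: key is possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


def read_row(bloom: BloomFilter, sstable: dict, key: str):
    """Consult the filter before the (simulated) disk read."""
    if not bloom.might_contain(key):
        return None  # skip disk I/O entirely
    return sstable.get(key)  # stand-in for reading the SSTable from disk
```

Every key added to the filter always passes `might_contain`, so the filter never hides data that is present. It only saves reads for keys that are absent.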
Two or more Cassandra instances that exchange messages using the gossip protocol.
The storage engine process that creates an index and keeps data in order based on the index.
Columns other than the partition key in a compound primary key definition.
The smallest increment of data, which contains a name, a value and a timestamp.
A container for rows, similar to the table in a relational system. Called table in CQL.
A file to which Cassandra appends changed data for recovery in the event of a hardware failure.
A process that consists primarily of consolidating SSTables, but also discards tombstones
and regenerates the index in the SSTable. A major compaction merges all SSTables into one.
A minor compaction merges from 4 to 32 SSTables for a table.
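The merge step of compaction can be sketched as follows. This is a simplified model under stated assumptions. Each SSTable is a dict that maps a hypothetical cell key (partition key, column name) to a (write timestamp, value) pair, and a value of `None` stands for a tombstone. The newest timestamp wins. Tombstones are dropped right away here, whereas Cassandra only purges a tombstone after `gc_grace_seconds` has elapsed. The function name `compact` is illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: (partition key, column) -> (write timestamp, value).
# A value of None represents a tombstone.
Cell = Tuple[int, Optional[str]]
SSTable = Dict[Tuple[str, str], Cell]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge SSTables into one, keeping only the newest version of each cell."""
    merged: SSTable = {}
    for table in sstables:
        for cell_key, (ts, value) in table.items():
            current = merged.get(cell_key)
            if current is None or ts > current[0]:
                merged[cell_key] = (ts, value)
    # Discard tombstones (simplification: no gc_grace_seconds check).
    live = {k: v for k, v in merged.items() if v[1] is not None}
    # SSTables are sorted, so emit the merged cells in key order.
    return dict(sorted(live.items()))
```

Because the newest timestamp wins, the input order of the SSTables does not change the result.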
composite partition key
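A partition key made up of more than one column, declared in nested parentheses of the PRIMARY KEY definition of a table. Because the columns are combined to form the partition key, rows that share only some of those column values can be stored on different nodes.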
compound primary key
A primary key consisting of the partition key, which determines on which node data is
stored, and one or more additional columns
that determine clustering.
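The split between the partition key and the clustering columns can be modeled in a few lines. The sketch below assumes a hypothetical table `posts` with `PRIMARY KEY (user_id, posted_at)`. Rows are grouped into partitions by `user_id`, which decides the node, and sorted inside each partition by `posted_at`. The helper `layout` is illustrative and is not a Cassandra API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical CQL table, shown only for orientation:
#   CREATE TABLE posts (user_id text, posted_at int, body text,
#                       PRIMARY KEY (user_id, posted_at));
# user_id is the partition key; posted_at is a clustering column.


def layout(rows: List[Dict[str, Any]],
           partition_key: Tuple[str, ...],
           clustering: Tuple[str, ...]) -> Dict[tuple, List[Dict[str, Any]]]:
    """Group rows into partitions, then sort each partition by clustering columns."""
    partitions: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        pk = tuple(row[c] for c in partition_key)  # determines the node
        partitions[pk].append(row)
    for pk in partitions:
        partitions[pk].sort(key=lambda r: tuple(r[c] for c in clustering))
    return dict(partitions)
```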
The synchronization of data on replicas in a cluster. Consistency is categorized as weak or strong.
A setting that defines a successful write or read by the number of cluster replicas
that acknowledge the write or respond to the read request, respectively.
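The acknowledgement count behind a consistency level is simple arithmetic. The sketch below covers only ONE, QUORUM, and ALL in a single data center. Cassandra defines further levels, such as LOCAL_QUORUM and EACH_QUORUM, that it does not model. QUORUM is a strict majority of the replicas, `replication_factor // 2 + 1`. The function names are hypothetical.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replica responses needed for a few common levels (single data center)."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[level]


def is_successful(level: str, replication_factor: int, acks: int) -> bool:
    """A write or read succeeds once enough replicas have responded."""
    return acks >= required_acks(level, replication_factor)
```

When the read level and the write level together require more responses than there are replicas, every read overlaps at least one replica that accepted the latest write. QUORUM reads and writes with a replication factor of 3 (2 + 2 > 3) are a common example.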
The node that determines which nodes in the ring should get the request, based on the
snitch configured for the cluster.
cross-data center forwarding
A technique for optimizing replication across data centers that sends data from one
data center to a node in another data center, and that node forwards the data to other nodes
in its data center.
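The benefit of this forwarding shows up when you count the messages that cross data center boundaries. The sketch below is a simplified model. It assumes one message per remote replica without forwarding, and one message per remote data center with forwarding. The function name and the replica layout are hypothetical.

```python
from typing import Dict


def cross_dc_messages(coordinator_dc: str,
                      replica_dcs: Dict[str, str],
                      forwarding: bool) -> int:
    """Count messages sent from the coordinator's data center to other data centers."""
    remote = [node for node, dc in replica_dcs.items() if dc != coordinator_dc]
    if not forwarding:
        return len(remote)  # one message per remote replica
    # One message per remote data center; the receiving node forwards locally.
    return len({replica_dcs[node] for node in remote})
```

With three replicas in a remote data center, forwarding cuts the inter-data-center traffic for a write from three messages to one.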
A group of related nodes configured together within a cluster for replication and
workload-segregation purposes. Not necessarily a physical data center.
A peer-to-peer communication protocol for exchanging location and state information between nodes.
Hadoop Distributed File System that stores data on nodes to improve performance. A
necessary component in addition to MapReduce in a Hadoop distribution.
An operation that can occur multiple times without changing the result, such as
Cassandra performing the same update multiple times without affecting the outcome.
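The difference between an idempotent and a non-idempotent update is easy to demonstrate. In the sketch below, overwriting a column gives the same state however many times it is applied. A counter-style increment changes the state on every retry, and this is why Cassandra counter updates are a well-known non-idempotent case. Both function names are hypothetical.

```python
def set_column(row: dict, column: str, value) -> dict:
    """Overwrite a column: applying it again leaves the same result (idempotent)."""
    updated = dict(row)
    updated[column] = value
    return updated


def increment(row: dict, column: str, delta: int) -> dict:
    """Counter-style increment: each application changes the result (not idempotent)."""
    updated = dict(row)
    updated[column] = updated.get(column, 0) + delta
    return updated
```

A client can safely retry `set_column` after a timeout. Retrying `increment` might count the same event twice.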
A native Cassandra capability for finding a column in the database that does not
involve using the primary key.
A subset of the partition index. By default, 1 partition key out of every 128 is sampled.
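A lookup that goes through this sampled summary can be sketched with a binary search. The summary narrows the search to one window of at most 128 index entries, and only that window is scanned. The sketch assumes keys ordered by plain string comparison, whereas Cassandra orders partitions by token. The names `build_summary` and `find_data_position` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

SAMPLING_INTERVAL = 128  # 1 partition key out of every 128, as described above


def build_summary(index: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sample every SAMPLING_INTERVAL-th index entry as (key, position in index)."""
    return [(index[i][0], i) for i in range(0, len(index), SAMPLING_INTERVAL)]


def find_data_position(summary: List[Tuple[str, int]],
                       index: List[Tuple[str, int]],
                       key: str) -> Optional[int]:
    """Binary-search the summary, then scan one window of the partition index."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first sampled key
    start = summary[slot][1]
    # Scan at most SAMPLING_INTERVAL entries instead of the whole index.
    for k, data_pos in index[start:start + SAMPLING_INTERVAL]:
        if k == key:
            return data_pos
    return None
```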
A namespace container that defines how data is replicated on nodes.
Hadoop's parallel processing engine that can process large data sets relatively quickly. A necessary component in addition to HDFS in a Hadoop distribution.
A Cassandra table-specific, in-memory data structure that resembles a write-back cache.
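The write-back behavior can be sketched as an in-memory buffer that absorbs writes and flushes its contents, in sorted order, as an immutable SSTable. This is a simplified model. The flush threshold here counts keys, whereas Cassandra flushes based on memory thresholds. The read path checks the memtable and then the SSTables newest first, whereas Cassandra merges all versions by timestamp. The class `Memtable` and its parameter `flush_threshold` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class Memtable:
    """Illustrative write-back buffer: absorbs writes in memory, flushes when full."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold  # hypothetical: number of keys, not bytes
        self.data: Dict[str, str] = {}
        self.sstables: List[List[Tuple[str, str]]] = []

    def write(self, key: str, value: str) -> None:
        self.data[key] = value  # writes land in memory first
        if len(self.data) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the contents out in sorted order as a new immutable "SSTable".
        self.sstables.append(sorted(self.data.items()))
        self.data.clear()

    def read(self, key: str) -> Optional[str]:
        if key in self.data:
            return self.data[key]
        for table in reversed(self.sstables):  # newest flushed table first
            for k, v in table:
                if k == key:
                    return v
        return None
```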
1) An upsert. 2) A Thrift base class that has
abstract methods for reading and writing data input and output.
A process that makes all data on a replica consistent.
Distributes the data across the cluster. The types of partitioners are Murmur3Partitioner (default), RandomPartitioner, and OrderPreservingPartitioner.
The first column declared in the PRIMARY KEY definition or, in the case of a composite
partition key, the columns declared in nested parentheses at the start of the PRIMARY KEY definition.
The limits of the partition, which differ depending on the configured partitioner.
The Murmur3Partitioner (default) range is -2⁶³ to +2⁶³-1, and the
RandomPartitioner range is 0 to 2¹²⁷-1.
A list of primary keys and the start position of data.
One or more columns that uniquely identify a row in a table. The primary key begins with the partition key; in a single-column primary key, the two are the same.
A process that updates Cassandra replicas with the most recent version of frequently-read data.
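Read repair can be sketched as follows. The coordinator compares the versions returned by the replicas, keeps the one with the newest write timestamp (last write wins), and writes it back to any replica that returned an older version. This is a simplified model that queries every replica and repairs synchronously. The function name and data layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

Versioned = Tuple[int, Optional[str]]  # (write timestamp, value)


def read_with_repair(replicas: Dict[str, Dict[str, Versioned]],
                     key: str) -> Optional[str]:
    """Return the newest value for key and bring stale replicas up to date."""
    # A replica that has no data for the key reports timestamp 0.
    responses = {node: store.get(key, (0, None)) for node, store in replicas.items()}
    newest = max(responses.values(), key=lambda v: v[0])
    for node, version in responses.items():
        if version[0] < newest[0]:
            replicas[node][key] = newest  # repair the stale replica
    return newest[1]
```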
replica placement strategy
A specification that determines the replicas for each row of data.
A procedure performed when upgrading nodes in a cluster to achieve zero downtime. Nodes are upgraded and restarted one at a time, while other nodes continue to operate online.
1) Columns that have the same primary key. 2) A collection of cells per combination
of columns in the storage engine.
A Thrift API term for a set of columns from a single row, described either by name or as a contiguous run of columns from a starting point.
The mapping from the IP addresses of nodes to physical and virtual locations, such as racks and data centers. There are several types of snitches. The type of snitch affects the request routing mechanism.
A sorted string table (SSTable) is an immutable data file to which Cassandra writes
memtables periodically. SSTables are stored on disk sequentially and maintained for each table.
When reading data, Cassandra performs read repair before returning results.
By default, each installation of Cassandra includes a superuser account named cassandra whose password is also cassandra. A superuser grants initial permissions to access Cassandra data, and subsequently a user may or may not be given the permission to grant/revoke permissions.
A collection of columns, ordered by name, fetched by row. A row consists of columns
and has a primary key. The first part of the key is a column name. Subsequent parts of a
compound key are other column names that define the order of columns in the table.
An element on the ring that depends on the partitioner. A token determines the node's position on the ring and the portion of data it is responsible for. The range for the Murmur3Partitioner (default) is -2⁶³ to +2⁶³-1. The range for the RandomPartitioner is 0 to 2¹²⁷-1.
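How tokens map data to nodes can be sketched with a sorted list and a binary search. In this simplified model, each node owns one token. A partition whose token is t belongs to the first node whose token is greater than or equal to t, wrapping around past the highest token. Additional replicas go to the next nodes clockwise, as with SimpleStrategy. The partition's token is taken as given rather than computed with Murmur3. The function `replicas_for` and the node names are hypothetical.

```python
import bisect
from typing import Dict, List

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1  # Murmur3Partitioner tokens are signed 64-bit integers


def replicas_for(token: int, ring: Dict[int, str], rf: int = 1) -> List[str]:
    """Walk the ring clockwise from token and return rf nodes (one token per node)."""
    if not MIN_TOKEN <= token <= MAX_TOKEN:
        raise ValueError("token outside the Murmur3Partitioner range")
    tokens = sorted(ring)
    # First node whose token is >= the partition's token; wrap past the end.
    start = bisect.bisect_left(tokens, token) % len(tokens)
    count = min(rf, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]
```

For example, with nodes at tokens -6000, 0, and 6000, a partition with token 7000 wraps around to the node at -6000.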
A marker in a row that indicates a column was deleted. During compaction, marked columns are deleted.
Time-to-live. An optional expiration period, in seconds, for values inserted into a column. See also
Expiring columns in Removing data.
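In CQL, a TTL is attached to a write with a clause such as `USING TTL 86400`, and the value expires that many seconds after it is written. The expiry check can be modeled as follows. The `Cell` class and its fields are hypothetical. The sketch only shows the arithmetic, not how Cassandra stores expiring cells.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    written_at: float          # write time, in seconds since the epoch
    ttl: Optional[int] = None  # seconds; None means the value never expires

    def is_live(self, now: Optional[float] = None) -> bool:
        """A cell with a TTL is live until written_at + ttl seconds."""
        if self.ttl is None:
            return True
        now = time.time() if now is None else now
        return now < self.written_at + self.ttl
```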
When reading data, Cassandra performs read repair after returning results.
A data partition, which CQL transposes into familiar row-based resultsets.
A change in the database that updates a specified column in a row if the column exists or inserts the column if it does not exist.
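Upsert semantics can be modeled with a dictionary keyed by primary key. In the sketch below, writing to a missing row creates it, and writing to an existing row overwrites only the columns that are supplied. All other columns are left untouched. The `upsert` function and `Table` type are hypothetical stand-ins, not a driver API.

```python
from typing import Any, Dict, Tuple

Table = Dict[Tuple[Any, ...], Dict[str, Any]]  # primary key -> columns


def upsert(table: Table, primary_key: Tuple[Any, ...], columns: Dict[str, Any]) -> None:
    """Insert the row if it is absent; otherwise update only the given columns."""
    row = table.setdefault(primary_key, {})  # creates the row if it does not exist
    row.update(columns)                       # overwrites or adds each supplied column
```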