Apache Cassandra 1.2 Documentation

About hinted handoff writes

The Cassandra 1.2 documentation is transitioning to a new format!
Please use the new Cassandra 1.2 documentation instead.
Back to Table of Contents
All Documents List     

Hinted handoff is an optional feature of Cassandra. Hinted handoff has two uses:

  • To reduce the time to restore a failed node to consistency after the failed node returns to the cluster
  • To ensure absolute write availability for applications that cannot tolerate a failed write, but can tolerate inconsistent reads

When a write is made, Cassandra attempts to write to all replicas for the affected row key. If Cassandra knows a replica is down at the time the write occurs, a corresponding live replica stores a hint. The hint consists of:

  • The location of the replica that is down
  • The row key that requires a replay
  • The actual data being written

If all replicas storing the affected row key are down, a write can succeed if write operation uses a write consistency level of ANY. Under this case, the hint and written data are stored on the coordinator node, but are not available to reads until the hint is written to the actual replicas that own the row. The ANY consistency level provides absolute write availability at the cost of consistency; there is no guarantee as to when write data is available to reads because availability depends on how long the replicas are down. The coordinator node stores hints for dead replicas regardless of consistency level unless hinted handoff is disabled. A TimedOutException is reported if the coordinator node cannot replay to the replica. In Cassandra, a timeout is not a failure for writes.

Note

By default, hints are only saved for three hours after a replica fails because if the replica is down longer than that, it is likely permanently dead. In this case, you should run a repair to re-replicate the data before the failure occurred. You can configure this time using the max_hint_window_in_ms property in the cassandra.yaml file.

Hint creation does not count towards any consistency level besides ANY. For example, if no replicas respond to a write at a consistency level of ONE, hints are created for each replica but the request is reported to the client as timed out. However, since hints are replayed at the earliest opportunity, a timeout here represents a write-in-progress, rather than failure. The only time Cassandra will fail a write entirely is when too few replicas are alive when the coordinator receives the request. For a complete explanation of how Cassandra deals with replica failure, see When a timeout is not a failure: how Cassandra delivers high availability.

When a replica that is storing hints detects via gossip that the failed node is alive again, it will begin streaming the missed writes to catch up the out-of-date replica.

Note

Hinted handoff does not completely replace the need for regular node repair operations. In addition to the time set by max_hint_window_in_ms, the coordinator node storing hints could fail before replay. You should always run a full repair after losing a node or disk.