A recent history of gossip
Over the past few months, Cassandra’s gossiper has received changes that both improve operational workflows and increase scalability, many of them related to vnodes. I’ll cover some of these shortly, but first I want to be clear about what is meant by ‘gossip,’ since I often see the term misused or misunderstood. Gossip is not, as you may think, how Cassandra nodes communicate with each other for things like reads and writes. That is the ‘storage protocol,’ which is a superset of gossip. The gossiper is responsible for making sure every node in the system eventually knows important information about every other node’s state, including nodes that are unreachable or not yet in the cluster when a given state change occurs. For example, when a node begins to bootstrap, it broadcasts this state via gossip to some nodes, which in turn gossip it to others until they are all aware that a node is joining. Gossip exchanges are also what the failure detector relies on to decide when a node is down.
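To make the ‘eventually knows’ part concrete, here is a toy simulation of a state change spreading through a cluster. It is a minimal sketch, not Cassandra’s implementation: the names (merge, gossip_round, rounds_until_known), the single version number per endpoint, and the one-random-peer-per-round rule are all assumptions made for illustration.

```python
import random


def merge(local, remote):
    """Keep, per endpoint, whichever entry carries the higher version."""
    for endpoint, (version, status) in remote.items():
        if endpoint not in local or local[endpoint][0] < version:
            local[endpoint] = (version, status)


def gossip_round(views, rng):
    """Every node exchanges state with one randomly chosen peer."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merge(views[peer], views[node])  # push what we know
        merge(views[node], views[peer])  # pull what the peer knows


def rounds_until_known(node_count, new_node, max_rounds=200, seed=0):
    """Count rounds until every node has heard that new_node is bootstrapping."""
    rng = random.Random(seed)
    views = {f"node{i}": {} for i in range(node_count)}
    # The joining node announces itself to a single existing node (hypothetical).
    views["node0"][new_node] = (1, "BOOT")
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(new_node in view for view in views.values()):
            return round_no
    return None  # did not converge within max_rounds


if __name__ == "__main__":
    print(rounds_until_known(8, "10.0.0.9"))
```

Only one node hears the news directly, yet every view ends up containing it after a handful of rounds, which is the property the gossiper exists to provide.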
The first, and perhaps most important, change was the reworking of node replacement. The details are in the link, but the short story is that it was possible to get into a situation where you couldn’t replace the node (and instead needed to remove it and add a new one), and you had to specify the node by its UUID, which made replacement slightly difficult for external automated systems. Now you can specify the node’s IP address or hostname as the replacement parameter, and Cassandra does the rest. To enable this, a new gossip mechanism called ‘shadow gossip’ was created. In the past, there was no way to obtain gossip information without also modifying it; that is, to get gossip information about node Y, node X would first have to insert some state for X into gossip so that other nodes would gossip information back to it. With shadow gossip this is no longer necessary: a node can view complete gossip information without modifying or augmenting it. This is particularly useful for replacing a node, since the old state can be copied and reused by the new node.
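The difference between a regular exchange and a shadow exchange can be sketched as follows. This is an illustrative model only: the Node class, its method names, and the STATUS/TOKENS keys are hypothetical and do not correspond to Cassandra’s actual classes or wire messages.

```python
import copy


class Node:
    """A cluster member holding its view of every endpoint's state."""

    def __init__(self, address):
        self.address = address
        self.endpoint_state = {}  # address -> dict of application state

    def handle_gossip(self, sender, sender_state):
        """Regular exchange: record the sender's state, then reply with our view."""
        self.endpoint_state[sender] = dict(sender_state)
        return copy.deepcopy(self.endpoint_state)

    def handle_shadow_gossip(self):
        """Shadow exchange: reply with a copy of our view and record nothing."""
        return copy.deepcopy(self.endpoint_state)


def state_for_replacement(contact, dead_address):
    """Fetch the dead node's last known state without altering the cluster's view."""
    view = contact.handle_shadow_gossip()
    return view.get(dead_address)


if __name__ == "__main__":
    contact = Node("10.0.0.1")
    contact.endpoint_state["10.0.0.5"] = {"STATUS": "NORMAL", "TOKENS": [10, 20]}
    print(state_for_replacement(contact, "10.0.0.5"))
    print(sorted(contact.endpoint_state))  # the asker left no trace
```

In the regular path the asker necessarily shows up in the responder’s view; in the shadow path it does not, which is what lets a replacement node inspect the old node’s state before announcing anything about itself.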
Shadow gossip also enabled further checks to prevent operator errors. Cassandra has long required the cluster name to match when adding nodes, to prevent accidentally ‘merging’ two clusters together. Cassandra 2.0 takes this a step further: when a node attempts to join, it uses shadow gossip to see whether it is already known to the cluster and bails out if it is, so it won’t just clobber the old state and possibly violate consistency. Before this, you could have a dead node and mistakenly bootstrap a new node over it with the same IP but different tokens, causing a silent shift in ring ownership.
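The join-time check described above might look roughly like this. It is a sketch under stated assumptions: cluster_view stands in for the state obtained through shadow gossip, and the function name, exception class, and replacing flag are invented for illustration rather than taken from Cassandra.

```python
class AddressAlreadyKnownError(RuntimeError):
    """Raised when a joining node's address already exists in the cluster's view."""


def check_safe_to_bootstrap(cluster_view, my_address, replacing=False):
    """Refuse to bootstrap over an endpoint the cluster already knows about.

    cluster_view: mapping of address -> state, as read via a shadow exchange.
    replacing: True when the operator explicitly asked to replace this address.
    """
    if my_address in cluster_view and not replacing:
        raise AddressAlreadyKnownError(
            f"{my_address} is already known to the cluster; "
            "replace it explicitly or remove it first"
        )


if __name__ == "__main__":
    view = {"10.0.0.5": {"STATUS": "NORMAL", "TOKENS": [10, 20]}}
    check_safe_to_bootstrap(view, "10.0.0.9")  # unknown address: fine
    try:
        check_safe_to_bootstrap(view, "10.0.0.5")  # same IP as a dead node
    except AddressAlreadyKnownError as err:
        print(err)
```

Failing loudly here is the point: the operator has to state the intent to replace, instead of the ring silently changing ownership.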
We also identified some performance problems when a large number of nodes were present in the cluster, especially with vnodes, and addressed them at both the micro and macro levels. Optimizations were made to failure detector calculation and to gossiper CPU and memory usage, along with logical improvements to both the failure detector and the gossiper for starting a large cluster all at once. Finally, host ID conflicts are now handled deterministically so that, just like token conflicts, there is always a winner.
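What ‘deterministic’ buys you is that every node, seeing the same two claimants in any order, picks the same winner. The sketch below illustrates that property only. It assumes the endpoint with the more recent startup generation wins and, as a purely hypothetical tie-breaker, falls back to comparing addresses; neither rule is quoted from the Cassandra source.

```python
def resolve_conflict(first, second):
    """Pick the winner between two (address, generation) claimants.

    Assumed rule: the higher generation (more recent startup) wins;
    equal generations are broken by address so the result never depends
    on the order in which the claims arrived.
    """
    return max(first, second, key=lambda claim: (claim[1], claim[0]))


if __name__ == "__main__":
    old = ("10.0.0.5", 1380000000)
    new = ("10.0.0.9", 1380003600)
    print(resolve_conflict(old, new))
    print(resolve_conflict(new, old))  # same winner either way
```

Because the comparison is a total order over the claims, two nodes that learn about the conflict in opposite orders still converge on the same owner.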
If you are interested in how gossip works and want to know more, a good place to get an overview and code pointers is the wiki page about it.