I am getting strange behavior when starting up a mixed workload cluster.
6 nodes running vanilla Apache Cassandra 0.8.1
3 nodes running Brisk Cassandra with Hadoop trackers turned on.
3 of the vanilla Cassandra nodes are in the seed list, and 1 of the Brisk nodes is in the seed list.
The cassandra.yaml config file has been updated to use the BriskSimpleSnitch on all nodes.
Starting all 6 vanilla Cassandra nodes appears to work correctly. However, when the first Brisk node is brought online, strange things start to happen.
nodetool ring shows all nodes in the cluster running.
- If you log into the Brisk node with cassandra-cli, 3 of the vanilla Cassandra nodes are unreachable.
- If you log into a reachable Cassandra node with cassandra-cli, all 7 nodes show up (6 vanilla and 1 Brisk).
- If you log into an unreachable Cassandra node with cassandra-cli, the Brisk node is unreachable.
2 of the unreachable vanilla Cassandra nodes are seeds, and 1 is a non-seed.
There is an info message in the cassandra log on these nodes:
INFO [Thread-18] 2011-07-06 22:37:56,958 IncomingTcpConnection.java (line 110) Received connection from newer protocol version. Ignorning message.
This message does not show up in the logs of the reachable nodes.
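A minimal sketch for checking which nodes log this message, assuming each node's log is a plain text file. The function names are illustrative and not part of Cassandra or Brisk, and the log path is whatever your install uses.

```python
from typing import Iterable, List

# Message text copied verbatim from the log line above (including its spelling).
MARKER = "Received connection from newer protocol version"


def find_protocol_mismatch_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the protocol-version message."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


def scan_log(path: str) -> List[str]:
    """Scan one log file; `path` is supplied by the caller (hypothetical location)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_protocol_mismatch_lines(fh)
```

Running scan_log against the log of each node and comparing the counts should show whether the message is limited to the 3 unreachable nodes.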
I have tried bringing up the nodes in a different order. When I bring up 1 vanilla node and 1 Brisk node, I get a different problem: when the Brisk node starts, it chooses one of the down vanilla Cassandra seed nodes as the job tracker. When all vanilla Cassandra nodes are up and the Brisk node is then started, it chooses the Brisk node as the job tracker.
Any ideas on what is causing this behavior?
