We are intermittently losing our Cassandra connections. Here is what the logs show:
(specific host footprint information removed from log file)
A request comes in and we get this ERROR entry ...
ERROR [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush transport (to be expected if the pool is shutting down) in close for client: XXXX
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
At the end of this exception chain, the host is marked down and its connection pool is shut down ...
ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS DOWN TRIGGERED for host XXXX
ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{XXXX}; IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49
ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{XXXX}
ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{XXXX}
INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host detected as down was added to retry queue: XXXX
WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<XXXX>
... and it appears as if the connection has been lost.
Eventually a new incoming request will reset the connection and the connection pool will be reestablished, but that initial incoming request fails.
Subsequent requests work ok.
The connections are reestablished ...
INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Downed Host retry status true with host: XXXX
INFO [me.prettyprint.cassandra.connection.HConnectionManager] - Added host XXXX to pool
INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Downed Host retry status true with host: XXXX
INFO [me.prettyprint.cassandra.connection.HConnectionManager] - Added host XXXX to pool
Everything works OK again for a while.
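
Since only the first request after the broken pipe fails and the following ones succeed, one way to illustrate the behavior (and a possible stopgap while the root cause is investigated) is to retry the failed call once at the application level. The sketch below is only an assumption about how that could look. It uses plain Java and no Hector classes; `RetryOnce`, `callWithRetry`, `maxAttempts`, and `backoffMillis` are hypothetical names, and the lambda in `main` merely simulates a request that hits a stale connection on its first attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hector: retries a call that failed,
// e.g. because the first request after an idle period hit a dead socket.
final class RetryOnce {

    // Runs the call up to maxAttempts times, sleeping backoffMillis between
    // attempts. Rethrows the last failure if every attempt fails.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Assumption: any exception is treated as retryable here. Real code
                // would likely retry only on the client's transport-level failures.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated request: fails once with "Broken pipe", then succeeds.
        String result = callWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IOException("Broken pipe");
            }
            return "row";
        }, 2, 50L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "row after 2 attempts"
    }
}
```

This only hides the failed first request from the caller. It does not explain why the idle connections are being dropped in the first place, and the backoff would need to be long enough for the pool to be re-added before the second attempt.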
