Here you go:
The log had too much to paste into pastebin, so I started a new log. What I did was restart opscenterd, start the agents, and also tried to add the cluster through the opscenter interface.
Thanks for your help!
Just to give you some insight as to what's happening here: when you submit the form to add the cluster, it verifies that it is able to connect via thrift to all of the nodes specified. This is succeeding, or else we would not see the "Adding new cluster" entry in the log. The "Starting services for cluster prod" entry is where everything starts to initialize itself for that cluster in OpsCenter, most of which requires waiting for another successful thrift connection. This is what's failing (the "No candidate nodes to expand pool" entries).
Both thrift connection mechanisms share most of the same code, with the exception that the first verification does a single thrift connection to each node specified, with no attempt to retry on failure, whereas the second connection is actually a pool that will attempt to connect to as many of the nodes as it needs to, and will retry until successful.
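To make the difference concrete, here is a rough Python sketch of the two behaviours. It is not OpsCenter's actual code: `verify_once`, `fill_pool`, and the `connect` callback are hypothetical names, and the real pool retries indefinitely, whereas this sketch stops after `max_rounds` so that it terminates.

```python
import time
from typing import Callable, List, Sequence


def verify_once(nodes: Sequence[str], connect: Callable[[str], bool]) -> bool:
    """One-shot check: try each node a single time, never retry."""
    return all(connect(node) for node in nodes)


def fill_pool(nodes: Sequence[str], connect: Callable[[str], bool],
              wanted: int, max_rounds: int = 5, delay: float = 0.0) -> List[str]:
    """Pool-style connect: keep cycling over the nodes until `wanted`
    connections are open. max_rounds is an assumption of this sketch;
    the behaviour described above retries until successful."""
    pool: List[str] = []
    for _ in range(max_rounds):
        for node in nodes:
            if len(pool) < wanted and node not in pool and connect(node):
                pool.append(node)
        if len(pool) >= wanted:
            break
        time.sleep(delay)  # back off before the next round
    return pool
```

If `connect` fails every time (for example because the node rejects the login), `fill_pool` never gets a candidate and comes back empty, which would match the pool having nothing to expand with.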
I'm sort of at a loss for why the first connection would succeed, but the second would not. Can you try only specifying a single IP in the Add Cluster form, and see if you get different results?
I tried putting in just one hostname, and then just one IP address; the result is the same.
The agent output is showing the same "Sleeping for ??s before trying to determine IP over JMX again" message.
The opscenterd log is showing the same "Unable to find a matching cluster......" message as well as the "No candidate nodes to expand pool ..... " message.
Those two log messages are unrelated to the core problem here, which is that the thrift connection from opscenterd to Cassandra is failing. Go ahead and turn off all of the agents while debugging this to avoid red herrings in the logs.
Can you verify that you can successfully connect to one of the Cassandra nodes using cassandra-cli from the machine opscenterd is running on? That will rule out any networking/connection issues (it's possible that the first succeeding thrift connection is a false positive).
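If cassandra-cli isn't handy on that machine, a quick TCP check from Python can at least confirm the port is reachable. This is only a sketch: `can_reach` is a made-up helper, 9160 is assumed to be the thrift port, and an open port says nothing about whether a thrift login would be accepted.

```python
import socket


def can_reach(host: str, port: int = 9160, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    This checks network reachability only; it does not speak thrift
    and does not test authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run it on the opscenterd machine with one of the node addresses, e.g. `can_reach("10.0.0.1")` (placeholder address). A False result points at networking or firewall issues rather than OpsCenter itself.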
Do you have any non-standard connection settings set (eg, authentication, ssl, etc)?
Also, enable debug logging on opscenterd by uncommenting the following line in opscenterd.conf: "#level = INFO", and changing it to "level = DEBUG". Then restart opscenterd and try to add the cluster again.
1) connecting from the opscenterd node to the cassandra node through cassandra-cli was successful through port 9160.
2) there are no non-standard connection settings set.
3) Here's the output resulting from the DEBUG setting.
The interface got the same "Error creating cluster:Call to /cluster-configs timed out." error.
opscenterd log recorded these entries: http://pastebin.com/241KrbDn
It looks like you have thrift authentication enabled on your cluster, so you'll need to specify the correct thrift username and password on the Add Cluster form. From your cassandra-cli invocation, it looks like the username should be "cassandra", and the password should be whatever you've set it to.
It looks like the main issue here is that our initial connection check does not require authentication, so it lets you add the cluster successfully, but the later connections do not succeed.
It is working now. Thanks very much for your help! It wasn't obvious that I had to put in the username/password... :)