Hi folks,
I'm trying to understand how to create a backup plan for some emergency scenarios:
a) OpsCenter agent goes down: kill any agent processes that are still running, then restart the agent.
b) OpsCenter server goes down: this one is really tricky. I have OpsCenter installed on two machines, and both can connect to the same cluster.
Only one has the agent connected to it. Some metrics do display and refresh in both OpsCenters. Is this because the graphs read from and write to column families in an OpsCenter keyspace, so both installations can see them because they are connected to the same cluster? But I can't perform any operations. This is fine, as the documentation says that a cluster can be monitored by only one OpsCenter at a time.
I tried changing the OpsCenter server IP address in the agent's address.yaml file to see whether the agent would dynamically switch to the new OpsCenter, but it doesn't. It looks like the file is only read at startup.
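For reference, the edit I made to address.yaml can be scripted roughly like this. It is only a sketch: it assumes the server address is stored under a key named stomp_interface (that key name is my assumption, check your own file), and the helper name set_stomp_interface is made up for illustration.

```python
import re


def set_stomp_interface(yaml_text: str, new_ip: str) -> str:
    """Return yaml_text with the stomp_interface line pointing at new_ip.

    Assumption: the agent stores the OpsCenter address in a top-level
    'stomp_interface' key. If the key is absent, it is appended.
    """
    line = f"stomp_interface: {new_ip}"
    pattern = re.compile(r"^stomp_interface\s*:.*$", re.MULTILINE)
    if pattern.search(yaml_text):
        # Replace the existing entry; the lambda avoids escape handling in 'line'.
        return pattern.sub(lambda _match: line, yaml_text)
    # Key not present: append it, adding a newline separator only if needed.
    separator = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return yaml_text + separator + line + "\n"
```

Since the file seems to be read only at startup, the agent would still need a restart after the rewritten text is saved.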
If I bring one OpsCenter server down, I cannot get the agent to connect to the other OpsCenter server, even if I run bin/setup <new opscenter ip address> and restart the agent. I thought these two steps would be enough to switch to a new OpsCenter server, but the agent logs an exception like this:
ERROR [StompConnection receiver] 2012-09-17 14:09:09,942 failed subscribing to /1604614835/conf:
javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: signature check failed
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1358)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1370)
at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:44)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
at org.jgroups.client.StompConnection.sendSubscribe(StompConnection.java:151)
at org.jgroups.client.StompConnection.subscribe(StompConnection.java:140)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:90)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
at opsagent.messaging$subscribe.invoke(messaging.clj:44)
at opsagent.opsagent$on_connect.invoke(opsagent.clj:166)
at opsagent.opsagent$_main$fn__3603.invoke(opsagent.clj:191)
at opsagent.messaging$connect_callback$fn__3428.invoke(messaging.clj:37)
at opsagent.messaging.proxy$java.lang.Object$StompConnection$ConnectionCallback$67232781.onConnect(Unknown Source)
at org.jgroups.client.StompConnection.run(StompConnection.java:249)
at java.lang.Thread.run(Thread.java:680)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: signature check failed
at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1764)
at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:241)
at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:235)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1206)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:958)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1203)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:654)
at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:100)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
at org.jgroups.client.StompConnection.sendConnect(StompConnection.java:125)
at org.jgroups.client.StompConnection.connect(StompConnection.java:334)
at org.jgroups.client.StompConnection.run(StompConnection.java:241)
... 1 more
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: signature check failed
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:289)
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:263)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:184)
at sun.security.validator.Validator.validate(Validator.java:218)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:126)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:209)
at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:249)
at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1185)
... 13 more
Caused by: java.security.cert.CertPathValidatorException: signature check failed
at sun.security.provider.certpath.PKIXMasterCertPathValidator.validate(PKIXMasterCertPathValidator.java:139)
at sun.security.provider.certpath.PKIXCertPathValidator.doValidate(PKIXCertPathValidator.java:330)
at sun.security.provider.certpath.PKIXCertPathValidator.engineValidate(PKIXCertPathValidator.java:178)
at java.security.cert.CertPathValidator.validate(CertPathValidator.java:250)
at sun.security.validator.PKIXValidator.doValidate(PKIXValidator.java:275)
... 20 more
It looks like an agent is 'bound' to a server. Does the replacement server have to come up with the same IP address?
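My guess from the PKIX "signature check failed" error (not confirmed) is that the second server presents a different SSL certificate from the one the agent already trusts. A minimal sketch for checking that guess, using only the Python standard library, is below. The function names are made up for illustration, and the port the agents connect to has to be supplied by the caller since I'm not certain of it.

```python
import hashlib
import ssl


def pem_fingerprint(pem_cert: str) -> str:
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der_bytes).hexdigest()


def server_fingerprint(host: str, port: int) -> str:
    """Fetch the certificate a server presents (without validating it)
    and return its fingerprint."""
    pem_cert = ssl.get_server_certificate((host, port))
    return pem_fingerprint(pem_cert)


def same_certificate(host_a: str, host_b: str, port: int) -> bool:
    """True if both servers present the identical certificate on 'port'."""
    return server_fingerprint(host_a, port) == server_fingerprint(host_b, port)
```

Calling same_certificate with the two OpsCenter IP addresses and the agent-facing port would show whether the two installations present different certificates. If they do, that would be consistent with the handshake failure above, though it would not prove that it is the only cause.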
What would be the manual steps to switch to another OpsCenter installation if one server goes down?
We are trying to figure out a way to automate failover, but I'm not sure how it works even as a manual process. Please help!
