JON update from 3.3.2 to JON 3.3.3 does not remove all files introduced with JON 3.3.0 update 02
Environment
- Red Hat JBoss Operations Network (JON) 3.3.0
- Red Hat JBoss Operations Network 3.3 is updated first to Update 02 and then to Update 03;
- JON storage node is shut down, JON Server is in MAINTENANCE mode, and JON Agent is backfilled;
- JON Storage node is started again;
Issue
- JON update from 3.3.2 to JON 3.3.3 does not remove all files introduced with JON 3.3.0 update 02
- Restarting the storage node does not change the status of the JON Server, and the JON Agent is not re-registered with the server.
- While the storage node is down, the following message is logged repeatedly and the retry interval increases rapidly:
10:49:25,086 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-1) [Control connection] Cannot connect to any host, scheduling retry in 512000 milliseconds
- JON Agent is not reconnecting to the JON Server once the JON storage node is restarted;
- The server.log file contains the following error messages:
11:07:56,207 ERROR [stderr] (StorageNode SessionAliveChecker) Exception in thread "StorageNode SessionAliveChecker" java.lang.NoSuchMethodError: org.rhq.server.metrics.MetricsDAO.checkLiveness(Ljava/lang/String;)Lcom/datastax/driver/core/ResultSet;
11:07:56,207 ERROR [stderr] (StorageNode SessionAliveChecker) at org.rhq.enterprise.server.storage.StorageClientManager$SessionAliveChecker.run(StorageClientManager.java:674)
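A quick way to confirm the second symptom is to search the server log for the NoSuchMethodError shown above. The following is a minimal sketch: the function name has_liveness_error is hypothetical, and the log location <RHQ_SERVER_HOME>/logs/server.log as well as the RHQ_SERVER_HOME environment variable are assumptions that may differ in your installation.

```shell
# Succeeds (exit status 0) if the given log file contains the
# MetricsDAO.checkLiveness NoSuchMethodError described in this article.
# $1: path to a server.log file
has_liveness_error() {
    grep -q -F 'java.lang.NoSuchMethodError: org.rhq.server.metrics.MetricsDAO.checkLiveness' "$1"
}

# Example usage (the log path is an assumption, adjust as needed):
#   if has_liveness_error "$RHQ_SERVER_HOME/logs/server.log"; then
#       echo "checkLiveness NoSuchMethodError found in server.log"
#   fi
```

The match is done on a fixed string (-F) so that the dots and parentheses in the class name are not interpreted as a regular expression.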
Resolution
To clean up the environment, stop the JON Server, Agent, and Storage Node, and then delete the extra files that are not included in 3.3 update 03 but were introduced in JON 3.3 update 02:
1. <RHQ_SERVER_HOME>/modules/org/rhq/server-startup/main/deployments/rhq.ear/lib/rhq-server-metrics-4.12.0.JON330GA-redhat-2.jar
2. <RHQ_SERVER_HOME>/modules/org/rhq/server-startup/main/deployments/rhq.ear/lib/rhq-cassandra-schema-4.12.0.JON330GA-redhat-1.jar
Once the above files are deleted, start all three JON components.
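The deletion step can be sketched as a small shell function. This is an illustration only: the function name remove_stale_jars and the RHQ_SERVER_HOME environment variable are hypothetical, and the commented usage assumes the components are controlled with the rhqctl script in <RHQ_SERVER_HOME>/bin. Stop all three components before running it.

```shell
# Delete the two JAR files left over from JON 3.3 update 02.
# $1: the server installation directory (<RHQ_SERVER_HOME>)
remove_stale_jars() {
    lib_dir="$1/modules/org/rhq/server-startup/main/deployments/rhq.ear/lib"
    for jar in \
        rhq-server-metrics-4.12.0.JON330GA-redhat-2.jar \
        rhq-cassandra-schema-4.12.0.JON330GA-redhat-1.jar
    do
        if [ -f "$lib_dir/$jar" ]; then
            # Only the two named files are touched; other JARs are left alone.
            rm -f -- "$lib_dir/$jar" && echo "removed: $lib_dir/$jar"
        else
            echo "not present: $lib_dir/$jar"
        fi
    done
}

# Example usage (assumes rhqctl manages the server, storage node, and agent):
#   "$RHQ_SERVER_HOME/bin/rhqctl" stop
#   remove_stale_jars "$RHQ_SERVER_HOME"
#   "$RHQ_SERVER_HOME/bin/rhqctl" start
```

If you prefer to keep a copy, replace the rm call with mv to a backup directory outside the rhq.ear tree.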
Root Cause
Files:
1. <RHQ_SERVER_HOME>/modules/org/rhq/server-startup/main/deployments/rhq.ear/lib/rhq-server-metrics-4.12.0.JON330GA-redhat-2.jar
2. <RHQ_SERVER_HOME>/modules/org/rhq/server-startup/main/deployments/rhq.ear/lib/rhq-cassandra-schema-4.12.0.JON330GA-redhat-1.jar
are introduced in JON 3.3.0 update 02 but are not used in JON 3.3.0 update 03. However, they are not removed when upgrading from JON 3.3.0 update 02 to JON 3.3.0 update 03, and this causes the reported issue.
Diagnostic Steps
- Check the server.log file. It should contain messages like:
08:45:32,078 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-0) [Control connection] Cannot connect to any host, scheduling retry in 2000 milliseconds
08:45:34,078 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-0) [Control connection] Cannot connect to any host, scheduling retry in 4000 milliseconds
08:45:38,079 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-0) [Control connection] Cannot connect to any host, scheduling retry in 8000 milliseconds
08:45:46,079 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-0) [Control connection] Cannot connect to any host, scheduling retry in 16000 milliseconds
08:46:02,080 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-0) [Control connection] Cannot connect to any host, scheduling retry in 32000 milliseconds
08:46:30,300 WARN [org.rhq.server.metrics.StorageSession] (http-/0.0.0.0:7080-7) Encountered NoHostAvailableException due to following error(s): {}
08:46:30,300 INFO [org.rhq.enterprise.server.storage.StorageClusterMonitor] (http-/0.0.0.0:7080-7) Storage cluster is down
...
08:54:02,081 ERROR [com.datastax.driver.core.ControlConnection] (Reconnection-0) [Control connection] Cannot connect to any host, scheduling retry in 512000 milliseconds
- Navigate to <RHQ_SERVER_HOME>/modules/org/rhq/server-startup/main/deployments/rhq.ear and execute:
$ find . -type f -exec md5sum {} \; >> md5.txt
- Compare the md5.txt files generated in two JON 3.3.3 environments, where one is a direct upgrade from JON 3.3.0 to JON 3.3.3 and the other is an upgrade from JON 3.3.0 to JON 3.3.2 and then to JON 3.3.3. The second md5.txt file (upgrade from JON 3.3.0 -> JON 3.3.2 -> JON 3.3.3) should contain the following files:
- ./lib/rhq-server-metrics-4.12.0.JON330GA-redhat-2.jar
- ./lib/rhq-cassandra-schema-4.12.0.JON330GA-redhat-1.jar
whereas the md5.txt file from the direct upgrade from JON 3.3.0 to JON 3.3.3 does not contain the above files.
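The comparison of the two md5.txt files can be scripted. The sketch below lists paths that appear only in the second listing; the function name extra_files and the example file names are hypothetical, and it assumes both listings were produced by md5sum as in the find command above.

```shell
# Print paths that appear in the second md5sum listing but not in the first.
# $1: md5.txt from the direct upgrade (JON 3.3.0 -> 3.3.3)
# $2: md5.txt from the stepwise upgrade (JON 3.3.0 -> 3.3.2 -> 3.3.3)
extra_files() {
    tmp_dir=$(mktemp -d) || return 1
    # Drop the 32-character checksum and its separator, keeping only the path.
    sed 's/^[0-9a-f]\{32\} [ *]//' "$1" | LC_ALL=C sort -u > "$tmp_dir/direct"
    sed 's/^[0-9a-f]\{32\} [ *]//' "$2" | LC_ALL=C sort -u > "$tmp_dir/stepwise"
    # comm -13 prints only the lines unique to the second sorted file.
    LC_ALL=C comm -13 "$tmp_dir/direct" "$tmp_dir/stepwise"
    rm -rf "$tmp_dir"
}

# Example usage (file names are placeholders):
#   extra_files md5-direct.txt md5-stepwise.txt
```

Only paths are compared here, not checksums, because the goal is to find files that exist in one environment and not in the other.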