JBoss ON 3.2 to 3.3 storage schema upgrade (rhqctl upgrade --storage-schema) takes many hours to complete

Solution Verified

Environment

  • Red Hat JBoss Operations Network (ON) 3.3
  • Upgrading from JBoss ON 3.2 to 3.3
  • Performing JBoss ON storage node schema upgrade step using command rhqctl upgrade --storage-schema

Issue

  • Storage node upgrade took several hours to complete

  • At the observed rate, metric data migration is projected to take about a week

  • rhq-installer.log shows that only about 18 metric schedules are being migrated per minute (9 every 30 seconds):

      14:37:04,763 INFO  [org.rhq.cassandra.schema.MigrateAggregateMetrics] There are 108560 remaining schedules for the one_hour data migration
      14:37:34,774 INFO  [org.rhq.cassandra.schema.MigrateAggregateMetrics] There are 108551 remaining schedules for the one_hour data migration
      14:38:04,795 INFO  [org.rhq.cassandra.schema.MigrateAggregateMetrics] There are 108542 remaining schedules for the one_hour data migration
      14:38:34,806 INFO  [org.rhq.cassandra.schema.MigrateAggregateMetrics] There are 108533 remaining schedules for the one_hour data migration
    
  • Storage schema upgrade and metric data migration when upgrading from JBoss ON 3.2 to 3.3 takes too long to process
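The migration rate quoted above can be checked against your own rhq-installer.log. The following is a minimal sketch, assuming the progress lines have exactly the format shown in the excerpt; the function name migration_rate and the bucket parameter are illustrative and not part of JBoss ON.

```python
import re
from datetime import datetime

# Matches the MigrateAggregateMetrics progress lines as they appear in the
# excerpt above. The exact log format is an assumption based on that excerpt.
PROGRESS = re.compile(
    r"^\s*(\d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+"
    r"\[org\.rhq\.cassandra\.schema\.MigrateAggregateMetrics\]\s+"
    r"There are (\d+) remaining schedules for the (\w+) data migration"
)


def migration_rate(lines, bucket="one_hour"):
    """Return (schedules_per_minute, estimated_hours_remaining) for one bucket.

    Uses the first and last matching progress lines, so the result is an
    average over the sampled window, not an instantaneous rate.
    """
    samples = []
    for line in lines:
        m = PROGRESS.match(line)
        if m and m.group(3) == bucket:
            ts = datetime.strptime(m.group(1), "%H:%M:%S,%f")
            samples.append((ts, int(m.group(2))))
    if len(samples) < 2:
        raise ValueError("need at least two progress lines for this bucket")

    (t0, remaining0), (t1, remaining1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        # The log lines carry no date, so assume a single midnight rollover.
        elapsed += 24 * 3600
    migrated = remaining0 - remaining1
    if migrated <= 0:
        raise ValueError("no progress between first and last sample")

    per_minute = migrated / elapsed * 60
    hours_left = remaining1 / per_minute / 60
    return per_minute, hours_left


if __name__ == "__main__":
    with open("rhq-installer.log") as log:  # path is an assumption
        rate, hours = migration_rate(log)
    print(f"{rate:.1f} schedules/minute, about {hours:.0f} hours remaining")
```

Applied to the four sample lines above, this gives roughly 18 schedules per minute and about 100 hours remaining for the one_hour data alone.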

Resolution

If the storage schema upgrade process is taking more than an hour, abort it by pressing CTRL+C. Then apply JBoss ON 3.3 update-02 to the server on which the storage schema upgrade command is being issued. You can then rerun the storage schema upgrade command (rhqctl upgrade --storage-schema).

Root Cause

This issue is due to the serial nature of the storage schema upgrade process. During the upgrade process, a new storage schema is created and the existing data is copied from the old schema to the new schema. Data is read one key at a time. This results in very poor performance because the latency of each read, from both disk I/O and processing, is paid in full for every key before the next read begins.

Additionally, an attempt is made to read metric data for all metric schedules in the JBoss ON system. In most installations, a very large percentage of metrics are disabled, so these read attempts are wasted. Combined with the serial nature of the upgrade, the result is a schema upgrade that takes many hours instead of only a couple of hours.
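To see how these two factors compound, here is a back-of-the-envelope sketch. It is a simplified cost model, not a description of how the upgrade or update-02 is implemented. The per-schedule latency is derived from the log excerpt (9 schedules every 30 seconds); the enabled fraction and concurrency values are purely hypothetical, and ideal linear scaling is assumed.

```python
def migration_hours(schedules, seconds_per_schedule,
                    enabled_fraction=1.0, concurrency=1):
    """Rough wall-clock estimate, in hours, for migrating metric schedules.

    schedules            -- total number of metric schedules
    seconds_per_schedule -- average latency of one serial read-and-copy
    enabled_fraction     -- share of schedules actually read (hypothetical)
    concurrency          -- reads in flight at once (hypothetical, assumes
                            ideal linear scaling)
    """
    if not 0 < enabled_fraction <= 1:
        raise ValueError("enabled_fraction must be in (0, 1]")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    reads = schedules * enabled_fraction
    return reads * seconds_per_schedule / concurrency / 3600


# Latency observed in the log excerpt: 9 schedules per 30 seconds.
observed_latency = 30 / 9

# Serial reads over every schedule, as described in the root cause.
serial = migration_hours(108560, observed_latency)

# Hypothetical comparison: skip disabled schedules (assume 20% enabled)
# and keep 10 reads in flight. Both numbers are illustrative only.
improved = migration_hours(108560, observed_latency,
                           enabled_fraction=0.2, concurrency=10)

print(f"serial, all schedules: {serial:.1f} hours")
print(f"hypothetical improved: {improved:.1f} hours")
```

Under these assumptions the serial case comes to about 100 hours for the one_hour data, while the hypothetical case comes to about 2 hours, which illustrates why both the serial reads and the wasted reads matter.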

This issue has been captured as Red Hat Bugzilla 1185375.

