JBoss ON alert may not be triggered if the condition's measurement uses the same collection schedule as other measurements from the same agent
Environment
- Red Hat JBoss Operations Network (ON) 3.2, 3.3
- An alert definition with a condition on a measurement whose value should cause the condition to evaluate as true
- The measurement used in the alert condition is collected on the same collection schedule as at least one other metric
Issue
- An alert is not triggered (does not fire) when expected
- Some measurement data may never be evaluated by the alerting subsystem
Resolution
This issue is resolved in JBoss ON 3.3 Update-05 and later.
To work around this issue, change the collection schedule of the metric so that it is the only metric with that collection schedule.
Root Cause
This issue has been identified as Red Hat Bugzilla 1293368. Only one MeasurementData per unique timestamp, per measurement report, is forwarded for alerting evaluation.
If a measurement report contains multiple metrics with the same timestamp, only one of them is sent to the alerting subsystem for evaluation. If an alert definition depends on one of the omitted values, it does not fire when it should.
The more metrics that share the same collection schedule, the more likely it is that the value an alert depends on is dropped.
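The effect of forwarding only one value per timestamp can be illustrated with a small simulation. This is a hypothetical model, not the JBoss ON (RHQ) source: apart from the name MeasurementData, every class, function, and field below (schedule_id, forward_for_alerting_buggy, condition_true, FREE_SWAP) is invented for illustration. The sketch assumes the last value with a given timestamp wins; the actual code may keep a different one. It also assumes, as the workaround implies, that a metric on its own schedule is reported with a distinct timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MeasurementData:
    """Hypothetical stand-in for one collected metric value."""
    schedule_id: int  # identifies the metric (measurement schedule) on a resource
    timestamp: int    # collection time, epoch milliseconds
    value: float


def forward_for_alerting_buggy(report: List[MeasurementData]) -> List[MeasurementData]:
    """Models the defect: values are keyed by timestamp alone, so each later
    value with the same timestamp replaces the earlier one (assumed: last wins)."""
    by_timestamp: Dict[int, MeasurementData] = {}
    for data in report:
        by_timestamp[data.timestamp] = data  # collision: the earlier entry is lost
    return list(by_timestamp.values())


def forward_for_alerting_fixed(report: List[MeasurementData]) -> List[MeasurementData]:
    """Models the intended behavior: every value in the report is forwarded."""
    return list(report)


def condition_true(forwarded: List[MeasurementData], schedule_id: int, threshold: float) -> bool:
    """A 'value > threshold' condition, evaluated only on data alerting received."""
    return any(d.schedule_id == schedule_id and d.value > threshold for d in forwarded)


FREE_SWAP = 1  # hypothetical schedule id for Free Swap Space

# Three metrics on the same 1-minute schedule share one collection timestamp.
same_schedule = [
    MeasurementData(FREE_SWAP, 60_000, 2_147_483_648.0),
    MeasurementData(2, 60_000, 0.35),
    MeasurementData(3, 60_000, 42.0),
]
print(condition_true(forward_for_alerting_buggy(same_schedule), FREE_SWAP, 0))  # False
print(condition_true(forward_for_alerting_fixed(same_schedule), FREE_SWAP, 0))  # True

# Workaround: Free Swap Space on its own schedule gets a distinct timestamp.
own_schedule = [
    MeasurementData(FREE_SWAP, 59_000, 2_147_483_648.0),
    MeasurementData(2, 60_000, 0.35),
    MeasurementData(3, 60_000, 42.0),
]
print(condition_true(forward_for_alerting_buggy(own_schedule), FREE_SWAP, 0))  # True
```

With a shared timestamp, the "greater than 0" condition on Free Swap Space is never evaluated because its value is overwritten before alerting sees it. Giving the metric its own schedule avoids the collision even with the defective forwarding.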
To reproduce the issue, try the following:
- Create an alert definition on the Free Swap Space metric of a platform resource.
- Choose a threshold value that should always cause the alert to fire. For example, assuming there is at least 1 byte of free swap space, the condition could be "greater than 0 bytes", which is always true.
- Set the collection interval for Free Swap Space to 1 minute.
- Note: This assumes all other metrics use their default collection intervals, which are much longer.
- An alert should be triggered every minute.
- After several alerts have fired, select all of the active metrics, including Free Swap Space (so that it is synchronized with them), and set the collection interval for all of them to 1 minute at once, not one at a time.
- Alerting will likely stop, because all of those metrics now likely share the same timestamp and only one of them reaches alerting.
- After a pause, reset the collection interval of every metric except Free Swap Space to 20 minutes.
- Alerting should resume.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.