Hawkular-metrics has issues communicating with Cassandra
Environment
- Red Hat OpenShift Container Platform 3.9
Issue
- Metrics are not being reported, and the hawkular-metrics pod logs errors about Cassandra even though Cassandra appears to be healthy.
- The hawkular-metrics pod reports errors like these:
[DATETIME] WARN [org.hawkular.metrics.scheduler.impl.SchedulerImpl] (RxComputationScheduler-3) Job execution of JobDetailsImpl{jobId=[HASH], jobType=TEMP_DATA_COMPRESSOR, jobName=TEMP_DATA_COMPRESSOR, parameters={}, trigger=RepeatingTrigger{triggerTime=1559638800000, interval=7200000, delay=60000}, status=NONE} for time slice 1559638800000 failed: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_ONE (1 replica were required but only 0 acknowledged the write)
...
[DATETIME] ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-4) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/[IP:PORT] (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/[IP:PORT]] Timed out waiting for server response))
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/[IP:PORT] (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/[IP:PORT]] Timed out waiting for server response))
[DATETIME] ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-7) HAWKMETRICS200010: Failed to process request: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/[IP:PORT] (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/[IP:PORT]] Timed out waiting for server response))
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/[IP:PORT] (com.datastax.driver.core.exceptions.OperationTimedOutException: [hawkular-cassandra/[IP:PORT]] Timed out waiting for server response))
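To confirm that a saved hawkular-metrics log shows this same pattern, the lines can be tallied by driver exception type. The following is a minimal sketch, not part of the original solution: the function name `count_cassandra_errors` is hypothetical, and the exception names are taken only from the log excerpt above.

```python
import re
from collections import Counter

# Driver exception names that appear in the log excerpt above.
CASSANDRA_EXCEPTIONS = (
    "WriteTimeoutException",
    "NoHostAvailableException",
    "OperationTimedOutException",
)

# Matches the fully qualified DataStax driver exception class names.
_PATTERN = re.compile(r"com\.datastax\.driver\.core\.exceptions\.(\w+)")


def count_cassandra_errors(log_lines):
    """Return a Counter of log lines per Cassandra driver exception type.

    Hypothetical helper: each exception type is counted once per line,
    even when the same class name is repeated within that line.
    """
    counts = Counter()
    for line in log_lines:
        for name in set(_PATTERN.findall(line)):
            if name in CASSANDRA_EXCEPTIONS:
                counts[name] += 1
    return counts
```

For example, passing the lines of a saved pod log (`count_cassandra_errors(open("hawkular-metrics.log"))`, where the file name is an assumption) returns a per-exception count. Many timeouts alongside a Cassandra pod that looks healthy match the symptom described in this article.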
Resolution
There are two possible solutions to this issue:
- If several thousand pods (3,000 or more) are running in the cluster, scale up Cassandra so that more Cassandra instances are available to handle the traffic generated by the pods.
- Otherwise, clean up (uninstall) the metrics installation and then redeploy it.
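The choice between the two options can be sketched as a small planning helper that only builds the commands and does not run them. This is an illustration under stated assumptions: it assumes metrics were deployed with the openshift-ansible metrics playbook, and that the inventory variables `openshift_metrics_install_metrics` and `openshift_metrics_cassandra_replicas` apply to your release. Verify both against the 3.9 documentation before use. The playbook path, inventory path, replica count, and the function names are hypothetical placeholders.

```python
# Threshold taken from the Resolution text ("several thousand pods (3k+)").
POD_THRESHOLD = 3000


def _playbook_cmd(inventory, playbook, **extra_vars):
    """Build an ansible-playbook argument list (nothing is executed)."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    for key, value in extra_vars.items():
        cmd += ["-e", f"{key}={value}"]
    return cmd


def plan_resolution(pod_count, inventory, playbook, cassandra_replicas=2):
    """Return the list of commands for the resolution that fits pod_count.

    Assumption: the variable names below are the openshift-ansible metrics
    variables; confirm them for your version before running anything.
    """
    if pod_count >= POD_THRESHOLD:
        # Large cluster: rerun the metrics playbook with more Cassandra replicas.
        return [
            _playbook_cmd(
                inventory,
                playbook,
                openshift_metrics_install_metrics=True,
                openshift_metrics_cassandra_replicas=cassandra_replicas,
            )
        ]
    # Smaller cluster: uninstall metrics, then redeploy.
    return [
        _playbook_cmd(inventory, playbook, openshift_metrics_install_metrics=False),
        _playbook_cmd(inventory, playbook, openshift_metrics_install_metrics=True),
    ]
```

Printing each returned list with `" ".join(cmd)` gives command lines that can be reviewed before they are run by hand.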
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.