Why is 'node_exporter.service' in the activating state after a rolling update from RHCS 3 to RHCS 4?
Environment
- Red Hat Ceph Storage 4.1
- Red Hat Enterprise Linux 7
- Red Hat Enterprise Linux 8
Issue
- Why is the node-exporter container not online after a rolling update from RHCS 3 to RHCS 4?
- Why is 'node_exporter.service' in the activating state after a rolling update from RHCS 3 to RHCS 4?
Resolution
- Stop the 'prometheus-node-exporter.service' service:
$ systemctl stop prometheus-node-exporter.service
The node-exporter container should then come online automatically:
$ docker ps | grep node-exporter
2edfece1714a registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.1 "/bin/node_exporte..." 12 minutes ago Up 12 minutes node-exporter
- If needed, reload the daemon and restart 'node_exporter.service'.
$ systemctl daemon-reload
$ systemctl restart node_exporter.service
- To avoid this issue, stop 'prometheus-node-exporter.service' before starting the rolling update:
$ systemctl stop prometheus-node-exporter.service
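The resolution steps above can be combined into one small helper. This is only a sketch under stated assumptions: the function name recover_node_exporter is hypothetical, the two unit names are taken from this article, and the final check is a point-in-time reading of the service state rather than a guarantee that the container stays up.

```shell
#!/bin/sh
# Sketch only: recover_node_exporter is a hypothetical helper name.
# Unit names are the ones used in this article.

recover_node_exporter() {
    local legacy="prometheus-node-exporter.service"
    local current="node_exporter.service"

    # Stop the legacy exporter only if systemd still reports it as active.
    if systemctl is-active --quiet "$legacy"; then
        echo "stopping $legacy"
        systemctl stop "$legacy" || return 1
    else
        echo "$legacy is not running"
    fi

    # Reload unit definitions and restart the containerized exporter.
    systemctl daemon-reload || return 1
    systemctl restart "$current" || return 1

    # Point-in-time check of the resulting state.
    if systemctl is-active --quiet "$current"; then
        echo "$current is active"
    else
        echo "$current is still not active" >&2
        return 1
    fi
}
```

Run the function as root on each affected node, for example by sourcing the file and calling recover_node_exporter. Afterwards, the docker ps check shown above confirms whether the container is running.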
Root Cause
- The cephmetrics-ansible/purge.yml playbook removes the 'prometheus-node-exporter.service' unit files but does not stop the service.
- Bugzilla Bug 1884571 has been opened for this issue and is being worked on by engineering. This article will be updated once the issue is fixed.
Diagnostic Steps
- Verify whether the cephmetrics dashboard was configured on RHCS 3 and purged before the rolling update.
- Verify whether 'node_exporter.service' is in the activating state:
$ systemctl status node_exporter.service
● node_exporter.service - Node Exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2020-10-12 14:49:42 EDT; 9s ago
Process: 24258 ExecStart=/usr/bin/docker run --rm --name=node-exporter --privileged -v /proc:/host/proc:ro -v /sys:/host/sys:ro --net=host registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.1 --path.procfs=/host/proc --path.sysfs=/host/sys --no-collector.timex --web.listen-address=:9100 (code=exited, status=1/FAILURE)
Process: 24249 ExecStartPre=/usr/bin/docker rm -f node-exporter (code=exited, status=1/FAILURE)
Main PID: 24258 (code=exited, status=1/FAILURE)
Oct 12 14:49:42 server601 systemd[1]: Unit node_exporter.service entered failed state.
Oct 12 14:49:42 server601 systemd[1]: node_exporter.service failed.
- Verify whether 'prometheus-node-exporter.service' is still running on the cluster nodes:
$ systemctl status prometheus-node-exporter.service
● prometheus-node-exporter.service
Loaded: not-found (Reason: No such file or directory)
Active: active (running) since Fri 2020-10-02 03:38:36 EDT; 1 weeks 3 days ago
Main PID: 19571 (node_exporter)
CGroup: /system.slice/prometheus-node-exporter.service
└─19571 /usr/bin/node_exporter
- Verify whether the 'prometheus-node-exporter.service' unit file is absent:
$ systemctl list-unit-files prometheus-node-exporter.service
UNIT FILE STATE
0 unit files listed.
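The last two diagnostic checks describe a service that is still running although its unit file is gone. In the status output above this appears as "Loaded: not-found" together with "Active: active (running)". A minimal sketch of an automated check for that combination follows. The helper names unit_prop and is_orphaned_unit are hypothetical, and the sketch assumes that the LoadState and ActiveState properties printed by systemctl show correspond to the "Loaded:" and "Active:" lines of systemctl status.

```shell
#!/bin/sh
# Sketch only: unit_prop and is_orphaned_unit are hypothetical helper names.

unit_prop() {
    # Usage: unit_prop UNIT PROPERTY
    # 'systemctl show -p' prints PROPERTY=value; strip the PROPERTY= prefix.
    systemctl show -p "$2" "$1" | sed "s/^$2=//"
}

is_orphaned_unit() {
    # Succeeds when the unit is running but its unit file can no longer be found.
    local unit="$1"
    local load active
    load="$(unit_prop "$unit" LoadState)"
    active="$(unit_prop "$unit" ActiveState)"
    [ "$load" = "not-found" ] && [ "$active" = "active" ]
}
```

For example, is_orphaned_unit prometheus-node-exporter.service && echo "legacy exporter is still running" reports the condition described in this article. The prefix is stripped with sed rather than the --value option of systemctl show, because that option may not be available in the older systemd shipped with Red Hat Enterprise Linux 7.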
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.