Why is 'node_exporter.service' in activating state after a rolling update from RHCS 3 to RHCS 4?

Solution Verified - Updated

Environment

  • Red Hat Ceph Storage 4.1
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8

Issue

  • Why is the node-exporter container not online after a rolling update from RHCS 3 to RHCS 4?
  • Why is node_exporter.service in activating state after a rolling update from RHCS 3 to RHCS 4?

Resolution

  • Stop prometheus-node-exporter.service on the affected node:
$ systemctl stop prometheus-node-exporter.service
  • The node-exporter container should then come online automatically. Verify with:
$ docker ps | grep node-exporter
2edfece1714a        registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.1   "/bin/node_exporte..."   12 minutes ago      Up 12 minutes                           node-exporter
  • If the container does not start, reload the systemd daemon and restart node_exporter.service:
$ systemctl daemon-reload

$ systemctl restart node_exporter.service
  • To avoid this issue, stop prometheus-node-exporter.service before starting the rolling update:
$ systemctl stop prometheus-node-exporter.service
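The stop step can be wrapped so that it acts only when the legacy service is still active. The following is a minimal sketch, not part of RHCS or ceph-ansible: the function name stop_if_active is hypothetical, and it relies only on the exit status of `systemctl is-active --quiet`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with RHCS): stop a unit only if it is active.
stop_if_active() {
    unit="$1"
    # "systemctl is-active --quiet" exits 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
        systemctl stop "$unit" && echo "stopped $unit"
    else
        echo "$unit not active"
    fi
}
```

Run as root on each node before the rolling update, for example: `stop_if_active prometheus-node-exporter.service`.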

Root Cause

  • cephmetrics-ansible/purge.yml removes the prometheus-node-exporter.service unit files but does not stop the running service.
  • Bug 1884571 has been opened for this issue and is being worked on by engineering. This article will be updated once the issue is fixed.

Diagnostic Steps

  • Verify whether the cephmetrics dashboard was configured on RHCS 3 and purged before the rolling update.
  • Verify whether node_exporter.service is in activating state.
$ systemctl status node_exporter.service
● node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2020-10-12 14:49:42 EDT; 9s ago
  Process: 24258 ExecStart=/usr/bin/docker run --rm --name=node-exporter --privileged -v /proc:/host/proc:ro -v /sys:/host/sys:ro --net=host registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.1 --path.procfs=/host/proc --path.sysfs=/host/sys --no-collector.timex --web.listen-address=:9100 (code=exited, status=1/FAILURE)
  Process: 24249 ExecStartPre=/usr/bin/docker rm -f node-exporter (code=exited, status=1/FAILURE)
 Main PID: 24258 (code=exited, status=1/FAILURE)

Oct 12 14:49:42 server601 systemd[1]: Unit node_exporter.service entered failed state.
Oct 12 14:49:42 server601 systemd[1]: node_exporter.service failed.
  • Verify whether prometheus-node-exporter.service is still running on the cluster nodes. In the affected state, the unit shows as not-found but active:
$ systemctl status prometheus-node-exporter.service
● prometheus-node-exporter.service
   Loaded: not-found (Reason: No such file or directory)
   Active: active (running) since Fri 2020-10-02 03:38:36 EDT; 1 weeks 3 days ago
 Main PID: 19571 (node_exporter)
   CGroup: /system.slice/prometheus-node-exporter.service
           └─19571 /usr/bin/node_exporter

  • Verify that the prometheus-node-exporter.service unit file is no longer present.
$ systemctl list-unit-files prometheus-node-exporter.service
UNIT FILE STATE

0 unit files listed.
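
The last two checks (unit file missing, process still running) can be combined into one test. The sketch below is illustrative only: classify_unit is a hypothetical helper, and it assumes the `LoadState` and `ActiveState` properties printed by `systemctl show` mirror the "Loaded: not-found" and "Active: active" lines of the status output above.

```shell
#!/bin/sh
# Hypothetical helper: read "Key=Value" lines from `systemctl show` on stdin
# and print "orphaned" when the unit file is gone but the unit is still active.
classify_unit() {
    load=""
    active=""
    while IFS='=' read -r key value; do
        case "$key" in
            LoadState)   load="$value" ;;
            ActiveState) active="$value" ;;
        esac
    done
    if [ "$load" = "not-found" ] && [ "$active" = "active" ]; then
        echo "orphaned"
    else
        echo "ok"
    fi
}
```

Example use on a cluster node: `systemctl show -p LoadState -p ActiveState prometheus-node-exporter.service | classify_unit`. A result of "orphaned" corresponds to the condition described in Root Cause.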
