Adding nodes in IBM Power clusters can result in failed File Integrity node status

Solution Unverified - Updated 3 Jun 2024

Environment

Red Hat OpenShift Container Platform environments hosted on IBM Power hardware using the File Integrity Operator version 1.3.1.

Issue

When using the File Integrity Operator on IBM Power systems, newly added nodes may report a Failed File Integrity node status after you scale up the cluster.

The issue appears after scaling up the machine set. When the new node reaches a Provisioned and Running state, the File Integrity Operator deploys AIDE to monitor the file system. From that point on, any change to the file system that is not explicitly excluded in the AIDE configuration file results in an alert and a Failed node status:

$ oc get fileintegritynodestatuses.fileintegrity.openshift.io
NAME                           NODE          STATUS
example-fileintegrity-worker   worker-node   Failed

Resolution

You can work around this issue by annotating the File Integrity custom resource for the node that failed. This process is documented in the File Integrity Operator documentation.
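
This article does not name the annotation. The following is a minimal sketch, assuming the workaround meant here is the operator's re-initialization annotation, file-integrity.openshift.io/re-init, which asks the operator to rebuild the AIDE database. The FileIntegrity object name (example-fileintegrity, inferred from the status output above) and the namespace (openshift-file-integrity, the operator default) are assumptions, and the function names are hypothetical helpers used only for illustration.

```shell
# Assumed values; adjust them for your cluster.
FI_NAME="example-fileintegrity"          # hypothetical FileIntegrity object name
FI_NAMESPACE="openshift-file-integrity"  # operator default namespace (assumption)

# Ask the operator to re-initialize the AIDE database for this FileIntegrity object.
reinit_aide_database() {
  oc annotate "fileintegrities/${FI_NAME}" \
    file-integrity.openshift.io/re-init= \
    -n "${FI_NAMESPACE}"
}

# List the per-node statuses so you can confirm the result afterwards.
check_node_status() {
  oc get fileintegritynodestatuses.fileintegrity.openshift.io -n "${FI_NAMESPACE}"
}
```

Run reinit_aide_database once, wait for the operator to finish the re-initialization, and then run check_node_status to see whether the node still reports Failed. The annotation is set on the FileIntegrity object rather than on an individual node, so it is likely to affect every node that object covers. Check the File Integrity Operator documentation for your version before relying on it.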

Root Cause

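After the File Integrity Operator deploys the AIDE monitoring agent on the node, something modifies /etc/hosts on the host file system. Because the change happens after monitoring has started, AIDE reports it as an integrity failure and the node status becomes Failed.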
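
To check whether /etc/hosts is the file being flagged on your own node, you can inspect the failure details recorded by the operator. This is a hedged sketch rather than a verified procedure: it assumes the operator default namespace openshift-file-integrity, uses the status name from the example output above, and assumes the AIDE log is stored under the integritylog key of the result ConfigMap, as described in the operator documentation. The function names are hypothetical helpers.

```shell
# Assumed values; adjust them for your cluster.
FI_NAMESPACE="openshift-file-integrity"     # operator default namespace (assumption)
STATUS_NAME="example-fileintegrity-worker"  # status name from the example output above

# Print the recorded results, which include the condition and the name of the
# ConfigMap holding the AIDE log for a failed check.
show_status_results() {
  oc get "fileintegritynodestatuses.fileintegrity.openshift.io/${STATUS_NAME}" \
    -n "${FI_NAMESPACE}" -o jsonpath='{.results}'
}

# Print the AIDE log from a result ConfigMap; pass the ConfigMap name as $1.
# The integritylog key is an assumption taken from the operator documentation.
show_aide_log() {
  oc get configmap "$1" -n "${FI_NAMESPACE}" -o jsonpath='{.data.integritylog}'
}
```

Call show_status_results first, take the ConfigMap name reported for the Failed result, and pass it to show_aide_log. If the log lists /etc/hosts among the changed files, the failure matches the cause described here. Large logs may be stored compressed, in which case the output needs decoding before it is readable.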