NMState Operator cannot apply a DNS change (configuration aborted)
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.10 and later
Issue
- When you apply an update to /etc/resolv.conf through a NodeNetworkConfigurationPolicy (NNCP) with the NMState Operator, the policy reports the following status conditions (or similar):
conditions:
- lastHeartbeatTime: "2023-08-10T19:04:21Z"
  lastTransitionTime: "2023-08-10T19:04:19Z"
  reason: FailedToConfigure
  status: "False"
  type: Available
- lastHeartbeatTime: "2023-08-10T19:04:21Z"
  lastTransitionTime: "2023-08-10T19:04:19Z"
  message: 1/10 nodes failed to configure, 9 nodes aborted configuration
  reason: FailedToConfigure
  status: "True"
  type: Degraded
- The NodeNetworkConfigurationEnactment (NNCE) output for the failed policy includes the following entry (or similar):
2023-08-10T14:51:23.767989283-04:00 {"level":"error","ts":"2023-08-10T18:51:23.767Z","logger":"controllers.NodeNetworkConfigurationPolicy","msg":"","nodenetworkconfigurationpolicy":"/worker-ens192-dns-policy","error":"policy has failing enactments, aborting","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/openshift/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
- The NNCP manifest you are applying appears to match the documented syntax:
spec:
  desiredState:
    dns-resolver:
      config:
        search:
        - domain.net
        - domain.com
        server:
        - xxx.xxx.xxx.200
        - xxx.xxx.xxx.62
    interfaces:
    - ipv4:
        auto-dns: false
        dhcp: false
        enabled: true
      name: ens192 # <-- the primary NIC of the node(s); the name may differ in your environment
      state: up
      type: ethernet
Resolution
- The OpenShift documentation on configuring DNS with the NMState Operator is currently ambiguous, and a bug has been opened to clarify it: https://issues.redhat.com/browse/OCPBUGS-18411
- NMState cannot bounce (restart) the NIC that serves as the node's primary interface, because doing so could cause a network outage. The NMState Operator must be able to return the node to its previous working configuration if a change fails, so by design it must never leave the node in a degraded state. Restarting the primary NIC could immediately cut off access to the machine, so modifying the primary interface is not supported and results in an aborted configuration.
- In addition, modifying DNS does NOT require specifying an interface. See the upstream nmstate.io example: https://nmstate.io/examples.html#dns
dns-resolver:
  config:
    search:
    - example.com
    - example.org
    server:
    - 2001:4860:4860::8888
    - 8.8.8.8
- Therefore, if the objective is only to modify /etc/resolv.conf on the host nodes, you do not need to specify any interfaces. A valid and complete policy based on the initial problem example is:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-0-dns-testing
spec:
  nodeSelector:
    kubernetes.io/hostname: <target-node>
  desiredState:
    dns-resolver:
      config:
        search:
        - domain.net
        - domain.com
        server:
        - xxx.xxx.xxx.200
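If the same resolver settings should apply to every worker rather than a single host, the nodeSelector can be widened. The following is a sketch that assumes the standard node-role.kubernetes.io/worker label is present on the target nodes; the policy name worker-dns-policy is hypothetical. As in the policy above, no interfaces section is included, so the primary NIC is left untouched.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-dns-policy  # hypothetical name
spec:
  # Assumption: target every node that carries the standard worker role label
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    # Only the resolver is configured. No interfaces are listed,
    # so the node's primary NIC is not modified.
    dns-resolver:
      config:
        search:
        - domain.net
        - domain.com
        server:
        - xxx.xxx.xxx.200
```

After applying it with oc apply -f <file>, the overall policy state can be checked with oc get nncp and the per-node results with oc get nnce.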
Root Cause
- The NMState Operator cannot modify or restart the node's primary interface, so it aborts the configuration. The resulting error message is vague about why the change could not be applied, even though the request appears valid and complete and matches the syntax outlined in the documentation.
- To get past this error, either replace the primary interface in the policy with a SECONDARY interface, or remove the interfaces section from the spec entirely.
- The documentation is being updated to reflect this issue.
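When the policy must also carry interface settings, for example to stop a DHCP-managed NIC from injecting its own DNS servers, the interface entry should name a secondary NIC rather than the primary one. A minimal sketch follows, assuming a hypothetical secondary interface named ens224 that obtains its IPv4 address through DHCP; the policy name is also hypothetical, and the interface name and ipv4 settings must be adjusted to match your nodes.

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: worker-0-dns-secondary  # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/hostname: <target-node>
  desiredState:
    dns-resolver:
      config:
        search:
        - domain.net
        - domain.com
        server:
        - xxx.xxx.xxx.200
    interfaces:
    - name: ens224  # hypothetical SECONDARY NIC; never the node's primary interface
      type: ethernet
      state: up
      ipv4:
        enabled: true
        dhcp: true       # assumption: this NIC is DHCP-managed
        auto-dns: false  # ignore DNS servers and search domains offered by DHCP
```

Setting auto-dns: false here is intended to keep DHCP-supplied resolvers from being mixed with the static list; whether it is needed depends on how that NIC is configured in your environment.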
Diagnostic Steps
- Observe that applying an otherwise valid DNS update through the NMState Operator aborts unexpectedly when the policy specifies the node's primary interface.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.