Running the multus validation tool post ODF deployment
Environment
Please make sure that the secondary networks used by ODF, i.e. odf-public and odf-cluster, are both non-routable (Layer-2) networks.
A test can be done before moving your ODF cluster to Multus by provisioning the NADs in step (6) below with fake IP ranges (192.168.10.0/24 and 192.168.20.0/24) that differ from the real CIDRs defined on the switches. Spin up two test pods attached to these NADs and verify that a ping between the two pods succeeds. If the networks are true Layer-2, the test will succeed even though the NAD IP addresses differ from the CIDRs defined on the switch.
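That pre-test can be sketched as follows. This is an illustrative example only: the NAD name (l2-test), namespace, pod names, and image are assumptions, and bond1.2403 stands in for your candidate interface.

```shell
# Illustrative Layer-2 pre-test (all names are examples; adjust to your environment)
# 1) NAD with a fake CIDR on the candidate VLAN interface
cat <<'EOF' | oc apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-test
  namespace: default
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "bond1.2403",
    "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
EOF

# 2) Two test pods attached to the NAD
for i in 1 2; do
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: l2-test-$i
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: l2-test
spec:
  containers:
  - name: shell
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "3600"]
EOF
done

# 3) Ping pod 2's secondary (net1) IP from pod 1; the net1 IP can be read from
#    pod 2's k8s.v1.cni.cncf.io/network-status annotation
oc exec -n default l2-test-1 -- ping -c 3 <l2-test-2-net1-ip>
```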
Issue
- How to run the multus validation tool post ODF deployment.
- We installed ODF before running the multus validation tool, how do we run this tool?
Resolution
- Make sure all the nodes have the worker role:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
<infra01-node-name> Ready infra,worker 60d v1.25.12+26bab08
<infra02-node-name> Ready infra,worker 60d v1.25.12+26bab08
<infra03-node-name> Ready infra,worker 60d v1.25.12+26bab08
<master01-node-name> Ready control-plane,master 61d v1.25.12+26bab08
<master02-node-name> Ready control-plane,master 61d v1.25.12+26bab08
<master03-node-name> Ready control-plane,master 61d v1.25.12+26bab08
<worker01-node-name> Ready app,worker 59d v1.25.12+26bab08
<worker02-node-name> Ready app,worker 59d v1.25.12+26bab08
<worker03-node-name> Ready app,worker 59d v1.25.12+26bab08
<worker04-node-name> Ready app,worker 59d v1.25.12+26bab08
- If the nodes are not labeled as workers, add the worker label:
oc label nodes <node-name> node-role.kubernetes.io/worker=""
Note: the control plane nodes must also be made schedulable (see step 4).
- Make sure no taints are blocking the scheduling of the multus validation pods on the worker nodes, as the validation tool's DaemonSet cannot be edited. For example, to remove an infra taint:
oc adm taint nodes <node-name> node-role.kubernetes.io/infra:NoSchedule-
- Check the scheduler and make sure the settings match the desired behavior of the validation tool.
oc patch scheduler cluster --type=merge -p '{"spec": {"mastersSchedulable": true}}'
$ oc get schedulers.config.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Scheduler
  metadata:
    name: cluster
  spec:
    defaultNodeSelector: node-role.kubernetes.io/worker
    mastersSchedulable: true
    policy:
      name: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
In some cases, the masters are already schedulable and therefore already have the worker role.
$ oc get scheduler cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: "2024-05-14T13:16:08Z"
  generation: 2
  name: cluster
  resourceVersion: "278519"
  uid: 443ddeea-d033-4344-8225-b4217a40ff50
spec:
  mastersSchedulable: true
  policy:
    name: ""
status: {}
$
- As the multus validation tool generates a large number of test pods, the maximum number of pods per node should be raised from the default of 250 to 500. This can be done by adding a kubelet configuration for the worker nodes. Please note: this will reboot all the nodes that match the 'matchLabels'.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    maxPods: 500
This will trigger a rolling reboot of the matching worker nodes. The number of worker nodes rebooting at the same time depends on the maxUnavailable value configured in the worker machine config pool.
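A minimal apply-and-verify sequence for this step might look like the following (the manifest file name is an assumption):

```shell
# Apply the KubeletConfig and watch the worker pool roll out (nodes reboot here)
oc apply -f set-max-pods.yaml
oc get mcp worker -w                 # wait for UPDATED=True, UPDATING=False

# Afterwards, confirm a worker advertises the new pod capacity
oc get node <worker-node-name> -o jsonpath='{.status.capacity.pods}{"\n"}'
```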
- If not installed, make sure that the NMState Operator is deployed in your cluster.
$ oc -n openshift-nmstate wait --for=jsonpath='{.status.state}'=AtLatestKnown subscription/kubernetes-nmstate-operator --timeout=300s
subscription.operators.coreos.com/kubernetes-nmstate-operator condition met
- The multus-validation-test-web-server pod must be scheduled on one of the workers, and it needs connectivity to both the odf-cluster and odf-public NADs. By default, the worker nodes are connected only to odf-public, since clients connect only over odf-public; only the infra nodes (where the storage runs) connect to both odf-cluster and odf-public.
For the test to succeed, we need to attach the worker nodes to odf-cluster. In a UPI deployment this can be done by logging in to each worker node and adding the interface manually, or it can be achieved with the NMState Operator by creating an NNCP (Node Network Configuration Policy), which generates NNCEs (Node Network Configuration Enactments).
Login to worker #1:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.31/24' ipv4.method manual ipv6.method disabled
Login to worker #2:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.32/24' ipv4.method manual ipv6.method disabled
Login to worker #3:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.33/24' ipv4.method manual ipv6.method disabled
Login to worker #4:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.34/24' ipv4.method manual ipv6.method disabled
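After adding the VLAN on each worker, it is worth verifying the interface and reachability; for example (the target 192.168.20.21 is a hypothetical infra-node address on the odf-cluster subnet):

```shell
# Confirm the VLAN interface is up with the expected address and MTU
ip -d addr show bond1.2403

# Verify reachability across the Layer-2 segment without fragmentation;
# -s 8972 plus 28 bytes of ICMP/IP headers exercises the full 9000-byte MTU
ping -c 3 -M do -s 8972 192.168.20.21
```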
- Alternatively, the network configuration can be updated using a NodeNetworkConfigurationPolicy. In this example:
- odf-cluster: bond1.2403, 192.168.20.0/24
- odf-public: bond1.2402, 192.168.10.0/24
```
$ cat OCP/MC/nncp.yml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: odf-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
    - name: bond1.2402
      type: vlan
      state: up
      mtu: 8950
      vlan:
        base-iface: bond1
        id: 2402
      ipv4:
        enabled: false
      ipv6:
        enabled: false
    - name: bond1.2403
      type: vlan
      state: up
      mtu: 8950
      vlan:
        base-iface: bond1
        id: 2403
      ipv4:
        enabled: false
      ipv6:
        enabled: false
$
```
When the NNCP is applied, a NodeNetworkConfigurationEnactment (NNCE) is generated for every node matching the nodeSelector in the NNCP:
```
$ oc get nnce
NAME STATUS AGE REASON
master1.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
master2.cwl-site1.npss.bos2.lab.odf-policy Available 2d15h SuccessfullyConfigured
master3.cwl-site1.npss.bos2.lab.odf-policy Available 2d15h SuccessfullyConfigured
worker1.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
worker2.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
worker3.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
$
```
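A typical apply-and-verify flow for the NNCP (file path as shown above) is:

```shell
# Apply the policy and wait for every matching node to enact it
oc apply -f OCP/MC/nncp.yml
oc wait nncp odf-policy --for=condition=Available --timeout=300s
oc get nnce                           # each node should report SuccessfullyConfigured
```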
- Provision the needed NetworkAttachmentDefinitions. There should be one used for replication traffic (odf-cluster) and another used by clients to connect to their PVCs (odf-public):
$ oc get network-attachment-definitions.k8s.cni.cncf.io -n openshift-storage
NAME AGE
odf-cluster 59d
odf-public 59d
$ oc get network-attachment-definitions.k8s.cni.cncf.io -n openshift-storage odf-cluster -oyaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: odf-cluster
  namespace: openshift-storage
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "bond1.2403", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
$ oc get network-attachment-definitions.k8s.cni.cncf.io -n openshift-storage odf-public -oyaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: odf-public
  namespace: openshift-storage
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "bond1.2402", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "192.168.10.0/24" } }'
- If the cluster is managed by ACS (Advanced Cluster Security) with a policy that inhibits the Alpine OS, make sure to disable this policy first.
- Run the test on both networks:
$ rook multus validation run --cluster-network odf-cluster --public-network odf-public --namespace openshift-storage | tee -a multus-test.log
2023-12-22 03:57:37.426209 I | multus-validation: starting multus validation test with the following config:
2023-12-22 03:57:37.426316 I | multus-validation: namespace: "openshift-storage"
2023-12-22 03:57:37.426319 I | multus-validation: public network: "odf-public"
2023-12-22 03:57:37.426326 I | multus-validation: cluster network: "odf-cluster"
2023-12-22 03:57:37.426331 I | multus-validation: daemons per node: 16
2023-12-22 03:57:37.426341 I | multus-validation: resource timeout: 3m0s
2023-12-22 03:57:37.493941 I | multus-validation: continuing: expected number of image pull pods not yet ready: a daemonset expects zero scheduled pods
2023-12-22 03:57:39.498573 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:41.504050 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:43.509129 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:45.514441 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:47.520110 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:49.523923 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:51.530535 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:53.534698 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:55.539492 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:57.544660 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:59.548972 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:01.553260 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:03.558274 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:05.564045 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:07.568494 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:09.573851 I | multus-validation: expecting 7 image pull pods to be 'Ready'
2023-12-22 03:58:11.596328 I | multus-validation: cleaning up all 7 'Running' image pull pods
2023-12-22 03:58:13.603285 I | multus-validation: getting web server info for clients
2023-12-22 03:58:15.611281 I | multus-validation: starting 16 clients on each node
2023-12-22 03:58:18.818678 I | multus-validation: verifying 112 client pods begin 'Running'
2023-12-22 03:58:20.842955 I | multus-validation: continuing: all 112 client pods are not yet running: got 21 pods when 112 should exist
2023-12-22 03:58:22.868580 I | multus-validation: continuing: all 112 client pods are not yet running: got 29 pods when 112 should exist
2023-12-22 03:58:24.903231 I | multus-validation: continuing: all 112 client pods are not yet running: got 41 pods when 112 should exist
2023-12-22 03:58:26.936787 I | multus-validation: continuing: all 112 client pods are not yet running: got 49 pods when 112 should exist
2023-12-22 03:58:28.974470 I | multus-validation: continuing: all 112 client pods are not yet running: got 57 pods when 112 should exist
2023-12-22 03:58:31.016007 I | multus-validation: continuing: all 112 client pods are not yet running: got 69 pods when 112 should exist
2023-12-22 03:58:33.063860 I | multus-validation: continuing: all 112 client pods are not yet running: got 77 pods when 112 should exist
2023-12-22 03:58:35.126487 I | multus-validation: continuing: all 112 client pods are not yet running: got 85 pods when 112 should exist
2023-12-22 03:58:37.193274 I | multus-validation: continuing: all 112 client pods are not yet running: got 96 pods when 112 should exist
2023-12-22 03:58:39.249677 I | multus-validation: continuing: all 112 client pods are not yet running: got 102 pods when 112 should exist
2023-12-22 03:58:41.313133 I | multus-validation: continuing: all 112 client pods are not yet running
2023-12-22 03:58:43.374178 I | multus-validation: continuing: all 112 client pods are not yet running
2023-12-22 03:58:45.433244 I | multus-validation: continuing: all 112 client pods are not yet running
2023-12-22 03:58:47.491105 I | multus-validation: verifying all 112 'Running' client pods reach 'Ready' state
2023-12-22 03:58:49.548894 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:51.613178 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:53.673437 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:55.749718 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:57.807903 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:59.895865 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:59:01.953902 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:59:04.016441 I | multus-validation: continuing: number of ready clients [1] is not the number expected [112]
2023-12-22 03:59:06.119238 I | multus-validation: continuing: number of ready clients [5] is not the number expected [112]
2023-12-22 03:59:08.206974 I | multus-validation: continuing: number of ready clients [10] is not the number expected [112]
2023-12-22 03:59:10.295467 I | multus-validation: continuing: number of ready clients [20] is not the number expected [112]
2023-12-22 03:59:12.363861 I | multus-validation: continuing: number of ready clients [26] is not the number expected [112]
2023-12-22 03:59:14.449423 I | multus-validation: continuing: number of ready clients [38] is not the number expected [112]
2023-12-22 03:59:16.514838 I | multus-validation: continuing: number of ready clients [45] is not the number expected [112]
2023-12-22 03:59:18.576962 I | multus-validation: continuing: number of ready clients [55] is not the number expected [112]
2023-12-22 03:59:20.666370 I | multus-validation: continuing: number of ready clients [66] is not the number expected [112]
2023-12-22 03:59:22.723242 I | multus-validation: continuing: number of ready clients [75] is not the number expected [112]
2023-12-22 03:59:24.810056 W | multus-validation: network seems flaky; the time since clients started becoming ready until now is greater than 20s
2023-12-22 03:59:24.810090 I | multus-validation: continuing: number of ready clients [82] is not the number expected [112]
2023-12-22 03:59:26.889918 I | multus-validation: continuing: number of ready clients [88] is not the number expected [112]
2023-12-22 03:59:28.966359 I | multus-validation: continuing: number of ready clients [96] is not the number expected [112]
2023-12-22 03:59:31.023680 I | multus-validation: continuing: number of ready clients [104] is not the number expected [112]
2023-12-22 03:59:33.083662 I | multus-validation: continuing: number of ready clients [109] is not the number expected [112]
2023-12-22 03:59:35.190632 I | multus-validation: continuing: number of ready clients [111] is not the number expected [112]
2023-12-22 03:59:37.246727 I | multus-validation: all 112 clients are 'Ready'
RESULT: multus validation test succeeded, but there are suggestions
Suggested things to investigate before installing with Multus:
- not all clients became ready within 20s; the underlying network may be flaky or not have the bandwidth to support a production ceph cluster; even if the validation test passes, this could still be an issue
leaving multus validation test resources running for manual debugging
For assistance debugging, collect the following into an archive file:
- Output of this utility
- Network Attachment Definitions (NADs) used by this test
- A write-up describing the network configuration you are trying to achieve including the
intended network for Ceph public/client traffic, intended network for Ceph cluster traffic,
interface names and CIDRs for both networks, and any other details that are relevant.
- 'ifconfig' output from at least one Kubernetes worker node
- 'kubectl get pods -o wide' output from the test namespace
- 'kubectl describe pods' output from the test namespace
- 'kubectl get pods -o yaml' output from the test namespace
- 'kubectl get daemonsets' output from the test namespace
- 'kubectl describe daemonsets' output from the test namespace
- 'kubectl get daemonsets -o yaml' output from the test namespace
- 'kubectl logs multus-validation-test-web-server' output from the test namespace
- 'kubectl get nodes -o wide' output
- Make sure the test tool succeeds and does not report latencies. While some latency is expected in a connected environment, it is best to check with the networking team for further optimization; disconnected environments should not report latencies.
- Clean up the test resources:
rook multus validation cleanup --namespace openshift-storage
- Remove the odf-cluster links added to the worker nodes. Log in to each worker node (workers 1-4) and remove the bond1.2403 interface:
sudo nmcli connection del bond1.2403
- Alternatively, an NNCP with state: absent can be used to remove the interface.
- Revert the maximum number of pods running on each node:
oc patch --type merge kubeletconfig/set-max-pods -p '{"spec":{"kubeletConfig":{"maxPods": 250}}}'
oc patch --type merge kubeletconfig/set-max-pods-masters -p '{"spec":{"kubeletConfig":{"maxPods": 250}}}'
- If the nodes did not originally have the worker role and the label was added only for the test, remove the worker label from those nodes now:
oc label nodes <master node-name> node-role.kubernetes.io/worker-
- If the master nodes weren't schedulable before, revert the setting now:
oc patch scheduler cluster --type=merge -p '{"spec": {"mastersSchedulable": false}}'
Root Cause
OCP releases < 4.14 using multus (i.e. secondary networks) with ODF require a support exception. Red Hat provides a tool to test the platform and ensure that the design and physical connectivity permit the use of multus with ODF.
The recommendation from Red Hat is to run the tool before ODF is installed for the following reasons:
- The multus validation tool spawns a large number of pods per worker node. If ODF pods are already installed, they consume many IP addresses from the CIDR used by the NADs, and it is highly likely that the test will fail.
- The test runs in two stages. The first validates the odf-cluster network used for Ceph internal replication traffic [./rook multus validation run --cluster-network odf-cluster --namespace default]; these pods run on the storage nodes only, which requires changing the default scheduler behavior to prefer the storage nodes. Because the tool generates a large number of test pods, adding the multus validation pods can push the storage nodes past 250 running pods, causing failures unless the kubelet configuration is edited to raise the maximum number of pods allowed per node.
- The second stage validates the odf-public network used by the clients to connect to their backend PVCs [./rook multus validation run --public-network odf-public --namespace default].
While this is considered the best approach, real-life experience has shown that many consultants and partners deploy ODF first. In some cases, the consultants are not aware of the need to run the validation tool until the cluster is fully functional. Moreover, being able to run the validation tool while ODF pods are installed helps test the following:
- The network has enough bandwidth to run the multus validation pods side by side with the OSD pods.
- The surge in traffic introduced by the new multus pods shows whether it causes the OSD pods to crash.
- Asking consultants to redeploy ODF usually causes significant inconvenience to both the consultants and the customer.
- The official approach tests each network separately, which yields fewer pods (about 50, compared to 100+ when both networks are tested in the same command).
- It confirms that the CIDR is large enough to accept more pods, i.e. more OSDs in the future.
- It shows whether unused IP allocations are released and the IP addresses can be reused after the multus validation pods are removed [oc get ippools.whereabouts.cni.cncf.io -n openshift-multus].
- In reality, OSD pods connect to both odf-cluster and odf-public at the same time; the pods generated the recommended way test only one network at a time.
- In Telco ZTP scenarios, ODF is deployed with multus, but ZTP does not run the validation tool before deploying ODF.
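The CIDR-sizing point above can be sanity-checked with simple shell arithmetic; the pod counts here are assumed example values, not measurements:

```shell
# Rough headroom check for a NAD range: a /24 yields 254 usable addresses,
# and every pod attached to the network consumes one whereabouts allocation.
prefix=24
usable=$(( (1 << (32 - prefix)) - 2 ))    # 254 for a /24
odf_pods=40                               # assumed: ODF pods already on the NAD
validation_pods=112                       # assumed: clients spawned by the tool
remaining=$(( usable - odf_pods - validation_pods ))
echo "usable=$usable remaining=$remaining"
```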
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.