Running the multus validation tool post ODF deployment
Environment
Please make sure that the secondary networks used by ODF, i.e. odf-public and odf-cluster, are both non-routable (Layer-2) networks.
A test can be done before moving your ODF cluster to Multus by provisioning the NADs in step (6) below with fake IP ranges (192.168.10.0/24 and 192.168.20.0/24) that differ from the real CIDRs defined on the switches. Spin up two test pods attached to these NADs and verify that a ping between the two pods succeeds. If the networks are true Layer-2, the test will succeed even though the NAD IP addresses differ from the CIDRs defined on the switch.
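That pre-test can be sketched as follows. This is an illustrative example only: the NAD name (l2-test), namespace, pod names, and image are assumptions, and bond1.2403 stands in for your candidate interface.

```shell
# Illustrative Layer-2 pre-test (all names are examples; adjust to your environment)
# 1) NAD with a fake CIDR on the candidate VLAN interface
cat <<'EOF' | oc apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-test
  namespace: default
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "bond1.2403",
    "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
EOF

# 2) Two test pods attached to the NAD
for i in 1 2; do
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: l2-test-$i
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: l2-test
spec:
  containers:
  - name: shell
    image: registry.access.redhat.com/ubi9/ubi
    command: ["sleep", "3600"]
EOF
done

# 3) Ping pod 2's secondary (net1) IP from pod 1; the net1 IP can be read from
#    pod 2's k8s.v1.cni.cncf.io/network-status annotation
oc exec -n default l2-test-1 -- ping -c 3 <l2-test-2-net1-ip>
```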
Issue
- How to run the multus validation tool post ODF deployment.
- We installed ODF before running the multus validation tool, how do we run this tool?
Resolution
- Make sure all the nodes have the worker role:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
<infra01-node-name> Ready infra,worker 60d v1.25.12+26bab08
<infra02-node-name> Ready infra,worker 60d v1.25.12+26bab08
<infra03-node-name> Ready infra,worker 60d v1.25.12+26bab08
<master01-node-name> Ready control-plane,master 61d v1.25.12+26bab08
<master02-node-name> Ready control-plane,master 61d v1.25.12+26bab08
<master03-node-name> Ready control-plane,master 61d v1.25.12+26bab08
<worker01-node-name> Ready app,worker 59d v1.25.12+26bab08
<worker02-node-name> Ready app,worker 59d v1.25.12+26bab08
<worker03-node-name> Ready app,worker 59d v1.25.12+26bab08
<worker04-node-name> Ready app,worker 59d v1.25.12+26bab08
- If the nodes are not labeled as workers, add the worker label:
oc label nodes <node-name> node-role.kubernetes.io/worker=""
Note: the control plane nodes must also be made schedulable (see step 4).
- Make sure no taints are blocking the scheduling of the multus validation pods on the worker nodes, as the validation tool's DaemonSet cannot be edited. For example, to remove an infra taint:
oc adm taint nodes <node-name> node-role.kubernetes.io/infra:NoSchedule-
- Check the scheduler and make sure the settings match the desired behavior of the validation tool.
oc patch scheduler cluster --type=merge -p '{"spec": {"mastersSchedulable": true}}'
$ oc get schedulers.config.openshift.io -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Scheduler
  metadata:
    name: cluster
  spec:
    defaultNodeSelector: node-role.kubernetes.io/worker
    mastersSchedulable: true
    policy:
      name: ""
  status: {}
kind: List
metadata:
  resourceVersion: ""
In some cases, the masters are already schedulable and therefore already have the worker role.
$ oc get scheduler cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  creationTimestamp: "2024-05-14T13:16:08Z"
  generation: 2
  name: cluster
  resourceVersion: "278519"
  uid: 443ddeea-d033-4344-8225-b4217a40ff50
spec:
  mastersSchedulable: true
  policy:
    name: ""
status: {}
$
- As the multus validation tool generates a large number of test pods, the maximum number of pods per node should be raised from the default of 250 to 500. This can be done by adding a kubelet configuration for the worker nodes. Please note: this will reboot all the nodes that match the 'matchLabels'.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-max-pods
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    maxPods: 500
This will trigger a rolling reboot of the matching worker nodes. The number of worker nodes rebooting at the same time depends on the maxUnavailable value configured in the worker machine config pool.
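A minimal apply-and-verify sequence for this step might look like the following (the manifest file name is an assumption):

```shell
# Apply the KubeletConfig and watch the worker pool roll out (nodes reboot here)
oc apply -f set-max-pods.yaml
oc get mcp worker -w                 # wait for UPDATED=True, UPDATING=False

# Afterwards, confirm a worker advertises the new pod capacity
oc get node <worker-node-name> -o jsonpath='{.status.capacity.pods}{"\n"}'
```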
- If not installed, make sure that the NMState Operator is deployed in your cluster.
$ oc -n openshift-nmstate wait --for=jsonpath='{.status.state}'=AtLatestKnown subscription/kubernetes-nmstate-operator --timeout=300s
subscription.operators.coreos.com/kubernetes-nmstate-operator condition met
- The multus-validation-test-web-server pod must be scheduled on one of the workers, and it needs connectivity to both the odf-cluster and odf-public NADs. By default, the worker nodes are connected only to odf-public, since clients connect only over odf-public; only the infra nodes (where the storage runs) connect to both odf-cluster and odf-public.
For the test to succeed, we need to attach the worker nodes to odf-cluster. In a UPI deployment this can be done by logging in to each worker node and adding the interface manually, or it can be achieved with the NMState Operator by creating an NNCP (Node Network Configuration Policy), which generates NNCEs (Node Network Configuration Enactments).
Login to worker #1:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.31/24' ipv4.method manual ipv6.method disabled
Login to worker #2:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.32/24' ipv4.method manual ipv6.method disabled
Login to worker #3:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.33/24' ipv4.method manual ipv6.method disabled
Login to worker #4:
sudo nmcli connection add type vlan con-name bond1.2403 ifname bond1.2403 dev bond1 id 2403 802-3-ethernet.mtu 9000 ipv4.addresses '192.168.20.34/24' ipv4.method manual ipv6.method disabled
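After adding the VLAN on each worker, it is worth verifying the interface and reachability; for example (the target 192.168.20.21 is a hypothetical infra-node address on the odf-cluster subnet):

```shell
# Confirm the VLAN interface is up with the expected address and MTU
ip -d addr show bond1.2403

# Verify reachability across the Layer-2 segment without fragmentation;
# -s 8972 plus 28 bytes of ICMP/IP headers exercises the full 9000-byte MTU
ping -c 3 -M do -s 8972 192.168.20.21
```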
- Alternatively, the network configuration can be updated using a NodeNetworkConfigurationPolicy. In this example:
- odf-cluster: bond1.2403, 192.168.20.0/24
- odf-public: bond1.2402, 192.168.10.0/24
```
$ cat OCP/MC/nncp.yml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: odf-policy
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ''
  desiredState:
    interfaces:
    - name: bond1.2402
      type: vlan
      state: up
      mtu: 8950
      vlan:
        base-iface: bond1
        id: 2402
      ipv4:
        enabled: false
      ipv6:
        enabled: false
    - name: bond1.2403
      type: vlan
      state: up
      mtu: 8950
      vlan:
        base-iface: bond1
        id: 2403
      ipv4:
        enabled: false
      ipv6:
        enabled: false
$
```
When the NNCP is applied, a NodeNetworkConfigurationEnactment (NNCE) is generated for every node matching the nodeSelector in the NNCP:
```
$ oc get nnce
NAME STATUS AGE REASON
master1.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
master2.cwl-site1.npss.bos2.lab.odf-policy Available 2d15h SuccessfullyConfigured
master3.cwl-site1.npss.bos2.lab.odf-policy Available 2d15h SuccessfullyConfigured
worker1.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
worker2.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
worker3.cwl-site1.npss.bos2.lab.odf-policy Available 2d17h SuccessfullyConfigured
$
```
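A typical apply-and-verify flow for the NNCP (file path as shown above) is:

```shell
# Apply the policy and wait for every matching node to enact it
oc apply -f OCP/MC/nncp.yml
oc wait nncp odf-policy --for=condition=Available --timeout=300s
oc get nnce                           # each node should report SuccessfullyConfigured
```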
- Provision the needed NetworkAttachmentDefinitions. There should be one used for replication traffic (odf-cluster) and another used by clients to connect to their PVCs (odf-public):
$ oc get network-attachment-definitions.k8s.cni.cncf.io -n openshift-storage
NAME AGE
odf-cluster 59d
odf-public 59d
$ oc get network-attachment-definitions.k8s.cni.cncf.io -n openshift-storage odf-cluster -oyaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: odf-cluster
  namespace: openshift-storage
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "bond1.2403", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
$ oc get network-attachment-definitions.k8s.cni.cncf.io -n openshift-storage odf-public -oyaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: odf-public
  namespace: openshift-storage
spec:
  config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "bond1.2402", "mode":
    "bridge", "ipam": { "type": "whereabouts", "range": "192.168.10.0/24" } }'
- If the cluster is managed by ACS (Advanced Cluster Security) with a policy that inhibits the Alpine OS, make sure to disable this policy first.
- Run the test on both networks:
$ rook multus validation run --cluster-network odf-cluster --public-network odf-public --namespace openshift-storage | tee -a multus-test.log
2023-12-22 03:57:37.426209 I | multus-validation: starting multus validation test with the following config:
2023-12-22 03:57:37.426316 I | multus-validation: namespace: "openshift-storage"
2023-12-22 03:57:37.426319 I | multus-validation: public network: "odf-public"
2023-12-22 03:57:37.426326 I | multus-validation: cluster network: "odf-cluster"
2023-12-22 03:57:37.426331 I | multus-validation: daemons per node: 16
2023-12-22 03:57:37.426341 I | multus-validation: resource timeout: 3m0s
2023-12-22 03:57:37.493941 I | multus-validation: continuing: expected number of image pull pods not yet ready: a daemonset expects zero scheduled pods
2023-12-22 03:57:39.498573 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:41.504050 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:43.509129 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:45.514441 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:47.520110 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:49.523923 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:51.530535 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:53.534698 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:55.539492 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:57.544660 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:57:59.548972 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:01.553260 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:03.558274 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:05.564045 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:07.568494 I | multus-validation: waiting to ensure num expected image pull pods to stabilize at 7
2023-12-22 03:58:09.573851 I | multus-validation: expecting 7 image pull pods to be 'Ready'
2023-12-22 03:58:11.596328 I | multus-validation: cleaning up all 7 'Running' image pull pods
2023-12-22 03:58:13.603285 I | multus-validation: getting web server info for clients
2023-12-22 03:58:15.611281 I | multus-validation: starting 16 clients on each node
2023-12-22 03:58:18.818678 I | multus-validation: verifying 112 client pods begin 'Running'
2023-12-22 03:58:20.842955 I | multus-validation: continuing: all 112 client pods are not yet running: got 21 pods when 112 should exist
2023-12-22 03:58:22.868580 I | multus-validation: continuing: all 112 client pods are not yet running: got 29 pods when 112 should exist
2023-12-22 03:58:24.903231 I | multus-validation: continuing: all 112 client pods are not yet running: got 41 pods when 112 should exist
2023-12-22 03:58:26.936787 I | multus-validation: continuing: all 112 client pods are not yet running: got 49 pods when 112 should exist
2023-12-22 03:58:28.974470 I | multus-validation: continuing: all 112 client pods are not yet running: got 57 pods when 112 should exist
2023-12-22 03:58:31.016007 I | multus-validation: continuing: all 112 client pods are not yet running: got 69 pods when 112 should exist
2023-12-22 03:58:33.063860 I | multus-validation: continuing: all 112 client pods are not yet running: got 77 pods when 112 should exist
2023-12-22 03:58:35.126487 I | multus-validation: continuing: all 112 client pods are not yet running: got 85 pods when 112 should exist
2023-12-22 03:58:37.193274 I | multus-validation: continuing: all 112 client pods are not yet running: got 96 pods when 112 should exist
2023-12-22 03:58:39.249677 I | multus-validation: continuing: all 112 client pods are not yet running: got 102 pods when 112 should exist
2023-12-22 03:58:41.313133 I | multus-validation: continuing: all 112 client pods are not yet running
2023-12-22 03:58:43.374178 I | multus-validation: continuing: all 112 client pods are not yet running
2023-12-22 03:58:45.433244 I | multus-validation: continuing: all 112 client pods are not yet running
2023-12-22 03:58:47.491105 I | multus-validation: verifying all 112 'Running' client pods reach 'Ready' state
2023-12-22 03:58:49.548894 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:51.613178 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:53.673437 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:55.749718 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:57.807903 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:58:59.895865 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:59:01.953902 I | multus-validation: continuing: number of ready clients [0] is not the number expected [112]
2023-12-22 03:59:04.016441 I | multus-validation: continuing: number of ready clients [1] is not the number expected [112]
2023-12-22 03:59:06.119238 I | multus-validation: continuing: number of ready clients [5] is not the number expected [112]
2023-12-22 03:59:08.206974 I | multus-validation: continuing: number of ready clients [10] is not the number expected [112]
2023-12-22 03:59:10.295467 I | multus-validation: continuing: number of ready clients [20] is not the number expected [112]
2023-12-22 03:59:12.363861 I | multus-validation: continuing: number of ready clients [26] is not the number expected [112]
2023-12-22 03:59:14.449423 I | multus-validation: continuing: number of ready clients [38] is not the number expected [112]
2023-12-22 03:59:16.514838 I | multus-validation: continuing: number of ready clients [45] is not the number expected [112]
2023-12-22 03:59:18.576962 I | multus-validation: continuing: number of ready clients [55] is not the number expected [112]
2023-12-22 03:59:20.666370 I | multus-validation: continuing: number of ready clients [66] is not the number expected [112]
2023-12-22 03:59:22.723242 I | multus-validation: continuing: number of ready clients [75] is not the number expected [112]
2023-12-22 03:59:24.810056 W | multus-validation: network seems flaky; the time since clients started becoming ready until now is greater than 20s
2023-12-22 03:59:24.810090 I | multus-validation: continuing: number of ready clients [82] is not the number expected [112]
2023-12-22 03:59:26.889918 I | multus-validation: continuing: number of ready clients [88] is not the number expected [112]
2023-12-22 03:59:28.966359 I | multus-validation: continuing: number of ready clients [96] is not the number expected [112]
2023-12-22 03:59:31.023680 I | multus-validation: continuing: number of ready clients [104] is not the number expected [112]
2023-12-22 03:59:33.083662 I | multus-validation: continuing: number of ready clients [109] is not the number expected [112]
2023-12-22 03:59:35.190632 I | multus-validation: continuing: number of ready clients [111] is not the number expected [112]
2023-12-22 03:59:37.246727 I | multus-validation: all 112 clients are 'Ready'
RESULT: multus validation test succeeded, but there are suggestions
Suggested things to investigate before installing with Multus:
- not all clients became ready within 20s; the underlying network may be flaky or not have the bandwidth to support a production ceph cluster; even if the validation test passes, this could still be an issue
leaving multus validation test resources running for manual debugging
For assistance debugging, collect the following into an archive file:
- Output of this utility
- Network Attachment Definitions (NADs) used by this test
- A write-up describing the network configuration you are trying to achieve including the
intended network for Ceph public/client traffic, intended network for Ceph cluster traffic,
interface names and CIDRs for both networks, and any other details that are relevant.
- 'ifconfig' output from at least one Kubernetes worker node
- 'kubectl get pods -o wide' output from the test namespace
- 'kubectl describe pods' output from the test namespace
- 'kubectl get pods -o yaml' output from the test namespace
- 'kubectl get daemonsets' output from the test namespace
- 'kubectl describe daemonsets' output from the test namespace
- 'kubectl get daemonsets -o yaml' output from the test namespace
- 'kubectl logs multus-validation-test-web-server' output from the test namespace
- 'kubectl get nodes -o wide' output
- Make sure the test tool succeeds and does not report latencies. While some latency is expected in a connected environment, it is best to check with the networking team for further optimization; disconnected environments should not report latencies.
- Clean up the test resources:
rook multus validation cleanup --namespace openshift-storage
- Remove the odf-cluster links added to the worker nodes. Log in to each worker node (workers 1-4) and remove the bond1.2403 interface:
sudo nmcli connection del bond1.2403
- Alternatively, an NNCP with state: absent can be used to remove the interface.
- Revert the maximum number of pods running on each node:
oc patch --type merge kubeletconfig/set-max-pods -p '{"spec":{"kubeletConfig":{"maxPods": 250}}}'
oc patch --type merge kubeletconfig/set-max-pods-masters -p '{"spec":{"kubeletConfig":{"maxPods": 250}}}'
- If the nodes did not originally have the worker role and the label was added only for the test, remove the worker label from those nodes now:
oc label nodes <master node-name> node-role.kubernetes.io/worker-
- If the master nodes weren't schedulable before, revert the setting now:
oc patch scheduler cluster --type=merge -p '{"spec": {"mastersSchedulable": false}}'
Root Cause
OCP releases < 4.14 using multus (i.e. secondary networks) with ODF require a support exception. Red Hat provides a tool to test the platform and ensure that the design and physical connectivity permit the use of multus with ODF.
The recommendation from Red Hat is to run the tool before ODF is installed for the following reasons:
- The multus validation tool spawns a large number of pods per worker node. If ODF pods are already installed, they consume many IP addresses from the CIDR used by the NADs, and it is highly likely that the test will fail.
- The test runs in two stages. The first validates the odf-cluster network used for Ceph internal replication traffic [./rook multus validation run --cluster-network odf-cluster --namespace default]; these pods run on the storage nodes only, which requires changing the default scheduler behavior to prefer the storage nodes. Because the tool generates a large number of test pods, adding the multus validation pods can push the storage nodes past 250 running pods, causing failures unless the kubelet configuration is edited to raise the maximum number of pods allowed per node.
- The second stage validates the odf-public network used by the clients to connect to their backend PVCs [./rook multus validation run --public-network odf-public --namespace default].
While this is considered the best approach, real-life experience has shown that many consultants and partners deploy ODF first. In some cases, the consultants are not aware of the need to run the validation tool until the cluster is fully functional. Moreover, being able to run the validation tool while ODF pods are installed helps test the following:
- The network has enough bandwidth to run the multus validation pods side by side with the OSD pods.
- The surge in traffic introduced by the new multus pods shows whether it causes the OSD pods to crash.
- Asking consultants to redeploy ODF usually causes significant inconvenience to both the consultants and the customer.
- The official approach tests each network separately, which yields fewer pods (about 50, compared to 100+ when both networks are tested in the same command).
- It confirms that the CIDR is large enough to accept more pods, i.e. more OSDs in the future.
- It shows whether unused IP allocations are released and the IP addresses can be reused after the multus validation pods are removed [oc get ippools.whereabouts.cni.cncf.io -n openshift-multus].
- In reality, OSD pods connect to both odf-cluster and odf-public at the same time; the pods generated the recommended way test only one network at a time.
- In Telco ZTP scenarios, ODF is deployed with multus, but ZTP does not run the validation tool before deploying ODF.
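The CIDR-sizing point above can be sanity-checked with simple shell arithmetic; the pod counts here are assumed example values, not measurements:

```shell
# Rough headroom check for a NAD range: a /24 yields 254 usable addresses,
# and every pod attached to the network consumes one whereabouts allocation.
prefix=24
usable=$(( (1 << (32 - prefix)) - 2 ))    # 254 for a /24
odf_pods=40                               # assumed: ODF pods already on the NAD
validation_pods=112                       # assumed: clients spawned by the tool
remaining=$(( usable - odf_pods - validation_pods ))
echo "usable=$usable remaining=$remaining"
```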
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.