Understanding and Validating MTU Settings with OpenShift Container Platform 4.x

Updated 16 May 2023

OpenShift deployments often span logical network boundaries, and the networks on either side are sometimes run or managed by different teams or organizations. In these situations, knowing how to identify the MTU values of the networks your OpenShift nodes run on, and setting a suitable MTU for OpenShift, helps the cluster meet its operational SLAs and SLOs and reduces service-affecting events and cluster instability.

Example:

Consider a stretched cluster (one that spans large geographic distances), or a cluster built from a mix of IaaS and cloud provider resources. For instance, you run a virtualization platform configured for an MTU of 1500, you also use bare-metal servers in your data center configured for an MTU of 9000, and you have cloud resources on a network with an MTU of 4000.

In this example, the lowest MTU found anywhere in the infrastructure is the highest MTU the deployment can use (a lower value is acceptable). For this deployment, the cluster must therefore be configured with an MTU of 1500 or less.
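
The rule above amounts to taking the minimum of the MTU values found on each network segment. A minimal sketch, assuming the per-segment values have already been collected by hand (the function name min_mtu is hypothetical and not part of any OpenShift tooling):

```shell
# min_mtu: print the smallest of the MTU values passed as arguments.
# Hypothetical helper for illustration only.
min_mtu() {
    min=$1
    for mtu in "$@"; do
        if [ "$mtu" -lt "$min" ]; then
            min=$mtu
        fi
    done
    echo "$min"
}

# Values from the example: virtualization platform, bare metal, cloud network.
min_mtu 1500 9000 4000
```

The printed value is the upper bound for the deployment; any MTU at or below it is acceptable.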

MTU discovery and validation

Start by validating that the configured OpenShift network MTU is less than the node-level MTU.

Determining the OpenShift Network MTU

# oc get network.config cluster -o jsonpath='{.status.clusterNetworkMTU}{"\n"}'
1400

Determining the node-level MTU

# oc debug node/m2
Starting pod/m2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 198.18.111.14
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ip -o link show dev br-ex | cut -d ' ' -f5
1500
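
Once both values are known, the comparison can be scripted. The sketch below is illustrative only: check_cluster_mtu is a hypothetical helper, and the overhead argument is an assumption that depends on the network plugin (the OpenShift documentation lists 100 bytes for OVN-Kubernetes and 50 bytes for OpenShift SDN). The values 1400 and 1500 are the ones returned by the commands above.

```shell
# check_cluster_mtu CLUSTER_MTU NODE_MTU OVERHEAD
# Hypothetical helper: succeeds when the cluster network MTU fits inside
# the node-level MTU after subtracting the overlay overhead.
check_cluster_mtu() {
    cluster_mtu=$1
    node_mtu=$2
    overhead=$3   # assumption: 100 for OVN-Kubernetes, 50 for OpenShift SDN
    limit=$((node_mtu - overhead))
    if [ "$cluster_mtu" -le "$limit" ]; then
        echo "OK: cluster MTU $cluster_mtu fits within $limit"
    else
        echo "TOO LARGE: cluster MTU $cluster_mtu exceeds $limit"
        return 1
    fi
}

# Cluster network MTU 1400, node MTU 1500 on br-ex, 100 bytes of overhead.
check_cluster_mtu 1400 1500 100
```

In practice the first two arguments would come from the oc and ip commands shown above rather than being typed in by hand.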

Discovering transport MTU

To discover the effective MTU on a particular path between two nodes, use the tracepath tool included in RHCOS.

sh-4.4# tracepath -n <destination_node> -m <max_num_hops>

Discovering MTU in the same Layer2 domain

sh-4.4# tracepath -n node2.ocp.lab -m 5
 1?: [LOCALHOST]                      pmtu 1500
 1:  198.18.111.12                                  6.935ms reached
 1:  198.18.111.12                                  2.399ms reached
     Resume: pmtu 1500 hops 1 back 1

Discovering MTU across Layer3 domains

sh-4.4# tracepath -n node2.ocp.lab -m 10
 1?: [LOCALHOST]                      pmtu 1500
 1:  198.18.111.1                                   1.826ms
 1:  198.18.111.1                                   1.182ms
 2:  192.168.1.1                                    6.555ms
 3:  71.179.179.1                                  12.475ms
 4:  100.41.23.30                                  14.776ms
 5:  140.222.9.85                                  15.904ms asymm  6
 6:  204.148.170.210                               12.241ms asymm  8
 7:  no reply
 8:  no reply
 9:  52.93.28.192                                  16.233ms asymm 11
10:  no reply
     Too many hops: pmtu 1500
     Resume: pmtu 1500

Note: In the example above, the number of hops was limited to 10 (the default is 30).
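
When checking many node pairs, it can help to extract only the path MTU from the tracepath output. A minimal sketch, assuming the output format shown above, where the value follows the word pmtu (final_pmtu is a hypothetical helper name):

```shell
# final_pmtu: read tracepath output on stdin and print the last pmtu value seen.
# Hypothetical helper; relies on the "pmtu <value>" layout shown above.
final_pmtu() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "pmtu") v = $(i + 1) }
         END { if (v != "") print v }'
}

# Sample lines copied from the same-Layer2 output above.
printf '%s\n' \
    ' 1?: [LOCALHOST]                      pmtu 1500' \
    ' 1:  198.18.111.12                                  6.935ms reached' \
    '     Resume: pmtu 1500 hops 1 back 1' | final_pmtu
```

On a node, the same function could be fed directly, for example with tracepath -n node2.ocp.lab -m 5 | final_pmtu.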

To validate the MTU, run ping with the don’t fragment flag (-M do), as described in KCS 2440411.

Example:

When testing for an MTU of 1500, subtract the 28 bytes of overhead (see KCS 2440411) to get a payload size of 1472. The command and its output look similar to:

sh-4.4# ping -c 10 -M do -s 1472 198.18.111.14
PING m2.ocp.lab (198.18.111.14) 1472(1500) bytes of data.
1480 bytes from m2 (198.18.111.14): icmp_seq=1 ttl=58 time=24.6 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=2 ttl=58 time=15.0 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=3 ttl=58 time=16.8 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=4 ttl=58 time=15.5 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=5 ttl=58 time=17.1 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=6 ttl=58 time=19.10 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=7 ttl=58 time=18.3 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=8 ttl=58 time=16.4 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=9 ttl=58 time=15.3 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=10 ttl=58 time=12.1 ms

--- m2.ocp.lab ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 12.107/17.124/24.614/3.195 ms
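
The payload size passed to -s follows from simple arithmetic: the MTU minus the 28 bytes of overhead, which for IPv4 is a 20-byte IP header plus an 8-byte ICMP header. A small sketch, assuming IPv4 (ping_payload is a hypothetical helper name, and IPv6 would need a different overhead):

```shell
# ping_payload MTU: print the ICMP payload size to use with "ping -M do -s".
# Assumes IPv4: 20-byte IP header + 8-byte ICMP header = 28 bytes of overhead.
ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Print the ping options for a few common MTU values.
for mtu in 1400 1500 9000; do
    echo "MTU $mtu -> ping -M do -s $(ping_payload "$mtu")"
done
```

If a ping with the computed size fails while a smaller size succeeds, the path MTU is lower than the value being tested.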
