Understanding and Validating the MTU Setting in OpenShift Container Platform 4.x
OpenShift deployments often span logical network boundaries, and those networks are sometimes run or managed by different teams or organizations. Packets larger than the smallest MTU on a path are fragmented or dropped, so in these situations it matters to know how to identify the MTU of each network your OpenShift nodes run on and to set an appropriate MTU for OpenShift. Doing so helps the cluster meet its operational SLAs/SLOs and reduces service-affecting events and cluster instability.
Example:
Consider a stretched cluster (deployed across large geographic distances) or a cluster built from a mix of IaaS and cloud provider resources. For instance, you might run a virtualization platform configured for an MTU of 1500, bare-metal servers in your data center configured for an MTU of 9000, and cloud resources on a network with an MTU of 4000.
In this example, the smallest MTU anywhere in the infrastructure is the maximum MTU the deployment can use (a lower MTU also works). For this deployment, the cluster's MTU configuration must therefore be 1500 or less.
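The rule above amounts to taking the minimum across all segment MTUs. The following POSIX shell function is a minimal sketch of that arithmetic; the function name `min_mtu` is hypothetical, and the three values are the segment MTUs from the example.

```shell
#!/bin/sh
# Hypothetical helper: print the smallest of the MTU values passed as
# arguments. The smallest segment MTU caps the MTU for the whole cluster.
# Usage: min_mtu MTU [MTU ...]
min_mtu() (
    min=$1
    shift
    for mtu in "$@"; do
        if [ "$mtu" -lt "$min" ]; then
            min=$mtu
        fi
    done
    echo "$min"
)

# Virtualization platform, bare-metal servers, cloud network (from the example).
min_mtu 1500 9000 4000   # prints 1500
```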
MTU discovery and validation
Start by validating that the configured OpenShift network MTU is less than the node-level MTU.
Determining the OpenShift Network MTU
# oc get network.config cluster -o jsonpath='{.status.clusterNetworkMTU}{"\n"}'
1400
Determining the node-level MTU
# oc debug node/m2
Starting pod/m2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 198.18.111.14
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ip -o link show dev br-ex | cut -d ' ' -f5
1500
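Comparing the two values is simple arithmetic, but the margin matters: pod traffic on the cluster network is carried inside an overlay tunnel, so the cluster network MTU must leave room for the encapsulation header on top of the node-level MTU. The OpenShift documentation for changing the cluster network MTU gives this overhead as 100 bytes for OVN-Kubernetes and 50 bytes for OpenShift SDN; confirm the value for your release and network plugin, since options such as IPsec need a larger margin. A minimal sketch, with the hypothetical function name `check_cluster_mtu`:

```shell
#!/bin/sh
# Hypothetical helper: check that the cluster network MTU leaves room for
# the overlay encapsulation overhead on top of the node-level MTU.
# Usage: check_cluster_mtu CLUSTER_MTU NODE_MTU OVERHEAD
# Assumed overhead values: 100 for OVN-Kubernetes, 50 for OpenShift SDN.
check_cluster_mtu() (
    cluster_mtu=$1
    node_mtu=$2
    overhead=$3
    max_allowed=$((node_mtu - overhead))
    if [ "$cluster_mtu" -le "$max_allowed" ]; then
        echo "OK: cluster MTU $cluster_mtu <= node MTU $node_mtu - overhead $overhead"
    else
        echo "MISMATCH: cluster MTU $cluster_mtu exceeds $max_allowed" >&2
        # exit leaves only the subshell that forms the function body.
        exit 1
    fi
)

# Values from the output above: cluster network MTU 1400, br-ex MTU 1500.
check_cluster_mtu 1400 1500 100
```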
Discovering transport MTU
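To discover the effective MTU on a particular path between two nodes, use the tracepath tool included in RHCOS, run from the node's debug shell as shown above. The -n option prints numeric addresses instead of resolving host names, and -m sets the maximum number of hops to probe.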
sh-4.4# tracepath -n <destination_node> -m <max_num_hops>
Discovering MTU in the same Layer2 domain
sh-4.4# tracepath -n node2.ocp.lab -m 5
1?: [LOCALHOST] pmtu 1500
1: 198.18.111.12 6.935ms reached
1: 198.18.111.12 2.399ms reached
Resume: pmtu 1500 hops 1 back 1
Discovering MTU across Layer3 domains
sh-4.4# tracepath -n node2.ocp.lab -m 10
1?: [LOCALHOST] pmtu 1500
1: 198.18.111.1 1.826ms
1: 198.18.111.1 1.182ms
2: 192.168.1.1 6.555ms
3: 71.179.179.1 12.475ms
4: 100.41.23.30 14.776ms
5: 140.222.9.85 15.904ms asymm 6
6: 204.148.170.210 12.241ms asymm 8
7: no reply
8: no reply
9: 52.93.28.192 16.233ms asymm 11
10: no reply
Too many hops: pmtu 1500
Resume: pmtu 1500
- Note: In the example above, the number of hops is limited to 10 (the default is 30).
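When checking many paths, it can help to extract only the final path MTU from the "Resume:" line of the tracepath output. A minimal sketch, assuming the "Resume: pmtu <value> ..." format shown above; the function name `tracepath_pmtu` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read tracepath output on stdin and print the path
# MTU reported on the "Resume:" line (the third whitespace-separated field).
tracepath_pmtu() {
    awk '$1 == "Resume:" && $2 == "pmtu" { print $3 }'
}

# Sample taken from the Layer2 example above.
printf '%s\n' ' 1?: [LOCALHOST] pmtu 1500' \
    ' 1: 198.18.111.12 6.935ms reached' \
    ' Resume: pmtu 1500 hops 1 back 1' | tracepath_pmtu   # prints 1500
```

On a node, you would pipe the real command into it, for example: tracepath -n node2.ocp.lab -m 5 | tracepath_pmtu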
To validate the MTU, run ping with the Don't Fragment (DF) flag set (-M do), as described in KCS 2440411.
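The ping -s option sets the ICMP payload size, not the total packet size, so the payload must be the MTU minus the IP and ICMP headers: 20 + 8 = 28 bytes for IPv4, or 40 + 8 = 48 bytes for IPv6. A minimal sketch of that arithmetic, with the hypothetical function name `ping_payload_size`:

```shell
#!/bin/sh
# Hypothetical helper: ping payload size (for "ping -s") that produces a
# packet of exactly MTU bytes.
# Usage: ping_payload_size MTU [4|6]   (IP version, default 4)
ping_payload_size() (
    mtu=$1
    ipver=${2:-4}
    if [ "$ipver" = "6" ]; then
        headers=48   # 40-byte IPv6 header + 8-byte ICMPv6 header
    else
        headers=28   # 20-byte IPv4 header (no options) + 8-byte ICMP header
    fi
    echo $((mtu - headers))
)

ping_payload_size 1500      # prints 1472
ping_payload_size 9000      # prints 8972
ping_payload_size 1500 6    # prints 1452
```

For example: ping -c 10 -M do -s "$(ping_payload_size 1500)" 198.18.111.14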
Example:
When testing an MTU of 1500, subtract the 28 bytes of overhead (a 20-byte IPv4 header plus an 8-byte ICMP header; see KCS 2440411), which gives a 1472-byte payload. The output will look similar to:
sh-4.4# ping -c 10 -M do -s 1472 198.18.111.14
PING m2.ocp.lab (198.18.111.14) 1472(1500) bytes of data.
1480 bytes from m2 (198.18.111.14): icmp_seq=1 ttl=58 time=24.6 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=2 ttl=58 time=15.0 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=3 ttl=58 time=16.8 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=4 ttl=58 time=15.5 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=5 ttl=58 time=17.1 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=6 ttl=58 time=19.10 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=7 ttl=58 time=18.3 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=8 ttl=58 time=16.4 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=9 ttl=58 time=15.3 ms
1480 bytes from m2 (198.18.111.14): icmp_seq=10 ttl=58 time=12.1 ms
--- m2.ocp.lab ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 12.107/17.124/24.614/3.195 ms
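A successful ping confirms only the size you tested. To find the largest MTU a path actually passes, you can repeat the DF-flagged ping while bisecting between a size known to work and an upper bound. This is a sketch under stated assumptions: the names `find_max_mtu` and `ping_probe` and the `TARGET` variable are hypothetical, the probe assumes IPv4 (28 bytes of headers), and the search assumes every size up to the path MTU succeeds and every larger size fails.

```shell
#!/bin/sh
# Hypothetical helper: bisect for the largest MTU that PROBE accepts.
# PROBE is a command called as "PROBE SIZE" that succeeds (exit 0) when a
# packet of SIZE bytes gets through. LOW must pass; HIGH is an upper bound.
# Usage: find_max_mtu PROBE LOW HIGH
find_max_mtu() (
    probe=$1
    low=$2
    high=$3
    # exit leaves only the subshell that forms the function body.
    "$probe" "$low" || exit 1
    while [ "$low" -lt "$high" ]; do
        mid=$(( (low + high + 1) / 2 ))   # round up so the loop always progresses
        if "$probe" "$mid"; then
            low=$mid                       # mid fits: answer is mid or larger
        else
            high=$((mid - 1))              # mid is too big: answer is smaller
        fi
    done
    echo "$low"
)

# Probe using ping with the Don't Fragment flag (IPv4: 28 bytes of headers).
# ping exits 0 if at least one of the 3 replies arrives, which reduces
# false failures caused by ordinary packet loss.
ping_probe() {
    ping -c 3 -W 2 -M do -s "$(( $1 - 28 ))" "$TARGET" >/dev/null 2>&1
}
```

For example, from the node's debug shell: TARGET=198.18.111.14; find_max_mtu ping_probe 1280 9000. Bisection keeps the number of probes small (about a dozen for this range), and passing the probe as a command makes the search logic easy to test without a network.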