Port collisions between pod and cluster IPs on OpenShift 4 with OVN-Kubernetes

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (OCP) 4.x with OVN-Kubernetes

Issue

  • Pod-to-pod communication breaks after some time, even when the source and destination pods run on the same OpenShift worker node. Redeploying the pods restores communication for a while, after which the issue recurs.
  • The OVS pods log "failed (Invalid argument)" warnings for the affected packets:
2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:14c4 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0

Resolution

  • The issue is tracked in Bugzilla 1939676 and Bugzilla 1939045, and a permanent fix is expected in an OpenShift 4.6.z release.
  • The potential fix in those Bugzillas applies SNAT(0.0.0.0), a "null SNAT", in conntrack. This preserves the source IP address and modifies the source port only if there is a collision in the conntrack table. The caveat is that the packet may arrive at the server with a different source port than the one it was sent with, which may or may not be desirable.
  • As a temporary workaround, change the application so that it does not use hardcoded source ports that cause such collisions: it should not send traffic from the same source port to both a ClusterIP and the pod IP backing it (more details in the Root Cause section).
    The issue does not occur with randomized source ports, because the client would choose a different source port for each of the two connections.
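
The workaround above can be sketched as follows. This is a minimal illustration, not code from the affected application: the helper name open_udp_client is hypothetical, and it assumes the client is free to let the kernel choose its UDP source port. Binding to port 0 requests an ephemeral port, and using one socket per destination keeps the pod IP flow and the ClusterIP flow on different source ports.

```python
import socket


def open_udp_client() -> socket.socket:
    """Return a UDP socket bound to a kernel-chosen (ephemeral) source port.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Port 0 asks the kernel to pick a free ephemeral port
    # instead of a hardcoded one such as 5054.
    sock.bind(("0.0.0.0", 0))
    return sock


# One socket per destination: while both sockets are open, the kernel
# cannot hand out the same UDP port to both of them.
pod_sock = open_udp_client()   # would talk to the pod IP, e.g. 10.129.2.8:5088
svc_sock = open_udp_client()   # would talk to the ClusterIP, e.g. 172.30.9.90:5088

pod_port = pod_sock.getsockname()[1]
svc_port = svc_sock.getsockname()[1]
print("source ports:", pod_port, svc_port)
```

Whether this is acceptable depends on the application: a server that expects the client on a fixed source port would need to be changed as well.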

Root Cause

There is a scenario in Kubernetes where a pod talking both to another pod and to a service backed by that pod can cause a port collision, in which case conntrack refuses to commit the connection. Consider the following UDP traffic flow between a client and a server:

client (10.129.2.7:5054) -> server (10.129.2.8:5088)

This will create an entry in conntrack similar to this:

udp,orig=(src=10.129.2.7,dst=10.129.2.8,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=85

Around the same time, the client sends a packet (using the same source port) to a Kubernetes service that is backed by the same server. In OVN this is treated as a load_balancer; let's assume the VIP on this LB is 172.30.9.90.

client (10.129.2.7:5054) -> 172.30.9.90 (OVN LB DNAT) -> server (10.129.2.8:5088)

This results in a conntrack entry similar to:

udp,orig=(src=10.129.2.7,dst=172.30.9.90,sport=5054,dport=5088),reply=(src=10.129.2.8,dst=10.129.2.7,sport=5088,dport=5054),zone=84,labels=0x2

The problem is that the reply tuple is identical for both sessions. OVS then fails to commit the conntrack entries because of the collision, and logs errors such as:

2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:14c4 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
2021-03-12T08:15:12.680Z|00008|dpif(handler13)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19697) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:d000 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
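
The collision can be reproduced on paper with a small model. The sketch below is a simplification that ignores conntrack zones and labels; the FlowTuple type and the reply_tuple helper are hypothetical names introduced only for this illustration. It derives the reply tuple of each of the two flows described above and shows that they are equal.

```python
from typing import NamedTuple, Optional


class FlowTuple(NamedTuple):
    """Simplified conntrack tuple (zones and labels are ignored here)."""
    proto: str
    src: str
    dst: str
    sport: int
    dport: int


def reply_tuple(orig: FlowTuple, dnat_ip: Optional[str] = None) -> FlowTuple:
    """Build the expected reply direction: endpoints swapped, and the
    DNAT'd backend (if any) used as the reply source."""
    backend = dnat_ip if dnat_ip is not None else orig.dst
    return FlowTuple(orig.proto, backend, orig.src, orig.dport, orig.sport)


# Flow 1: client talks directly to the server pod.
direct = FlowTuple("udp", "10.129.2.7", "10.129.2.8", 5054, 5088)
# Flow 2: same client and source port, but via the service VIP,
# which the OVN load balancer DNATs to the same server pod.
via_service = FlowTuple("udp", "10.129.2.7", "172.30.9.90", 5054, 5088)

direct_reply = reply_tuple(direct)
service_reply = reply_tuple(via_service, dnat_ip="10.129.2.8")

collision = direct_reply == service_reply
print(direct_reply)
print(service_reply)
print("reply tuples collide:", collision)  # True
```

The original tuples differ (the destinations are 10.129.2.8 and 172.30.9.90), but after DNAT both replies come from 10.129.2.8:5088 to 10.129.2.7:5054, which is why only a different source port tells them apart.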

To avoid this scenario, openshift-sdn implemented a SNAT(0.0.0.0) conntrack flow that ensures the port is changed if there is a port collision:
SDN Pull Request

The open Bugzillas request a similar fix in OVN, so that such port collisions in the traffic flow are worked around.
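
The effect of a "null SNAT" can be pictured with a toy model. The sketch below is not how the kernel or OVN implements it: the table is a plain Python set, the function name commit_with_null_snat is hypothetical, and the replacement port is simply the first free one in an assumed range. It only illustrates the behavior described in this article: the IP address is preserved, and the client port changes only when the reply tuple is already taken.

```python
from typing import NamedTuple, Optional, Set


class ReplyTuple(NamedTuple):
    """Reply direction of a connection; dport is the client's source port."""
    proto: str
    src: str
    dst: str
    sport: int
    dport: int


def commit_with_null_snat(
    table: Set[ReplyTuple],
    reply: ReplyTuple,
    port_range: range = range(1024, 65536),  # assumed range, for illustration
) -> Optional[ReplyTuple]:
    """Toy commit: keep the tuple as-is when it is free; on a collision keep
    the IP addresses and pick another client port."""
    if reply not in table:
        table.add(reply)
        return reply
    for port in port_range:
        candidate = reply._replace(dport=port)
        if candidate not in table:
            table.add(candidate)
            return candidate
    return None  # no free port left in the range


table: Set[ReplyTuple] = set()
reply = ReplyTuple("udp", "10.129.2.8", "10.129.2.7", 5088, 5054)

first = commit_with_null_snat(table, reply)   # no collision: unchanged
second = commit_with_null_snat(table, reply)  # collision: port rewritten
print(first)
print(second)
```

In this model the second connection is committed with the same addresses but a different client port, which matches the caveat noted in the Resolution section: the server may see a source port other than the one the client used.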

Diagnostic Steps

  • Collect a network must-gather:
$ oc adm must-gather -- /usr/bin/bash -c '/usr/bin/gather_network_logs ; /usr/bin/gather_service_logs worker'
  • Check the ovs-vswitchd service log for conntrack commit failures in <must-gather dir dump>/network_logs/<worker nodename>_ovs_vswitchd_log:
2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:14c4 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
2021-03-12T08:15:12.680Z|00008|dpif(handler13)|WARN|system@ovs-system: execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19697) failed (Invalid argument) on packet udp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 udp_csum:d000 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,tp_src=5054,tp_dst=5088),in_port(16) mtu 0
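
When the log is large, the check above can be scripted. The following is a minimal sketch, assuming the warnings have the same layout as the lines shown in this article; the regular expression and the function name find_ct_commit_failures are illustrative and not part of any Red Hat tooling. It counts failed ct(commit) warnings per connection tuple, so a repeated tuple with a fixed source port stands out.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the "execute ct(commit...) failed (Invalid argument)" warnings and
# captures the ct_tuple4 fields. Assumes the log layout shown in this article.
CT_FAIL = re.compile(
    r"execute ct\(commit.*?failed \(Invalid argument\).*?"
    r"ct_tuple4\(src=(?P<src>[\d.]+),dst=(?P<dst>[\d.]+),proto=(?P<proto>\d+),"
    r"tp_src=(?P<sport>\d+),tp_dst=(?P<dport>\d+)\)"
)


def find_ct_commit_failures(lines: Iterable[str]) -> Counter:
    """Count failed ct(commit) warnings per (src, dst, proto, sport, dport)."""
    counts: Counter = Counter()
    for line in lines:
        match = CT_FAIL.search(line)
        if match:
            key = (
                match["src"],
                match["dst"],
                int(match["proto"]),
                int(match["sport"]),
                int(match["dport"]),
            )
            counts[key] += 1
    return counts


# Sample input: the first warning from this article, plus an unrelated line.
sample = (
    "2021-03-12T08:12:40.670Z|00004|dpif(handler10)|WARN|system@ovs-system: "
    "execute ct(commit,zone=84,label=0/0x1),ct(zone=85),recirc(0x19590) "
    "failed (Invalid argument) on packet udp,vlan_tci=0x0000,"
    "dl_src=0a:58:0a:81:02:07,dl_dst=0a:58:0a:81:02:08,nw_src=10.129.2.7,"
    "nw_dst=10.129.2.8,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=5054,tp_dst=5088 "
    "udp_csum:14c4 with metadata skb_priority(0),skb_mark(0),ct_state(0x21),"
    "ct_zone(0x54),ct_tuple4(src=10.129.2.7,dst=10.129.2.8,proto=17,"
    "tp_src=5054,tp_dst=5088),in_port(16) mtu 0"
)
failures = find_ct_commit_failures([sample, "some unrelated log line"])
for conn, count in failures.items():
    print(conn, count)
```

Against a real must-gather, the same function can be fed an open file handle for the <worker nodename>_ovs_vswitchd_log file instead of the sample list.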
