OpenShift applications performing reverse DNS queries may face performance issues after migrating an OCP cluster from SDN to OVN
Environment
- OpenShift 4.x with OVN
- Application containers performing reverse DNS queries
Issue
- Applications hosted on OpenShift that perform reverse DNS lookups observe latency after the migration to OVN.
Resolution
- Ensure the infrastructure DNS servers respond quickly, with either NXDOMAIN or ServFail, to the reverse DNS queries the applications generate. With OVN, applications see OVN internal IP addresses (i.e. 100.64.x.x) as the source IP, so a reverse DNS query made by the application asks for the name of one of these internal addresses. Some DNS servers do not respond correctly to reverse lookups for this range and only return ServFail after a delay. Because the application waits for each DNS response, this delay can greatly reduce application performance.
- The Red Hat engineering team is aware of this behaviour via RFE 4732 and is exploring an option to improve CoreDNS behaviour by answering these reverse DNS queries for the OVN internal IPs locally, instead of forwarding them to the upstream DNS servers. As per the DNS RFC standards, DNS servers should not forward reverse DNS queries for this address range to the global DNS servers.
- For further details, contact Red Hat Technical Support.
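When asking the infrastructure DNS team to answer these queries quickly, it helps to name the exact reverse zone involved. The sketch below derives that zone name with Python's standard ipaddress module. It is illustrative only: it assumes the OVN internal range is 100.64.0.0/16 (matching the 100.64.x.x addresses above), and the helper name reverse_zone is hypothetical. It accepts only /8, /16 and /24 prefixes, because in-addr.arpa zone names follow octet boundaries.

```python
import ipaddress

# Assumption: the OVN internal range is 100.64.0.0/16 (the 100.64.x.x
# addresses described above). Change it if the cluster uses a custom range.
OVN_INTERNAL_NET = ipaddress.ip_network("100.64.0.0/16")


def reverse_zone(net: ipaddress.IPv4Network) -> str:
    """Return the in-addr.arpa zone covering an octet-aligned IPv4 network."""
    if net.prefixlen not in (8, 16, 24):
        raise ValueError("only /8, /16 and /24 prefixes map to a single zone")
    # Keep the octets fixed by the prefix, then reverse them for in-addr.arpa.
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


if __name__ == "__main__":
    print(reverse_zone(OVN_INTERNAL_NET))  # 64.100.in-addr.arpa
```

If the cluster admin has changed the OVN internal range, update OVN_INTERNAL_NET accordingly before sharing the zone name.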
Workaround
- If reverse lookups are not required, disable them at the application level. This can significantly improve application performance, because the application no longer waits for DNS responses.
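How reverse lookups are disabled depends on the application's language, framework, or server configuration. For a Python service that resolves client addresses itself, a minimal sketch is shown below. The helper peer_name and the 100.64.0.0/16 range are assumptions for illustration, not a Red Hat API. It never issues PTR queries for OVN internal addresses and, for other addresses, stops waiting after a short timeout and falls back to the IP.

```python
import concurrent.futures
import ipaddress
import socket

# Assumption: OVN internal addresses fall in 100.64.0.0/16 (the 100.64.x.x
# range described above). peer_name is a hypothetical helper, not a Red Hat API.
OVN_INTERNAL_NET = ipaddress.ip_network("100.64.0.0/16")

# Small pool so a slow resolver cannot block the request-handling thread.
_RESOLVER_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def peer_name(ip: str, timeout: float = 0.5) -> str:
    """Return a hostname for ip, or ip itself when the lookup is skipped or fails."""
    if ipaddress.ip_address(ip) in OVN_INTERNAL_NET:
        # Skip the PTR query entirely: these addresses are OVN internal.
        return ip
    future = _RESOLVER_POOL.submit(socket.gethostbyaddr, ip)
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        return future.result(timeout=timeout)[0]
    except (concurrent.futures.TimeoutError, OSError):
        # Timeout, or a resolver error such as socket.herror / socket.gaierror.
        return ip
```

A timed-out lookup keeps running in its worker thread; only the caller stops waiting. When the hostname is not needed at all, logging the plain IP is simpler still.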
Root Cause
- If an application performs reverse DNS queries with the OVN CNI, the container sees all traffic as coming from the reserved IP range (i.e. 100.64.x.x) because of NAT, unless the service externalTrafficPolicy (ETP) is set to Local. The resulting reverse DNS queries reach CoreDNS, which forwards them to the upstream DNS servers. If the upstream DNS servers do not respond with NXDOMAIN or ServFail in time, the delay appears as latency to the client applications.
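To make the mechanism concrete: when the application reverse-resolves a NATed source address, the resolver sends a PTR query for that address's in-addr.arpa name, and CoreDNS forwards that name upstream. The sketch below computes the query name with Python's standard ipaddress module. It assumes the OVN internal range is 100.64.0.0/16, and the addresses in the demo are examples only.

```python
import ipaddress

# Assumption: OVN internal (NATed) source addresses are in 100.64.0.0/16.
OVN_INTERNAL_NET = ipaddress.ip_network("100.64.0.0/16")


def ptr_query_name(source_ip: str) -> str:
    """Return the name a reverse (PTR) lookup of source_ip asks DNS for."""
    return ipaddress.ip_address(source_ip).reverse_pointer


def is_ovn_internal(source_ip: str) -> bool:
    """True if source_ip lies in the assumed OVN internal range."""
    return ipaddress.ip_address(source_ip) in OVN_INTERNAL_NET


if __name__ == "__main__":
    # Example addresses only: one OVN internal address, one other address.
    for ip in ("100.64.0.1", "10.128.2.15"):
        label = "OVN internal" if is_ovn_internal(ip) else "not OVN internal"
        print(f"{ip} -> PTR {ptr_query_name(ip)} ({label})")
```

For 100.64.0.1 this prints the PTR name 1.0.64.100.in-addr.arpa, which is the name the upstream DNS servers must answer quickly.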
Diagnostic Steps
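- From a host outside OpenShift that uses the same upstream DNS servers, run dig -x 100.64.0.1 (or dig -x 100.64.0.1 @<upstream DNS server IP>) to check how the upstream DNS responds. Confirm that neither the "Query time" reported by dig nor a tcpdump capture shows any latency.
- If the cluster admin changes this reserved range at the OVN level, the DNS servers must also respond promptly to reverse DNS queries for the new range.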
- Sometimes only ServFail responses are slow; NXDOMAIN is in most cases delivered quickly.
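Where dig is not available on the test host, a rough alternative is to time a single reverse lookup through the system resolver. This is a hypothetical helper script (reverse_timing.py is an assumed filename), not Red Hat tooling. Unlike dig, it goes through the system resolver, so /etc/hosts and nsswitch configuration also apply, and it does not show whether the answer was NXDOMAIN or ServFail; it only reports the outcome and how long the lookup took.

```python
from __future__ import annotations

import socket
import sys
import time


def time_reverse_lookup(ip: str) -> tuple[str, float]:
    """Perform one reverse lookup via the system resolver; return (outcome, seconds)."""
    start = time.monotonic()
    try:
        outcome = "answer: " + socket.gethostbyaddr(ip)[0]
    except OSError as exc:
        # socket.herror / socket.gaierror: no PTR record, ServFail, timeout, etc.
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 reverse_timing.py 100.64.0.1
    result, elapsed = time_reverse_lookup(sys.argv[1])
    print(f"{sys.argv[1]}: {result} ({elapsed:.3f} s)")
```

An error that only arrives after several seconds matches the slow ServFail behaviour described above.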
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.