Troubleshooting OpenShift Container Platform DNS in 3.6+

Overview

OpenShift Container Platform relies on an internal DNS service to resolve service and endpoint names. The benefits of an internal DNS are covered in the product documentation.

Note: use this guide only to debug OCP installations at version 3.6 or higher.

Components

The internal DNS structure is built on dnsmasq and skydns, and the configuration is managed via NetworkManager dispatchers.

dnsmasq is a caching DNS server: it answers queries from its cache or forwards them to an upstream DNS server. It is installed on every master and node.

skydns is a DNS server built on top of etcd. It is embedded in the master and node daemons.

NetworkManager launches the origin dispatcher /etc/NetworkManager/dispatcher.d/99-origin-dns.sh to configure /etc/resolv.conf and some other files.

DNS Flow

When a query originates from a pod or a host, it follows this flow:

  1. [query] /etc/resolv.conf is read. It should contain a nameserver entry with the host's internal IP and the search domains used to complete short names (cluster.local and others).
    • The node's private IP is used as the nameserver; dnsmasq listens on that IP and handles the query.
  2. [cache] dnsmasq answers the query from its cache if it can. Otherwise, go to step 3.
  3. [routing decision] Is it a cluster.local or in-addr.arpa query? If so, forward it to skydns (listening on 127.0.0.1:53). Otherwise, forward it to the upstream DNS servers.

If the query is forwarded to skydns, the node daemon is responsible for answering it. If it is forwarded to the upstream DNS servers, the name is treated as an external domain and resolved by them.
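
The routing decision in step 3 can be sketched as a small shell function. This is only an illustration of the rule described above, not code taken from dnsmasq or OpenShift; route_query is a hypothetical helper name.

```shell
# Sketch of the dnsmasq routing decision: internal domains go to skydns,
# everything else goes to the upstream servers.
# route_query is a hypothetical helper, not part of OpenShift or dnsmasq.
route_query() {
    # Drop a trailing dot and lowercase the name, since DNS names are
    # case-insensitive and may be written fully qualified.
    name=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        cluster.local|*.cluster.local|in-addr.arpa|*.in-addr.arpa)
            echo "skydns 127.0.0.1:53" ;;
        *)
            echo "upstream" ;;
    esac
}
```

For example, route_query kubernetes.default.svc.cluster.local reports skydns, while route_query www.example.com reports upstream.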

Troubleshooting

NetworkManager

  • Check NetworkManager service: it should be up and running.
  • Check /etc/NetworkManager/dispatcher.d/99-origin-dns.sh: it should be executable.
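
Both checks can be scripted. The sketch below assumes a systemd host; check_networkmanager and check_dispatcher are hypothetical helper names, and the default path is the one given in this article.

```shell
# Default dispatcher path taken from this article; pass another path to override.
DISPATCHER=/etc/NetworkManager/dispatcher.d/99-origin-dns.sh

# Reports whether the NetworkManager unit is active (requires systemd).
check_networkmanager() {
    if systemctl is-active --quiet NetworkManager; then
        echo "NetworkManager: active"
    else
        echo "NetworkManager: NOT active"
        return 1
    fi
}

# Reports whether the dispatcher script exists and is executable.
check_dispatcher() {
    script=${1:-$DISPATCHER}
    if [ ! -e "$script" ]; then
        echo "$script: missing"
        return 1
    elif [ ! -x "$script" ]; then
        echo "$script: not executable"
        return 1
    fi
    echo "$script: ok"
}
```

Run check_networkmanager and check_dispatcher as root on the affected host; a non-zero return code from either one points at the failing check.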

resolv.conf

  • It should contain the host's internal IP and the proper search domains. Note that this file should be generated by NetworkManager.
# nameserver updated by /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# Generated by NetworkManager
search cluster.local my.lab.example.com
nameserver 10.20.30.41
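
A minimal sketch of how this file could be validated automatically. check_resolv_conf is a hypothetical helper; it assumes the first nameserver entry must be the host's internal IP and that cluster.local must appear among the search domains, as in the sample above.

```shell
# Usage: check_resolv_conf FILE EXPECTED_IP
# check_resolv_conf is a hypothetical helper written for this article.
check_resolv_conf() {
    file=$1
    expected_ip=$2
    # First nameserver entry in the file.
    first_ns=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
    if [ "$first_ns" != "$expected_ip" ]; then
        echo "nameserver is '$first_ns', expected $expected_ip"
        return 1
    fi
    # cluster.local must appear as a whole word on a search line.
    if ! awk '$1 == "search" { for (i = 2; i <= NF; i++) if ($i == "cluster.local") found = 1 }
              END { exit !found }' "$file"; then
        echo "cluster.local missing from search domains"
        return 1
    fi
    echo "ok"
}
```

On the host from the sample above this would be called as check_resolv_conf /etc/resolv.conf 10.20.30.41.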

dnsmasq

  • dnsmasq configuration lives in /etc/dnsmasq.d and it should contain the following files:

/etc/dnsmasq.d/origin-dns.conf configures dnsmasq itself, where it is listening and some other parameters.

  • Configuration in v3.6: dnsmasq only listens on the internal ip address
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
bind-interfaces
listen-address=10.20.30.41
  • Configuration in v3.7 or higher: dnsmasq listens on every interface except loopback (127.0.0.1), which avoids a collision with skydns
no-resolv
domain-needed
no-negcache
max-cache-ttl=1
enable-dbus
dns-forward-max=10000
cache-size=10000
min-port=1024
bind-dynamic
except-interface=lo

/etc/dnsmasq.d/origin-upstream-dns.conf configures the upstream DNS servers.

server=1.1.1.1
server=1.0.0.1

/etc/dnsmasq.d/node-dnsmasq.conf configures the forwarding of internal queries. Note that the node-dnsmasq.conf file no longer exists in 3.10, because OpenShift configures dnsmasq dynamically over D-Bus and applies the forwarding rules directly.

server=/in-addr.arpa/127.0.0.1
server=/cluster.local/127.0.0.1
  • With this configuration, dnsmasq listens on all of the host's interfaces except loopback (127.0.0.1), which is reserved for skydns. It forwards external queries to the upstream servers 1.1.1.1 and 1.0.0.1, and internal queries to 127.0.0.1:53.

  • Check dnsmasq service:
    systemctl status dnsmasq -l

Output:

● dnsmasq.service - DNS caching server.
   Loaded: loaded (/usr/lib/systemd/system/dnsmasq.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-04-02 04:39:54 EDT; 1 weeks 3 days ago
 Main PID: 8828 (dnsmasq)
   Memory: 760.0K
   CGroup: /system.slice/dnsmasq.service
           └─8828 /usr/sbin/dnsmasq -k

Apr 12 07:54:09 master-0.my.lab.example.com dnsmasq[8828]: setting upstream servers from DBus

### The following servers are the upstream DNS servers dnsmasq will use to forward queries it can't resolve (external domains mainly)
Apr 12 07:54:09 master-0.my.lab.example.com dnsmasq[8828]: using nameserver 1.1.1.1#53
Apr 12 07:54:09 master-0.my.lab.example.com dnsmasq[8828]: using nameserver 1.0.0.1#53

### For these domains, dnsmasq will redirect the queries to 127.0.0.1:53
Apr 12 07:54:09 master-0.my.lab.example.com dnsmasq[8828]: using nameserver 127.0.0.1#53 for domain in-addr.arpa
Apr 12 07:54:09 master-0.my.lab.example.com dnsmasq[8828]: using nameserver 127.0.0.1#53 for domain cluster.local
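
To cross-check the forwarders reported in the log against the files in /etc/dnsmasq.d, the server= lines can be listed with a short helper. This is a sketch: list_dnsmasq_forwarders is a hypothetical name, and it only sees file-based configuration, so forwarders pushed over D-Bus (as in 3.10) will not show up.

```shell
# Prints one "domain -> server" line per server= entry found in DIR/*.conf.
# list_dnsmasq_forwarders is a hypothetical helper written for this article.
list_dnsmasq_forwarders() {
    dir=${1:-/etc/dnsmasq.d}
    # server=/domain/ip is a per-domain forwarder; server=ip is a default upstream.
    cat "$dir"/*.conf 2>/dev/null | awk -F= '
        $1 == "server" {
            if ($2 ~ /^\//) {
                split($2, parts, "/")
                print parts[2] " -> " parts[3]
            } else {
                print "(default) -> " $2
            }
        }'
}
```

Any domain listed here but absent from the "using nameserver" log lines (or the other way around) suggests dnsmasq has not picked up the current configuration.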

Check the Cache

In some situations it is important to check what is in the dnsmasq cache. To do this, get the PID of the dnsmasq service and send SIGUSR1 to the process.

# kill -s SIGUSR1 $PID 

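This writes cache statistics to the process logs, which you can view with the systemctl status command shown above. dnsmasq dumps the full contents of the cache only when query logging (log-queries) is enabled or when it runs with --no-daemon.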
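
The PID lookup and the signal can be combined in one helper. This is a sketch, assuming dnsmasq runs as the systemd unit dnsmasq as shown earlier; dnsmasq_dump_cache is a hypothetical name.

```shell
# Usage: dnsmasq_dump_cache [PID]
# Sends SIGUSR1 to the given PID, or to the main PID of the dnsmasq unit.
# dnsmasq_dump_cache is a hypothetical helper written for this article.
dnsmasq_dump_cache() {
    pid=$1
    if [ -z "$pid" ]; then
        # Output looks like "MainPID=8828"; strip the key to keep the number.
        pid=$(systemctl show -p MainPID dnsmasq) || return 1
        pid=${pid#MainPID=}
    fi
    # MainPID=0 means the service is not running.
    if ! [ "${pid:-0}" -gt 0 ] 2>/dev/null; then
        echo "dnsmasq is not running" >&2
        return 1
    fi
    kill -s USR1 "$pid"
}
```

After calling dnsmasq_dump_cache as root, the new log lines can be read with journalctl -u dnsmasq --since "1 minute ago".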

skydns

  • SkyDNS configuration lives in /etc/origin/node/node-config.yaml

    • On master nodes, two skydns instances run: one for the node daemon (listening on 127.0.0.1:53) and another for the master daemon (listening on 0.0.0.0:8053).
    • The configuration of the master's skydns is in /etc/origin/master/master-config.yaml
  • SkyDNS starts automatically when the node or the master daemon starts. Note that on master nodes both daemons should be running.

/etc/origin/node/node-config.yaml configures the skydns instance embedded in the node daemon with the following parameters:

dnsBindAddress: 127.0.0.1:53
dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
dnsDomain: cluster.local
dnsIP: 10.20.30.41

dnsBindAddress: the ip:port that skydns listens on.
dnsRecursiveResolvConf: used for recursive resolution, that is, queries that must be forwarded to an external server. The /etc/origin/node/resolv.conf file should exist and contain the upstream nameserver. It is created by the NetworkManager dispatcher.
dnsDomain: the domain skydns will resolve.
dnsIP: the host's internal IP.
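
To read these values back from a host, a naive lookup is often enough. The sketch below is not a YAML parser: node_dns_setting is a hypothetical helper that only handles simple unquoted "key: value" lines such as the ones shown above, and the default path is an assumption to adjust if the file lives elsewhere on your host.

```shell
# Usage: node_dns_setting KEY [FILE]
# Prints the value of the first "KEY: value" line found in FILE.
# node_dns_setting is a hypothetical helper written for this article.
node_dns_setting() {
    key=$1
    file=${2:-/etc/origin/node/node-config.yaml}
    awk -v k="$key:" '$1 == k { print $2; exit }' "$file"
}
```

For example, node_dns_setting dnsRecursiveResolvConf prints the path of the resolv.conf used for recursive lookups, which can then be checked for an upstream nameserver entry.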

/etc/origin/master/master-config.yaml configures the master daemon skydns instance with the following parameters:

dnsConfig:
  bindAddress: 0.0.0.0:8053
  bindNetwork: tcp4

The master listens on 0.0.0.0:8053 to avoid port collisions.

  • A netstat -tulpn | grep 53 should show every daemon listening on the correct port:
tcp        0      0 0.0.0.0:8053            0.0.0.0:*               LISTEN      19076/openshift     
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      38076/openshift     
tcp        0      0 10.128.0.1:53           0.0.0.0:*               LISTEN      8828/dnsmasq        
tcp        0      0 10.74.157.231:53        0.0.0.0:*               LISTEN      8828/dnsmasq        
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      8828/dnsmasq

0.0.0.0:8053: master skydns
127.0.0.1:53: node skydns
dnsmasq is listening on all the available IPs except 127.0.0.1 because we set bind-dynamic and except-interface=lo in the config.
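
The same check can be automated by parsing the netstat output. This is a sketch under the assumptions of this article (master skydns on 0.0.0.0:8053, node skydns on 127.0.0.1:53, dnsmasq on port 53 of the other addresses); check_dns_listeners is a hypothetical helper name.

```shell
# Usage: netstat -tulpn | check_dns_listeners [master|node]
# Reads netstat output on stdin and reports which expected listeners exist.
# check_dns_listeners is a hypothetical helper written for this article.
check_dns_listeners() {
    role=${1:-master}
    awk -v role="$role" '
        # Field 4 is the local address; the last field is PID/program name.
        $4 == "0.0.0.0:8053" && $NF ~ /openshift/ { master = 1 }
        $4 == "127.0.0.1:53" && $NF ~ /openshift/ { node = 1 }
        $4 ~ /:53$/ && $4 != "127.0.0.1:53" && $NF ~ /dnsmasq/ { dnsmasq = 1 }
        END {
            if (role == "master")
                print "master skydns (0.0.0.0:8053): " (master ? "ok" : "MISSING")
            print "node skydns (127.0.0.1:53): " (node ? "ok" : "MISSING")
            print "dnsmasq (non-loopback :53): " (dnsmasq ? "ok" : "MISSING")
            ok = node && dnsmasq && (role != "master" || master)
            exit !ok
        }'
}
```

Pass node as the argument on hosts that are not masters, since no listener on port 8053 is expected there.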
