Troubleshooting OpenShift Container Platform 3.x: DNS
Environment
- OpenShift Container Platform 3.x
Issue
- I think I am having issues resolving DNS names from within my OpenShift cluster.
Resolution
- DNS in OpenShift 3.6+
- DNS in OpenShift 3.2 - 3.5
- DNS in OpenShift 3.1 and lower
- DNS on an OpenShift Master
- DNS on an OpenShift Node
- DNS troubleshooting from the hosts
- DNS troubleshooting from a container/pod
DNS in OpenShift 3.6+
- Troubleshooting DNS 3.6+
- For detailed information on how DNS works in 3.6+, see the blog post "OCP dns deep dive".
- Starting in 3.6, dnsmasq is required and new installs will fail if it is disabled.
- In the Ansible hosts file, openshift_node_env_vars should not contain --disable=dns,proxy
- In the Ansible hosts file, openshift_use_dnsmasq must be set to true (this is the default)
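The two inventory requirements above can be checked before running an install. The following is a minimal sketch, assuming an INI-style inventory with one variable per line; the function name `check_inventory_dns` is hypothetical and not part of any OpenShift tooling.

```shell
# Hypothetical helper: flag inventory settings that disable dnsmasq or node DNS.
# Prints one line per problem found and returns non-zero if any were found.
check_inventory_dns() {
    inventory=$1
    status=0
    # openshift_use_dnsmasq must not be set to false (true is the default).
    if grep -Eiq '^[[:space:]]*openshift_use_dnsmasq[[:space:]]*=[[:space:]]*false' "$inventory"; then
        echo "openshift_use_dnsmasq is set to false"
        status=1
    fi
    # openshift_node_env_vars must not pass --disable=dns,proxy to the node.
    if grep -E '^[[:space:]]*openshift_node_env_vars' "$inventory" | grep -q -- '--disable=dns'; then
        echo "openshift_node_env_vars contains --disable=dns"
        status=1
    fi
    return $status
}
```

For example, `check_inventory_dns /etc/ansible/hosts` checks the default Ansible inventory location; substitute the path of the inventory actually used for the install.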
DNS in OpenShift 3.2 - 3.5
In OpenShift 3.2, [dnsmasq was introduced](https://docs.openshift.com/enterprise/3.2/release_notes/ose_3_2_release_notes.html#ose-32-dns-changes).
- By default, new nodes installed with OpenShift Enterprise 3.2 will have Dnsmasq installed and configured as the default nameserver for both the host and pods.
- By default, new masters installed with OpenShift Enterprise 3.2 will run SkyDNS on port 8053 rather than 53. Network access controls must allow nodes to connect to masters on port 8053. This is necessary so that Dnsmasq may be configured on all nodes.
DNS in OpenShift 3.1 and lower
- With OpenShift 3.1, pods and containers running on a node would get the SkyDNS IP added as the first nameserver in their resolv.conf when started. For example, suppose a node had a resolv.conf that looked like the following.
- resolv.conf from a node
# cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 208.67.222.222
search redhat.com
- The resolv.conf in a pod/container in OpenShift would then look like the following.
- resolv.conf from a container
# cat /etc/resolv.conf
nameserver 172.30.0.1
nameserver 8.8.8.8
nameserver 208.67.222.222
search default.svc.cluster.local svc.cluster.local cluster.local redhat.com
options ndots:5
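The search list and ndots:5 in the container's resolv.conf determine which names the resolver actually queries. The following is a simplified sketch of the glibc-style ordering, assuming only the `search` and `options ndots:N` lines matter; the function name `candidate_names` is hypothetical.

```shell
# Hypothetical helper: list the names a resolver would try, in order,
# for a query name and a given resolv.conf (search + ndots only).
candidate_names() {
    name=$1
    conf=$2
    # A trailing dot marks the name as fully qualified: no search list is applied.
    case $name in
        *.) echo "$name"; return 0 ;;
    esac
    search=$(awk '$1 == "search" { $1 = ""; print }' "$conf" | tail -n 1)
    ndots=$(sed -n 's/^options.*ndots:\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    ndots=${ndots:-1}
    # Count the dots in the query name (arithmetic expansion strips padding).
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
    # With at least ndots dots, the name is tried as-is first.
    [ "$dots" -ge "$ndots" ] && echo "$name."
    for domain in $search; do
        echo "$name.$domain."
    done
    # With fewer than ndots dots, the as-is lookup comes last.
    [ "$dots" -lt "$ndots" ] && echo "$name."
    return 0
}
```

With the container resolv.conf above, `candidate_names kubernetes /etc/resolv.conf` lists the four search-domain expansions before the bare name, which is why short service names resolve inside the cluster. An external name such as access.redhat.com has fewer than five dots, so it is also tried against every search domain before being queried as-is.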
DNS on an OpenShift Master
- SkyDNS runs on the master, on port 8053 by default so that it does not interfere with dnsmasq.
# grep dns -A1 /etc/origin/master/master-config.yaml
- If this value is changed, the master services will need to be restarted.
# systemctl restart atomic-openshift-master*
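To read just the SkyDNS port out of the master configuration, the grep above can be narrowed. This is a minimal sketch, assuming the file contains a top-level `dnsConfig:` block with an indented `bindAddress: <ip>:<port>` entry; the function name `master_dns_port` is hypothetical.

```shell
# Hypothetical helper: print the port from dnsConfig.bindAddress.
# Assumes a top-level "dnsConfig:" key with "bindAddress: <ip>:<port>" beneath it.
master_dns_port() {
    awk '
        /^dnsConfig:/   { in_dns = 1; next }   # entering the dnsConfig block
        /^[^[:space:]]/ { in_dns = 0 }         # any other top-level key ends it
        in_dns && $1 == "bindAddress:" {
            gsub(/"/, "", $2)                  # tolerate a quoted value
            n = split($2, parts, ":")
            print parts[n]                     # last colon-separated piece is the port
            exit
        }
    ' "$1"
}
```

For example, `master_dns_port /etc/origin/master/master-config.yaml` should print 8053 on a default install, and that port can then be compared with the netstat output on the master.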
DNS on an OpenShift Node
- To confirm dnsmasq is working and the node is able to resolve service queries, try the following:
# dig @127.0.0.1 kubernetes.default.svc.cluster.local
- NetworkManager is required, and nameservers must be configured by NetworkManager.
# nmcli connection show eth0 | grep IP4.DNS
- You might see an error when referencing eth0. If you do, please collect the same information for whatever connections do exist on that system.
- If no DNS nameservers are shown, add your external DNS nameservers with the following commands, per "Using the NetworkManager Command Line Tool":
Add new nameservers:
# nmcli con mod eth0 ipv4.dns "8.8.8.8 8.8.4.4"
Append nameservers:
# nmcli con mod my-con-em1 +ipv4.dns "8.8.8.8"
To apply changes after modifying a connection with nmcli, activate the connection again by entering this command:
# nmcli con up con-name
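When the connection is not named eth0, it can help to extract the DNS servers from the nmcli output in a form that is easy to compare across connections. The following is a minimal sketch, assuming the output contains lines of the form `IP4.DNS[n]: <address>`; the function name `ip4_dns_servers` is hypothetical.

```shell
# Hypothetical helper: read "nmcli connection show <name>" output on stdin
# and print one IPv4 DNS server per line.
ip4_dns_servers() {
    awk '$1 ~ /^IP4\.DNS\[[0-9]+\]:$/ { print $2 }'
}
```

For example, `nmcli connection show eth0 | ip4_dns_servers` prints the configured nameservers, and prints nothing when none are set. The names of the active connections can be listed with `nmcli -t -f NAME connection show --active`.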
- A NetworkManager dispatcher script is added to the node during the OpenShift install. This script configures dnsmasq, setting the node's DNS nameservers as forwarding servers. It also overwrites the node's resolv.conf so that the node uses only itself as a nameserver.
# cat /etc/resolv.conf
# cat /etc/dnsmasq.d/origin-*
# ls /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
- If the node does not have the NetworkManager dispatcher script installed, please see the solution "OpenShift 3 manually adding NetworkManager DNSMASQ dispatcher script".
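Because the dispatcher script is expected to leave the node itself as the only nameserver, that condition can be tested directly. This is a minimal sketch; the function names are hypothetical and the expected address is supplied by the caller.

```shell
# Hypothetical helpers: verify that a resolv.conf lists exactly one nameserver
# and that it is the expected address (the node's own IP when dnsmasq is used).
resolv_conf_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

resolv_conf_points_only_to() {
    expected=$1
    conf=$2
    # Multiple nameservers produce multi-line output, which will not match.
    [ "$(resolv_conf_nameservers "$conf")" = "$expected" ]
}
```

For example, `resolv_conf_points_only_to 10.0.0.5 /etc/resolv.conf && echo ok`, where 10.0.0.5 stands in for the node's IP address. A failure suggests the dispatcher script did not run or that resolv.conf was rewritten afterwards.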
- In the OpenShift node-config.yaml, OpenShift passes the value of dnsIP to the pod/container's resolv.conf. This value should be equal to the node's IP address when using dnsmasq.
# ip route get 8.8.8.8 | awk 'NR==1 {print $NF}'
# grep dnsIP -A1 /etc/origin/node/node-config.yaml
- If dnsmasq is not being used, either set dnsIP to the kubernetes service IP, which is usually 172.30.0.1, or remove it entirely, as it defaults to this value.
- In some versions, the following variables must also be set in node-config.yaml:
dnsBindAddress: 127.0.0.1:53
dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
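Comparing dnsIP with the node's address can be scripted. The sketch below makes two assumptions: that `dnsIP:` appears as a single `key: value` line in node-config.yaml, and that the node's address is the field following `src` in the `ip route get` output. Some iproute2 versions print extra trailing fields such as `uid`, in which case the last field is not the address. Both function names are hypothetical.

```shell
# Hypothetical helper: read "ip route get <addr>" output on stdin and print
# the source address, located by the "src" keyword rather than by position.
route_src_ip() {
    awk 'NR == 1 { for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Hypothetical helper: print the dnsIP value from a node-config.yaml.
node_config_dns_ip() {
    awk '$1 == "dnsIP:" { gsub(/"/, "", $2); print $2; exit }' "$1"
}
```

A possible comparison on a node: `[ "$(ip route get 8.8.8.8 | route_src_ip)" = "$(node_config_dns_ip /etc/origin/node/node-config.yaml)" ] && echo "dnsIP matches node IP"`.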
Troubleshooting
If opening a support ticket, please provide or be prepared to provide the following:
- Log in as system:admin on the master host:
# oc login -u system:admin --config=/etc/origin/master/admin.kubeconfig
- Collect the following from the master:
# oadm diagnostics
# netstat -tupln | grep -e 53 -e 8053
# cat /etc/resolv.conf
# oc describe endpoints kubernetes -n default
# grep dns -A1 /etc/origin/master/master-config.yaml
# for ip in $(oc get endpoints kubernetes -n default \
-o 'jsonpath={.subsets[*].addresses[*].ip}'); \
do dig -p $(oc get endpoints kubernetes -n default \
-o 'jsonpath={.subsets[*].ports[?(@.name=="dns")].port}') \
@$ip kubernetes.default.svc.cluster.local +short ; done
- Please also collect the debug script, as described in our documentation.
- Log in remotely to an OpenShift node and collect the following:
# systemctl status dnsmasq
# netstat -tupla | grep 53
# nmcli connection show eth0 | grep IP4.DNS
# cat /etc/resolv.conf
# cat /etc/dnsmasq.d/origin-*
# ls -la /etc/NetworkManager/dispatcher.d/99-origin-dns.sh
# grep dnsIP -A1 /etc/origin/node/node-config.yaml
# ip route get 8.8.8.8 | awk 'NR==1 {print $NF}'
- You might see an error when referencing eth0. If you do, please collect the same information for whatever connections do exist on that system.
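To attach the output of the node commands above to a ticket as a single file, a small wrapper can record each command together with its output. This is a minimal sketch; the function name `run_and_log` and the log file name used below are hypothetical.

```shell
# Hypothetical helper: append a command line and its output to a log file.
# A failing command is recorded with its exit status instead of stopping the run.
run_and_log() {
    logfile=$1
    shift
    {
        echo "### $*"
        "$@" 2>&1 || echo "(command exited with status $?)"
        echo
    } >> "$logfile"
}
```

For example, `run_and_log dns-node.log systemctl status dnsmasq` followed by `run_and_log dns-node.log cat /etc/resolv.conf`. Pipelines have to be wrapped in a shell, as in `run_and_log dns-node.log sh -c 'netstat -tupla | grep 53'`.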
DNS troubleshooting from a container/pod
# docker ps
# cid=<docker-container-id>
# sudo nsenter -n -i -p -t $(sudo docker inspect --format "{{ .State.Pid }}" "$cid") <<NSEOF
dig kubernetes.default.svc.cluster.local +short
dig access.redhat.com +short
cat /etc/resolv.conf
NSEOF