Configure OpenShift internal CoreDNS logging in OCP 4

Solution Unverified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • CoreDNS

Issue

  • More verbose logging is needed for CoreDNS.
  • How to increase the log verbosity for CoreDNS in OpenShift 4?
  • CoreDNS is sending too many health check queries to upstream name servers.

Resolution

OpenShift 4.10
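
  • Starting with OpenShift 4.10, the log level can be set through the DNS operator without making it unmanaged (see Root Cause). The following is a minimal sketch, assuming the `logLevel` field of the DNS operator resource with the values Normal, Debug and Trace; the function name `set_coredns_log_level` is illustrative and not part of `oc`:

```shell
# Set the CoreDNS log level through the DNS operator (OpenShift 4.10 and later).
# Assumption: spec.logLevel accepts Normal, Debug or Trace.
set_coredns_log_level() {
    level="${1:-Debug}"
    case "$level" in
        Normal|Debug|Trace) ;;
        *) echo "unsupported log level: $level" >&2; return 1 ;;
    esac
    # A merge patch only touches spec.logLevel and leaves the rest of the spec alone.
    oc patch dnses.operator.openshift.io/default --type=merge \
        -p "{\"spec\":{\"logLevel\":\"$level\"}}"
}
```

    For example, `set_coredns_log_level Debug` raises the verbosity and `set_coredns_log_level Normal` returns to the default.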

OpenShift 4.9 and older

  • For OpenShift 4.9 and above, the DNS operator can be placed in an unmanaged state, in which it no longer manages the CoreDNS configuration:

      	$ oc patch dnses.operator.openshift.io default --type=json -p '[{"op": "add", "path": "/spec/managementState", "value":"Unmanaged"}]'
    
  • For OpenShift 4.8 and below, the DNS operator must be excluded from cluster version management and then scaled down:

    		$ cat <<EOF >version-patch-add-override.yaml
    		- op: add
    		  path: /spec/overrides
    		  value:
    		  - kind: Deployment
    		    group: apps/v1
    		    name: dns-operator
    		    namespace: openshift-dns-operator
    		    unmanaged: true
    		EOF
    
    		$ oc patch clusterversion version --type json -p "$(cat version-patch-add-override.yaml)"
    
    		$ oc scale deployment --replicas=0 dns-operator -n openshift-dns-operator  
    
      **Note: This override will need to be removed to upgrade the cluster**
    
  • Once the operator is in an unmanaged state or scaled down, the CoreDNS configuration can be modified:

      	$ oc edit cm dns-default -o yaml -n openshift-dns
    
  • A restart of the pods is not needed; CoreDNS reloads the new configuration after 20-30 seconds.

  • To revert to the default managed state, use the following steps:

    • For OpenShift 4.9+:
      	$ oc patch dnses.operator.openshift.io default --type=json -p '[{"op": "remove", "path": "/spec/managementState"}]' 
    
    • For 4.8 and below:
      	$ oc patch clusterversion version --type json -p '[{"op":"remove", "path":"/spec/overrides"}]'
    
  • Refer to the CoreDNS documentation for more configuration options.
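
  • Before editing the ConfigMap on OpenShift 4.9 and above, and again after reverting, it can help to confirm which state the operator is in. A minimal sketch, assuming that an empty `managementState` field means the default managed state; the function name `dns_management_state` is illustrative:

```shell
# Print the management state of the DNS operator.
# Assumption: an unset spec.managementState is treated as Managed.
dns_management_state() {
    state="$(oc get dnses.operator.openshift.io default \
        -o jsonpath='{.spec.managementState}')"
    echo "${state:-Managed}"
}
```

    If this prints `Managed`, the operator is still reconciling the `dns-default` ConfigMap and manual edits to it are likely to be overwritten.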

Root Cause

In OpenShift 4.9 and older versions, the DNS operator must be set as unmanaged before the CoreDNS configuration can be changed. Starting with OpenShift 4.10, some settings, such as the log level, can be configured through the operator without setting it as unmanaged.
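
The version split described above can be sketched as a small helper that maps a cluster version to the procedure from the Resolution section. This is only an illustration: the function name and the labels it prints are hypothetical, and it assumes a version string of the form 4.y or 4.y.z.

```shell
# Map an OpenShift 4 version string to the applicable procedure.
# The printed labels are illustrative names for the steps in this article.
dns_logging_method() {
    version="$1"
    case "$version" in
        [0-9]*.[0-9]*) ;;
        *) echo "cannot parse version: $version" >&2; return 1 ;;
    esac
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # second component
    case "$major$minor" in
        *[!0-9]*) echo "cannot parse version: $version" >&2; return 1 ;;
    esac
    if [ "$major" -gt 4 ] || [ "$minor" -ge 10 ]; then
        echo "operator-loglevel"          # set the log level through the operator
    elif [ "$minor" -eq 9 ]; then
        echo "managementstate-unmanaged"  # patch managementState to Unmanaged
    else
        echo "cvo-override"               # cluster version override and scale down
    fi
}
```

The cluster version can be passed in directly, for example `dns_logging_method "$(oc get clusterversion version -o jsonpath='{.status.desired.version}')"`.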

Diagnostic Steps

  • After the operator is unmanaged or scaled down, the configuration can be changed. Below is an example for OpenShift 4.9 and earlier versions that enables the log plugin and sets the upstream health check interval to 5s instead of the default 0.5s:
   $ oc edit cm dns-default -o yaml -n openshift-dns
   apiVersion: v1
   data:
     Corefile: |
       .:5353 {
           bufsize 512
           errors
           health {
               lameduck 20s
           }
           ready
           kubernetes cluster.local in-addr.arpa ip6.arpa {
               pods insecure
               fallthrough in-addr.arpa ip6.arpa
           }
           prometheus 127.0.0.1:9153
           forward . /etc/resolv.conf {
               policy sequential
               health_check 5s
           }
           log
           cache 900 {
               denial 9984 30
           }
           reload
       }
   kind: ConfigMap
  • Confirm the changes:
   $ oc get cm dns-default -n openshift-dns --template '{{index .data "Corefile"}}'
  • View the logs:
   $ oc logs -n openshift-dns -c dns dns-default-4j5rz
   .:5353
   [INFO] plugin/reload: Running configuration MD5 = eb791f1fb4e1f964e4a7377f6b122c87
   CoreDNS-1.8.4
   linux/amd64, go1.16.12, 
   [INFO] Reloading
   [INFO] plugin/health: Going into lameduck mode for 20s
   [INFO] plugin/reload: Running configuration MD5 = 35fb47bf4ef4c4b427182149bb3a5d0c
   [INFO] Reloading complete
   [INFO] 10.129.0.18:51163 - 999 "AAAA IN console-openshift-console.apps.ocp49.redhat.com. udp 103 false 512" NOERROR qr,rd,ra 221 0.001559006s
   [INFO] 10.129.0.18:54696 - 9503 "A IN console-openshift-console.apps.ocp49.redhat.com. udp 103 false 512" NOERROR qr,rd,ra 588 0.002329169s
   [INFO] 10.129.0.26:50438 - 49655 "A IN kubernetes.default.svc.cluster.local. udp 65 false 512" NOERROR qr,aa,rd 106 0.000117205s
   [INFO] 10.128.0.39:52886 - 49576 "A IN registry.access.redhat.com. tcp 55 false 65535" NOERROR qr,aa,rd,ra 1234 0.001120846s
   [INFO] 10.128.0.39:52958 - 4188 "AAAA IN quay.io. tcp 36 false 65535" NOERROR qr,aa,rd,ra 747 0.000085856s
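
  • With the log plugin enabled, each query is written as one line in the format shown above, which makes the output easy to summarize. A minimal sketch, assuming the default log format seen in this output (query type in the fifth whitespace-separated field, response code in the twelfth); the function name `summarize_coredns_queries` is illustrative:

```shell
# Count logged queries per query type and response code.
# Reads CoreDNS log lines from the given files, or from stdin when none are given.
summarize_coredns_queries() {
    awk '
        # Query lines look like: [INFO] <client>:<port> - <id> "<type> IN <name> ..." <rcode> ...
        $1 == "[INFO]" && $3 == "-" && NF >= 15 {
            qtype = $5
            gsub(/"/, "", qtype)      # drop the opening quote of the query section
            count[qtype " " $12]++
        }
        END { for (k in count) print count[k], k }
    ' "$@" | sort -k1,1nr -k2
}
```

    For example: `oc logs -n openshift-dns -c dns dns-default-4j5rz | summarize_coredns_queries`. Lines from other plugins, such as reload and health, are skipped.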

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.