How to change the pids_limit/podPidsLimit for pods in OpenShift 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Kubelet
  • podPidsLimit

Issue

  • How to change the value of pids_limit in OpenShift 4?

  • How to configure podPidsLimit using KubeletConfig in OpenShift 4?

  • What is the default value of podPidsLimit in OpenShift 4?

  • Getting one of the following exceptions in applications when migrating from OCP3 to OCP4:

    java.lang.OutOfMemoryError: unable to create new native thread
    java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
    

Resolution

Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.

PID limits are configured at the kubelet level, so they are set on the nodes through a KubeletConfig resource. In older versions, the limit also had to be configured in a ContainerRuntimeConfig resource, but starting with OCP 4.11 the configuration in CRI-O is deprecated in favor of the configuration in the KubeletConfig, and the default podPidsLimit in the kubelet changed to 4096.
Refer to understanding process ID limits for additional information.

IMPORTANT NOTE: as explained in the above linked documentation, the maximum value for podPidsLimit is 16,384.
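
As a quick guard before applying a new value, the documented maximum can be checked in the shell. The following is a minimal sketch, not part of the product: the function name validate_pod_pids_limit and the variable MAX_POD_PIDS_LIMIT are hypothetical, and rejecting anything below 1 is an assumption of this example.

```shell
# Hypothetical helper: check a proposed podPidsLimit before applying it.
# MAX_POD_PIDS_LIMIT mirrors the documented maximum of 16,384.
MAX_POD_PIDS_LIMIT=16384

validate_pod_pids_limit() {
  value="$1"
  # Reject empty or non-numeric input.
  case "$value" in
    ''|*[!0-9]*)
      echo "invalid: '$value' is not a positive integer" >&2
      return 1
      ;;
  esac
  # Reject over-long input first so the numeric comparison cannot overflow.
  if [ "${#value}" -gt 5 ] || [ "$value" -lt 1 ] || [ "$value" -gt "$MAX_POD_PIDS_LIMIT" ]; then
    echo "invalid: $value is outside 1..$MAX_POD_PIDS_LIMIT" >&2
    return 1
  fi
  echo "ok: podPidsLimit=$value"
}
```

For example, `validate_pod_pids_limit 8192` succeeds, while `validate_pod_pids_limit 20000` returns a non-zero status.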

Configuring PID limits


>**Note:** For OSD and ROSA, refer to [How to change the pids_limit in OSD or ROSA](https://access.redhat.com/solutions/6769641).

The pids_limit can be increased through the KubeletConfig and, in OCP 4.10 and older, also through the ContainerRuntimeConfig.

  1. The steps below assume the cluster has no current kubeletconfigs. Verify this is true before proceeding:

    $ oc get kubeletconfigs
    No resources found in openshift-etcd namespace
    

    Note: if there are already kubeletconfigs in the cluster, refer to the considerations in creating a KubeletConfig CR to edit kubelet parameters to decide whether it is better to create a new kubeletconfig or edit an existing one.

  2. Label the worker pool:

    $ oc label machineconfigpool worker custom-crio=high-pid-limit custom-kubelet=small-pods
    
  3. Create a custom KubeletConfig with the following command, setting the POD_PIDS_LIMIT value as needed (the default configuration starting with OCP 4.11 is 4096):

    $ POD_PIDS_LIMIT=4096
    $ oc apply -f - <<EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: KubeletConfig
    metadata:
      name: worker-kubeconfig-fix
    spec:
      machineConfigPoolSelector:
        matchLabels:
          custom-kubelet: small-pods
      kubeletConfig:
        podPidsLimit: $POD_PIDS_LIMIT
    EOF
    

    IMPORTANT: Running the oc apply commands will result in a new Machine Config (MC) being created. This MC will be rolled out to each worker node. Each node will cordon, drain, apply config, reboot, and uncordon; so it is expected for the nodes to reboot.

  4. (Only for OCP 4.10 and older) Create a custom ContainerRuntimeConfig with the following command, setting the PIDS_LIMIT value according to the OpenShift version as explained below:

    $ PIDS_LIMIT=4096
    $ oc apply -f - <<EOF
    apiVersion: machineconfiguration.openshift.io/v1
    kind: ContainerRuntimeConfig
    metadata:
     name: set-pids-limit
    spec:
     machineConfigPoolSelector:
       matchLabels:
         custom-crio: high-pid-limit
     containerRuntimeConfig:
       pidsLimit: $PIDS_LIMIT
    EOF
    

    Note: as the configuration of pidsLimit via the ContainerRuntimeConfig is deprecated in current OpenShift releases, if there is a ContainerRuntimeConfig only to configure the pidsLimit, the resource can be removed. If additional configurations are being done via the ContainerRuntimeConfig, the pidsLimit can be removed from the resource.
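
The rollout of the new Machine Config to the worker pool can be followed from the command line. The following is a minimal sketch: the function name wait_for_mcp_rollout and the default timeout of 30m are hypothetical choices, and it assumes the pool reports the Updated condition once every node has the new configuration. Right after the oc apply, the pool may still report Updated for a short time until the new Machine Config is rendered, so check the UPDATING column as well.

```shell
# Hypothetical helper: block until a MachineConfigPool finishes rolling out.
wait_for_mcp_rollout() {
  pool="${1:-worker}"     # pool name, defaults to the worker pool
  timeout="${2:-30m}"     # assumed default; adjust to the cluster size
  # Show the current UPDATED / UPDATING / DEGRADED columns for the pool.
  oc get machineconfigpool "$pool"
  # Wait for the pool's Updated condition to become True.
  oc wait "machineconfigpool/$pool" --for=condition=Updated --timeout="$timeout"
}
```

For example, `wait_for_mcp_rollout worker 45m` waits up to 45 minutes for the worker pool.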

Notes:

  • Use pidsLimit: -1 value for OpenShift versions 4.5 to 4.7. pidsLimit: -1 means no limit will be enforced by CRI-O, and limits set by the kubelet will be honored.
  • In OpenShift 4.4 and older versions, pidsLimit: -1 has no effect, and a value matching podPidsLimit in the KubeletConfig must be specified.
  • In OpenShift 4.8 and newer versions, pidsLimit: -1 is not valid. Per Red Hat BZ 2039187, it must be set to a value greater than 20. A value matching podPidsLimit in the KubeletConfig must be specified.
    The initial creation of a cluster may interfere with this configuration taking effect, and the configuration may need more time before it is applied. Red Hat investigated this issue in bug report BZ 2100894 and delivered a fix in OpenShift 4.10.25 through errata RHSA-2022:5730. If this issue still occurs in your environment after updating, open a support case in the Red Hat Customer Portal referring to this solution.
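
The version rules in the notes above can be summarized as a small lookup. The following is a minimal sketch: the function name crio_pids_limit_advice and its output wording are hypothetical, and it only restates the notes for a given OpenShift 4 minor version.

```shell
# Hypothetical helper: map an OpenShift 4 minor version to the pidsLimit guidance above.
crio_pids_limit_advice() {
  minor="$1"            # e.g. 7 for OpenShift 4.7
  pod_pids_limit="$2"   # the podPidsLimit used in the KubeletConfig
  case "$minor" in
    ''|*[!0-9]*)
      echo "unknown minor version: '$minor'" >&2
      return 1
      ;;
  esac
  if [ "$minor" -ge 11 ]; then
    # 4.11+: the CRI-O setting is deprecated; only the KubeletConfig is needed.
    echo "skip: pidsLimit in ContainerRuntimeConfig is deprecated"
  elif [ "$minor" -ge 8 ]; then
    # 4.8 to 4.10: -1 is rejected, so match the kubelet value.
    echo "pidsLimit: $pod_pids_limit (must be greater than 20; -1 is not valid)"
  elif [ "$minor" -ge 5 ]; then
    # 4.5 to 4.7: let CRI-O defer to the kubelet limit.
    echo "pidsLimit: -1"
  else
    # 4.4 and older: -1 has no effect, so match the kubelet value.
    echo "pidsLimit: $pod_pids_limit"
  fi
}
```

For example, `crio_pids_limit_advice 7 4096` prints `pidsLimit: -1`.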

Root Cause

This solution explains how to increase the container PID limit (which defaults to 4096 on OpenShift 4.11+). Raising the limit can avoid java.lang.OutOfMemoryError in Java applications or core dump crashes in .NET applications. However, the increase does not solve any underlying condition such as misconfigurations, thread leaks, applications poorly suited to containers, or cgroups not being detected. Such conditions need a separate investigation, whose scope goes beyond this solution.

The KubeletConfig holds the configuration for pods, while the ContainerRuntimeConfig holds the configuration for containers. In versions before OCP 4.11, both the KubeletConfig and ContainerRuntimeConfig resources must be configured.

Finally, starting with OCP 4.11, the configuration in CRI-O is deprecated in favor of the configuration in the KubeletConfig, and the default podPidsLimit changed to 4096.

Diagnostic Steps

  1. Monitor /sys/fs/cgroup/pids/pids.current while the application is running to verify that java.lang.OutOfMemoryError: unable to create new native thread or similar errors occur when the value reaches 1024 (or 4096 in OCP 4.11+).

  2. For OCP 4.10 and previous releases, check if the CRI-O pids_limit is being set on the node where the application container is running:

    $ crio config | grep pids_limit
    INFO[2022-01-31 12:14:27.407346183Z] Starting CRI-O, version: 1.21.4-4.rhaos4.8.git84fa55d.el8, git: () 
    INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL 
    pids_limit = 4096
    

    If not, the default (1024) applies (or 4096 in OCP 4.11+).

  3. Verify that the kubelet podPidsLimit is set in /etc/kubernetes/kubelet.conf and that SupportPodPidsLimit (only in 4.10 and older) is enabled by running the following command:

        $ oc debug node/[node_name] -- cat /host/etc/kubernetes/kubelet.conf | jq '.podPidsLimit, .featureGates'
        Starting pod/[node_name]-debug ...
        To use host binaries, run `chroot /host`
        Removing debug pod ...
    
        2048
        {
          "LegacyNodeRoleBehavior": false,
          "NodeDisruptionExclusion": true,
          "RotateKubeletServerCertificate": true,
          "SCTPSupport": true,
          "ServiceNodeExclusion": true,
          "SupportPodPidsLimit": true
        }
    

    In newer releases, kubelet.conf is not a JSON file, so use the following command instead:

    $ oc debug node/[node_name] -- cat /host/etc/kubernetes/kubelet.conf | grep podPidsLimit
    podPidsLimit: 4096
    

    If not configured, the default (1024) applies (or 4096 in OCP 4.11+).

  4. Verify that the ContainerRuntimeConfig (only in OCP 4.10 and previous releases) and the KubeletConfig were created, and that their labels were applied to the pool:

        $ oc get kubeletconfig,containerruntimeconfig
        NAME                                 AGE
        kubeletconfig/worker-kubeconfig-fix  9d
    
        NAME                                   AGE
        containerruntimeconfig/set-pids-limit  15d
    
        $ oc get mcp/worker -ojson | jq '.metadata.labels'
        {
          "custom-crio": "high-pid-limit",
          "custom-kubelet": "small-pods",
          "machineconfiguration.openshift.io/mco-built-in": "",
          "pools.operator.machineconfiguration.openshift.io/worker": ""
        }
    
        $ oc get kubeletconfig/worker-kubeconfig-fix -ojson | jq '.status.conditions[]'
        {
          "lastTransitionTime": "2022-02-10T04:46:17Z",
          "message": "Success",
          "status": "True",
          "type": "Success"
        }
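
Diagnostic steps 1 and 3 can be wrapped in small helpers. The following is a minimal sketch: the function names pids_usage and pod_pids_limit_from_conf are hypothetical, and it assumes that pids.max sits next to pids.current in the same cgroup directory. On hosts using cgroup v2 the directory may differ from /sys/fs/cgroup/pids, so pass the path explicitly.

```shell
# Hypothetical helper: print PID usage for a cgroup directory.
pids_usage() {
  dir="${1:-/sys/fs/cgroup/pids}"
  # Fail if pids.current cannot be read from the given directory.
  current="$(cat "$dir/pids.current")" || return 1
  # pids.max is assumed to sit next to pids.current; it holds a number or "max".
  max="$(cat "$dir/pids.max" 2>/dev/null || echo unknown)"
  echo "pids.current=$current pids.max=$max"
}

# Hypothetical helper: read kubelet.conf on stdin and print the podPidsLimit value.
# Matches both the JSON form ("podPidsLimit": 2048,) and the YAML form (podPidsLimit: 4096).
# Prints nothing when the key is absent, in which case the default applies.
pod_pids_limit_from_conf() {
  sed -n 's/^[[:space:]]*"\{0,1\}podPidsLimit"\{0,1\}:[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}
```

Inside the application container, `pids_usage` can be run repeatedly while the application is under load. The second helper can replace the jq and grep filters: `oc debug node/[node_name] -- cat /host/etc/kubernetes/kubelet.conf | pod_pids_limit_from_conf`.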
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.