How to change the pids_limit/podPidsLimit for pods in OpenShift 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Kubelet `podPidsLimit`
Issue
- How to change the value of `pids_limit` in OpenShift 4?
- How to configure `podPidsLimit` using `KubeletConfig` in OpenShift 4?
- What is the default value of `podPidsLimit` in OpenShift 4?
- Getting one of the following exceptions in applications when migrating from OCP3 to OCP4:

      java.lang.OutOfMemoryError: unable to create new native thread
      java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
Resolution
Disclaimer: Links contained herein to external website(s) are provided for convenience only. Red Hat has not reviewed the links and is not responsible for the content or its availability. The inclusion of any link to an external website does not imply endorsement by Red Hat of the website or their entities, products or services. You agree that Red Hat is not responsible or liable for any loss or expenses that may result due to your use of (or reliance on) the external site or content.
The PID limit is configured at the kubelet level, so it is applied to the nodes through a `KubeletConfig` resource. In older versions it also had to be configured in a `ContainerRuntimeConfig` resource, but starting with OCP 4.11 the configuration in CRI-O is deprecated in favor of the configuration in the `KubeletConfig`, and the default `podPidsLimit` in the kubelet changed to 4096.
Refer to understanding process ID limits for additional information.
IMPORTANT NOTE: as explained in the documentation linked above, the maximum value for `podPidsLimit` is 16,384.
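The 16,384 ceiling can be checked locally before a value is written into any manifest. The following is a minimal sketch, not an official tool: `validate_pod_pids_limit` and `MAX_POD_PIDS_LIMIT` are hypothetical names introduced here, and the check only covers plain positive integers up to the documented maximum.

```shell
# Hypothetical helper (not part of oc): accept only positive integers that do
# not exceed the documented maximum for podPidsLimit.
MAX_POD_PIDS_LIMIT=16384

validate_pod_pids_limit() {
    value=$1
    # Reject empty strings and anything containing a non-digit character.
    case $value in
        ''|*[!0-9]*)
            echo "not a positive integer: $value" >&2
            return 1
            ;;
    esac
    # Reject overly long inputs before the numeric comparison can overflow.
    if [ "${#value}" -gt 5 ]; then
        echo "out of range (1-$MAX_POD_PIDS_LIMIT): $value" >&2
        return 1
    fi
    if [ "$value" -lt 1 ] || [ "$value" -gt "$MAX_POD_PIDS_LIMIT" ]; then
        echo "out of range (1-$MAX_POD_PIDS_LIMIT): $value" >&2
        return 1
    fi
    return 0
}
```

For example, `validate_pod_pids_limit 8192 && POD_PIDS_LIMIT=8192` sets the variable only when the value passes the check.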
Configuring PID limits
>**Note:** For OSD and ROSA, refer to [How to change the pids_limit in OSD or ROSA](https://access.redhat.com/solutions/6769641).
It is possible to increase the pids_limit in both the KubeletConfig and ContainerRuntimeConfig.
- The steps below assume the cluster has no current `kubeletconfigs`. Verify this is true before proceeding:

      $ oc get kubeletconfigs
      No resources found in openshift-etcd namespace

  Note: if there are `kubeletconfigs` already in the cluster, refer to the considerations in creating a KubeletConfig CR to edit kubelet parameters to decide whether it is better to create a new `kubeletconfig` or edit an existing one.

- Label the worker pool:

      $ oc label machineconfigpool worker custom-crio=high-pid-limit custom-kubelet=small-pods

- Create a custom `KubeletConfig` with the following command, setting the `POD_PIDS_LIMIT` value as needed (the default configuration starting with OCP 4.11 is `4096`):

      $ POD_PIDS_LIMIT=4096
      $ oc apply -f - <<EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: worker-kubeconfig-fix
      spec:
        machineConfigPoolSelector:
          matchLabels:
            custom-kubelet: small-pods
        kubeletConfig:
          podPidsLimit: $POD_PIDS_LIMIT
      EOF

  IMPORTANT: Running the `oc apply` commands will result in a new Machine Config (MC) being created. This MC will be rolled out to each worker node. Each node will cordon, drain, apply config, reboot, and uncordon, so the nodes are expected to reboot.

- (Only for OCP 4.10 and older) Create a custom `ContainerRuntimeConfig` with the following command, setting the `PIDS_LIMIT` value according to the OpenShift version as explained below:

      $ PIDS_LIMIT=4096
      $ oc apply -f - <<EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: ContainerRuntimeConfig
      metadata:
        name: set-pids-limit
      spec:
        machineConfigPoolSelector:
          matchLabels:
            custom-crio: high-pid-limit
        containerRuntimeConfig:
          pidsLimit: $PIDS_LIMIT
      EOF

  Note: as the configuration of `pidsLimit` via the `ContainerRuntimeConfig` is deprecated in current OpenShift releases, if there is a `ContainerRuntimeConfig` only to configure the `pidsLimit`, the resource can be removed. If additional configurations are being done via the `ContainerRuntimeConfig`, the `pidsLimit` can be removed from the resource.
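Because applying the `KubeletConfig` triggers a rolling reboot of the worker nodes, it can help to print the manifest and review it before it is applied. The sketch below wraps the same manifest used in the steps above in a function; `render_kubeletconfig` is a hypothetical name, and the resource name and label are the ones from this solution.

```shell
# Print a KubeletConfig manifest for the given podPidsLimit.
# The resource name and pool label are the ones used in this solution.
render_kubeletconfig() {
    pod_pids_limit=$1
    cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-kubeconfig-fix
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: small-pods
  kubeletConfig:
    podPidsLimit: $pod_pids_limit
EOF
}

# Review the output first; nothing is sent to the cluster here.
render_kubeletconfig 8192
```

Once the output looks right, it can be piped to the cluster with `render_kubeletconfig 8192 | oc apply -f -`.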
Notes:
- Use the `pidsLimit: -1` value for OpenShift versions 4.5 to 4.7. `pidsLimit: -1` means no limit will be enforced by CRI-O, and limits set by the kubelet will be honored.
- In OpenShift 4.4 and older versions, `pidsLimit: -1` has no effect, so a value matching `podPidsLimit` in the `KubeletConfig` must be specified.
- In OpenShift 4.8 and newer versions, `pidsLimit: -1` is not valid. Per Red Hat BZ 2039187, it must be set to a value greater than 20. A value matching `podPidsLimit` in the `KubeletConfig` must be specified.
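The version rules in these notes can be summarized as a small decision function. This is only an illustrative sketch of the notes as written: `crio_pids_limit_for` is a hypothetical helper, it takes the OpenShift 4 minor version as a bare number, and it prints `unset` for 4.11 and newer, where the CRI-O setting is deprecated.

```shell
# Hypothetical helper: print the pidsLimit to use in a ContainerRuntimeConfig
# for a given OpenShift 4 minor version and desired podPidsLimit.
#   $1 = minor version (e.g. 7 for OpenShift 4.7)
#   $2 = podPidsLimit configured in the KubeletConfig
crio_pids_limit_for() {
    minor=$1
    pod_pids_limit=$2
    if [ "$minor" -ge 11 ]; then
        # 4.11+: the CRI-O setting is deprecated, only the KubeletConfig is used.
        printf '%s\n' unset
    elif [ "$minor" -ge 5 ] && [ "$minor" -le 7 ]; then
        # 4.5 to 4.7: -1 lets the kubelet limit take effect.
        printf '%s\n' -1
    else
        # 4.4 and older, and 4.8 to 4.10: match the kubelet value.
        printf '%s\n' "$pod_pids_limit"
    fi
}
```

For instance, `PIDS_LIMIT=$(crio_pids_limit_for 9 4096)` yields `4096` for an OpenShift 4.9 cluster.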
The initial creation of a cluster may interfere with this configuration taking hold, and the configuration may need more time before it is applied. Red Hat investigated this issue in bug report BZ 2100894 and delivered a fix in OpenShift 4.10.25 through errata RHSA-2022:5730. If this issue still occurs in your environment after updating, open a support case in the Red Hat Customer Portal referring to this solution.
Root Cause
This solution explains how to increase the container PID limit (which defaults to 4096 on OpenShift 4.11+), which can avoid `java.lang.OutOfMemoryError` in Java applications or core dump crashes in .NET applications. However, the increase does not solve any underlying condition such as misconfigurations, thread leaks, inadequate containerized applications, or cgroups not being detected. Those conditions require a subsequent investigation, whose scope goes beyond this solution.
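As a starting point for spotting a thread leak, the thread count of a process can be read from its `status` file under `/proc`, since every thread counts against the PID limit. The following is a rough sketch only: `threads_in_status` is a hypothetical helper, and which process to inspect depends on the application.

```shell
# Print the thread count recorded in a /proc/<pid>/status file.
#   $1 = path to the status file, e.g. /proc/1/status inside a container
threads_in_status() {
    awk '/^Threads:/ { print $2 }' "$1"
}
```

Running `threads_in_status /proc/<pid>/status` repeatedly while the application is under load shows whether the count keeps growing toward the limit.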
The `KubeletConfig` holds the configuration for pods, while the `ContainerRuntimeConfig` holds the configuration for containers. In versions before OCP 4.11, both the `KubeletConfig` and `ContainerRuntimeConfig` resources must be configured.
Finally, starting with OCP 4.11, the configuration in CRI-O is deprecated in favor of the configuration in the `KubeletConfig`, and the default `podPidsLimit` changed to 4096.
Diagnostic Steps
- Monitor `/sys/fs/cgroup/pids/pids.current` while the application is running to verify that `java.lang.OutOfMemoryError: unable to create new native thread` or similar errors happen when it hits `1024` (or `4096` in OCP 4.11+).

- For OCP 4.10 and previous releases, check if the CRI-O `pids_limit` is being set on the node where the application container is running:

      $ crio config | grep pids_limit
      INFO[2022-01-31 12:14:27.407346183Z] Starting CRI-O, version: 1.21.4-4.rhaos4.8.git84fa55d.el8, git: ()
      INFO Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL
      pids_limit = 4096

  If not, the default (1024) applies (or `4096` in OCP 4.11+).

- Verify the kubelet `podPidsLimit` is being set in `/etc/kubernetes/kubelet.conf` and that `SupportPodPidsLimit` (only in 4.10 and older) is set by running the following command:

      $ oc debug node/[node_name] -- cat /host/etc/kubernetes/kubelet.conf | jq '.podPidsLimit, .featureGates'
      Starting pod/[node_name]-debug ...
      To use host binaries, run `chroot /host`
      Removing debug pod ...
      2048
      {
        "LegacyNodeRoleBehavior": false,
        "NodeDisruptionExclusion": true,
        "RotateKubeletServerCertificate": true,
        "SCTPSupport": true,
        "ServiceNodeExclusion": true,
        "SupportPodPidsLimit": true
      }

  In newer releases, it's not a `json` file, so use the following command instead:

      $ oc debug node/[node_name] -- cat /host/etc/kubernetes/kubelet.conf | grep podPidsLimit
      podPidsLimit: 4096

  If not configured, the default (1024) applies (or `4096` in OCP 4.11+).

- Verify the labels for the `ContainerRuntimeConfig` (only in OCP 4.10 and previous releases) and `KubeletConfig` were created and applied:

      $ oc get kubeletconfig,containerruntimeconfig
      NAME                                  AGE
      kubeletconfig/worker-kubeconfig-fix   9d

      NAME                                    AGE
      containerruntimeconfig/set-pids-limit   15d

      $ oc get mcp/worker -ojson | jq '.metadata.labels'
      {
        "custom-crio": "high-pid-limit",
        "custom-kubelet": "small-pods",
        "machineconfiguration.openshift.io/mco-built-in": "",
        "pools.operator.machineconfiguration.openshift.io/worker": ""
      }

      $ oc get kubeletconfig/worker-kubeconfig-fix -ojson | jq '.status.conditions[]'
      {
        "lastTransitionTime": "2022-02-10T04:46:17Z",
        "message": "Success",
        "status": "True",
        "type": "Success"
      }
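Two of the checks above can be scripted. The sketch below is illustrative only and both function names are hypothetical: `pids_usage` prints the current and maximum PID counts from a pids cgroup directory passed as an argument, and `pod_pids_limit_from_conf` extracts `podPidsLimit` from `kubelet.conf` content on standard input, whether the file is in the older JSON form or the newer YAML form.

```shell
# Print "current/max" for a pids cgroup directory.
#   $1 = directory holding pids.current and pids.max,
#        e.g. /sys/fs/cgroup/pids as used in the first diagnostic step
pids_usage() {
    dir=$1
    printf '%s/%s\n' "$(cat "$dir/pids.current")" "$(cat "$dir/pids.max")"
}

# Read kubelet.conf content on stdin and print the podPidsLimit value.
# Matches both the JSON form ("podPidsLimit": 2048) and the YAML form
# (podPidsLimit: 4096); prints nothing if the key is absent.
pod_pids_limit_from_conf() {
    sed -n 's/.*"\{0,1\}podPidsLimit"\{0,1\}[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' |
        head -n 1
}
```

The second function can be fed by the command already shown in this section, for example `oc debug node/[node_name] -- cat /host/etc/kubernetes/kubelet.conf | pod_pids_limit_from_conf`. An empty result suggests the key is not set and the default applies.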
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.