Using a separate disk for node container storage on OpenShift 4
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.12 or higher
Issue
- How to mount a separate disk to /var/lib/containers on Red Hat OpenShift 4 nodes.
Resolution
Mounting a separate/secondary disk for container storage in OpenShift Container Platform 4 is done via the Machine Config Operator using MachineConfigs.
If all nodes, including masters and/or workers, are identical, such as in the case of many Cloud instances or racks of matching bare-metal servers, then this configuration change can be applied to the default worker and/or master MachineConfigPools. If some hardware is different, such as having a different number of storage drives or device names, or if you don't want this change applied, for example, to all workers, then a separate MachineConfigPool will need to be created. An example of how to create a separate MachineConfigPool can be found in the following solution: Openshift 4 create infra machines.
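Before applying the change, it helps to confirm which pool the target nodes belong to. A read-only sketch, assuming the `oc` CLI is configured against your cluster (the pool name `worker` is just an example):

```shell
# Pick the pool the MachineConfig should target (worker, master, or a custom pool).
POOL=worker

# Read-only queries; skipped here if the oc CLI is not available.
if command -v oc >/dev/null 2>&1; then
  # List all MachineConfigPools and their rollout status.
  oc get machineconfigpool
  # Show the node selector that places nodes into the chosen pool.
  oc get machineconfigpool "${POOL}" -o jsonpath='{.spec.nodeSelector}{"\n"}'
fi
```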
IMPORTANT NOTE: Starting with OpenShift 4.13, disks must be referenced using consistent device naming. Please refer to the example for 4.13 and newer.
Procedure for OpenShift 4.12 and older
Use the following example to mount `/dev/sdb` to the container storage root directory, `/var/lib/containers`:
- Create a new file, such as mymc.yaml, with the following MachineConfig defined:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 98-var-lib-containers
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Make File System on /dev/sdb
            DefaultDependencies=no
            BindsTo=dev-sdb.device
            After=dev-sdb.device var.mount
            Before=systemd-fsck@dev-sdb.service
            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStart=-/bin/bash -c "/bin/rm -rf /var/lib/containers/*"
            ExecStart=/usr/lib/systemd/systemd-makefs xfs /dev/sdb
            TimeoutSec=0
            [Install]
            WantedBy=var-lib-containers.mount
          enabled: true
          name: systemd-mkfs@dev-sdb.service
        - contents: |
            [Unit]
            Description=Mount /dev/sdb to /var/lib/containers
            Before=local-fs.target
            Requires=systemd-mkfs@dev-sdb.service
            After=systemd-mkfs@dev-sdb.service
            [Mount]
            What=/dev/sdb
            Where=/var/lib/containers
            Type=xfs
            Options=defaults,prjquota
            [Install]
            WantedBy=local-fs.target
          enabled: true
          name: var-lib-containers.mount
        - contents: |
            [Unit]
            Description=Restore recursive SELinux security contexts
            DefaultDependencies=no
            After=var-lib-containers.mount
            Before=crio.service
            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStart=/sbin/restorecon -R /var/lib/containers/
            TimeoutSec=0
            [Install]
            WantedBy=multi-user.target graphical.target
          enabled: true
          name: restorecon-var-lib-containers.service

Ensure that the following values match your environment:
- metadata.labels["machineconfiguration.openshift.io/role"] should match your MachineConfigPool (master, worker, or a custom pool)
- sdb should match your node's secondary storage device (i.e., /dev/sdb). Be sure to change this reference everywhere it occurs in the file.
- Create the new MachineConfig:

$ oc create -f mymc.yaml
machineconfig.machineconfiguration.openshift.io/98-var-lib-containers created
Once the new MachineConfig is rendered, the applicable nodes will begin to be updated and rebooted. On reboot, a new XFS filesystem will be created on the specified disk, the old container storage will be cleared, and the disk will be mounted to /var/lib/containers.
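The rollout can be followed from the CLI; a sketch assuming the default worker pool (read-only commands, skipped here if `oc` is unavailable):

```shell
MC_NAME=98-var-lib-containers

if command -v oc >/dev/null 2>&1; then
  # Confirm the MachineConfig exists and was picked up by the operator.
  oc get machineconfig "${MC_NAME}"
  # UPDATING=True while nodes drain and reboot; UPDATED=True once the pool converges.
  oc get machineconfigpool worker
  # Nodes cycle through SchedulingDisabled/NotReady during their reboot.
  oc get nodes
fi
```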
Procedure for OpenShift 4.13 and newer
Use the following example to mount a block device that carries no metadata readable by the blkid command (i.e., an unformatted disk) to the container storage root directory, /var/lib/containers.
- Create a file, e.g. find-secondary-device, with the following content:
#!/bin/bash
set -uo pipefail
for device in <device_type_glob>; do
/usr/sbin/blkid "${device}" &> /dev/null
if [ $? == 2 ]; then
echo "secondary device found ${device}"
echo "creating filesystem for containers mount"
mkfs.xfs -L var-lib-cont -f "${device}" &> /dev/null
udevadm settle
touch /etc/var-lib-containers-mount
exit
fi
done
echo "Couldn't find secondary block device!" >&2
exit 77
Replace <device_type_glob> with a shell glob for your block device type:
- For SCSI or SATA drives, use /dev/sd*
- For virtual drives, use /dev/vd*
- For NVMe drives, use /dev/nvme*[0-9]*n*
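A glob can be sanity-checked outside the node by simulating device names in a temporary directory; a minimal sketch for the /dev/sd* case:

```shell
# Simulate a /dev tree with a few device names and confirm which ones the
# sd* glob would select in the find-secondary-device loop.
tmp=$(mktemp -d)
touch "${tmp}/sda" "${tmp}/sdb" "${tmp}/vda" "${tmp}/nvme0n1"

matched=""
for device in "${tmp}"/sd*; do
  matched="${matched}${matched:+ }${device##*/}"
done
echo "sd* matched: ${matched}"   # sd* matched: sda sdb

rm -rf "${tmp}"
```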
Encode the script with base64; the entire encoded string replaces <encoded_etc_find_secondary_device_script> in the MachineConfig definition below. You can generate it with the base64 -w0 find-secondary-device command.
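The round trip below uses an inline stand-in for the script (so it is self-contained) to show that the encoding is lossless and what the resulting data URL looks like:

```shell
# Stand-in for the real script; with the actual file you would run:
#   base64 -w0 find-secondary-device
sample='#!/bin/bash
echo "secondary device found"'

# -w0 disables line wrapping so the data URL stays on a single line.
encoded=$(printf '%s' "${sample}" | base64 -w0)
decoded=$(printf '%s' "${encoded}" | base64 -d)

# This is the value that replaces <encoded_etc_find_secondary_device_script>:
echo "data:text/plain;charset=utf-8;base64,${encoded}"
```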
- Create the MachineConfig definition file with the following content. Replace <ROLE> with the target MachineConfigPool. You might also need to adjust .spec.config.ignition.version to match the Ignition version in your cluster.
$ cat mc.second.disk.var.lib.containers.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: <ROLE>
name: 98-var-lib-containers
spec:
config:
ignition:
version: 3.4.0
storage:
files:
- path: /etc/find-secondary-device
mode: 0755
contents:
source: data:text/plain;charset=utf-8;base64,<encoded_etc_find_secondary_device_script>
systemd:
units:
- name: find-secondary-device.service
enabled: true
contents: |
[Unit]
Description=Find secondary device
DefaultDependencies=false
After=systemd-udev-settle.service
Before=local-fs-pre.target
ConditionPathExists=!/etc/var-lib-containers-mount
[Service]
RemainAfterExit=yes
ExecStart=/etc/find-secondary-device
RestartForceExitStatus=77
[Install]
WantedBy=multi-user.target
- name: var-lib-containers.mount
enabled: true
contents: |
[Unit]
Description=Mount /var/lib/containers
Before=local-fs.target
[Mount]
What=/dev/disk/by-label/var-lib-cont
Where=/var/lib/containers
Type=xfs
TimeoutSec=120s
[Install]
RequiredBy=local-fs.target
- name: restorecon-var-lib-containers.service
enabled: true
contents: |
[Unit]
Description=Restore recursive SELinux security contexts
DefaultDependencies=no
After=var-lib-containers.mount
Before=crio.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/restorecon -R /var/lib/containers/
TimeoutSec=0
[Install]
WantedBy=multi-user.target graphical.target
- Create the new MachineConfig:
$ oc create -f mc.second.disk.var.lib.containers.yaml
machineconfig.machineconfiguration.openshift.io/98-var-lib-containers created
Once the new MachineConfig is rendered, the applicable nodes will begin to be updated and rebooted.
On reboot, a new XFS filesystem will be created on the specified disk with the var-lib-cont label, and the disk will be mounted to /var/lib/containers.
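After the reboot, the mount can be verified from a debug pod; a sketch where worker-0 is a hypothetical node name standing in for one of your own (commands skipped here if `oc` is unavailable):

```shell
NODE=worker-0   # hypothetical node name; substitute one of your own nodes

if command -v oc >/dev/null 2>&1; then
  # Show the device, filesystem type, and options backing /var/lib/containers.
  oc debug node/"${NODE}" -- chroot /host findmnt /var/lib/containers
  # Confirm the filesystem carries the var-lib-cont label the mount unit expects.
  oc debug node/"${NODE}" -- chroot /host lsblk -f /dev/disk/by-label/var-lib-cont
fi
```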
- As soon as this finishes and you have confirmed it is working as expected, edit the same MachineConfig and disable the restorecon-var-lib-containers.service to avoid running restorecon on every node reboot:
$ oc edit mc/98-var-lib-containers
[omitted]
- name: restorecon-var-lib-containers.service
enabled: false   # <-- change this to false so the service is disabled by systemd
contents: |
[Unit]
Description=Restore recursive SELinux security contexts
DefaultDependencies=no
After=var-lib-containers.mount
Before=crio.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/restorecon -R /var/lib/containers/
TimeoutSec=0
[Install]
WantedBy=multi-user.target graphical.target
The nodes will reboot once again with the partition mounted on /var/lib/containers, but this time without running restorecon, since all SELinux contexts are preserved from the first run.
Root Cause
While this process can be used to mount additional disks to a RHEL CoreOS node, support is limited to no more than one additional disk/partition. Please see Understanding OpenShift File System Monitoring (eviction conditions) for more information.
Also note that this procedure cannot be used to move parts of the root filesystem, such as /var, to another disk/partition on an already installed node. Provisioning a node with a separate /var filesystem is a separate process from what is described here.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.