Using a separate disk for node container storage on OpenShift 4

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.12 or higher

Issue

  • How to mount separate disk to /var/lib/containers on Red Hat OpenShift 4 nodes.

Resolution

Mounting a separate/secondary disk for container storage in OpenShift Container Platform 4 is done via the Machine Config Operator using MachineConfigs.

If all nodes, including masters and/or workers, are identical, such as in the case of many Cloud instances or racks of matching bare-metal servers, then this configuration change can be applied to the default worker and/or master MachineConfigPools. If some hardware is different, such as having a different number of storage drives or device names, or if you don't want this change applied, for example, to all workers, then a separate MachineConfigPool will need to be created. An example of how to create a separate MachineConfigPool can be found in the following solution: Openshift 4 create infra machines.
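For illustration, a custom pool is simply a MachineConfigPool whose selectors match a node label and a MachineConfig role. The following is a minimal sketch, not taken from this article: the pool name secondary-disk and the node label are hypothetical placeholders, and the values list lets the pool inherit the base worker MachineConfigs.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: secondary-disk        # hypothetical pool name
spec:
  machineConfigSelector:
    matchExpressions:
      # Pick up both the base worker configs and configs
      # labeled for this custom role.
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, secondary-disk]
  nodeSelector:
    matchLabels:
      # Only nodes carrying this (hypothetical) label join the pool.
      node-role.kubernetes.io/secondary-disk: ""
```

Nodes are then added to the pool by labeling them, for example with oc label node.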

IMPORTANT NOTE: Starting with OpenShift 4.13, disks must be referenced using consistent device naming. Please refer to the example for 4.13 and newer.

Procedure for OpenShift 4.12 and older


Use the following example to mount `/dev/sdb` to the container storage root directory, `/var/lib/containers`:
  1. Create a new file, such as mymc.yaml, with the following MachineConfig defined:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: 98-var-lib-containers
    spec:
      config:
        ignition:
          version: 3.1.0
        systemd:
          units:
          - contents: |
              [Unit]
              Description=Make File System on /dev/sdb
              DefaultDependencies=no
              BindsTo=dev-sdb.device
              After=dev-sdb.device var.mount
              Before=systemd-fsck@dev-sdb.service
    
              [Service]
              Type=oneshot
              RemainAfterExit=yes
              ExecStart=-/bin/bash -c "/bin/rm -rf /var/lib/containers/*"
              ExecStart=/usr/lib/systemd/systemd-makefs xfs /dev/sdb
              TimeoutSec=0
    
              [Install]
              WantedBy=var-lib-containers.mount
            enabled: true
            name: systemd-mkfs@dev-sdb.service
          - contents: |
              [Unit]
              Description=Mount /dev/sdb to /var/lib/containers
              Before=local-fs.target
              Requires=systemd-mkfs@dev-sdb.service
              After=systemd-mkfs@dev-sdb.service
    
              [Mount]
              What=/dev/sdb
              Where=/var/lib/containers
              Type=xfs
              Options=defaults,prjquota
    
              [Install]
              WantedBy=local-fs.target
            enabled: true
            name: var-lib-containers.mount
          - contents: |
              [Unit]
              Description=Restore recursive SELinux security contexts
              DefaultDependencies=no
              After=var-lib-containers.mount
              Before=crio.service
    
              [Service]
              Type=oneshot
              RemainAfterExit=yes
              ExecStart=/sbin/restorecon -R /var/lib/containers/
              TimeoutSec=0
    
              [Install]
              WantedBy=multi-user.target graphical.target
            enabled: true
            name: restorecon-var-lib-containers.service
    

    Ensure that the following values match your environment:

    • metadata.labels["machineconfiguration.openshift.io/role"] should match your MachineConfigPool (master, worker, or a custom pool)
    • sdb should match your node's secondary storage device (i.e., /dev/sdb)
      • Be sure to change this reference everywhere it occurs in the file
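Because the device name appears both in /dev paths and in systemd-escaped unit names (dev-sdb.device), a single substitution updates every occurrence. A sketch, assuming the file from step 1 is named mymc.yaml and the new device is /dev/sdc; the printf line creates a two-line stand-in so the commands can be run anywhere, and should be skipped when working with the real file:

```shell
# Stand-in for mymc.yaml (skip this line with the real file).
printf 'BindsTo=dev-sdb.device\nWhat=/dev/sdb\n' > mymc.yaml

# Replace every reference to sdb with sdc, covering both
# "/dev/sdb" paths and "dev-sdb" systemd unit names.
sed -i 's/sdb/sdc/g' mymc.yaml
```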
  2. Create the new MachineConfig:

    $ oc create -f mymc.yaml
    machineconfig.machineconfiguration.openshift.io/98-var-lib-containers created
    

Once the new MachineConfig is rendered, the applicable nodes will begin to be updated and rebooted. On reboot, a new XFS filesystem will be created on the specified disk, the old container storage will be cleared, and the disk will be mounted to /var/lib/containers.

Procedure for OpenShift 4.13 and newer

Use the following example to mount a block device that carries no metadata readable by the blkid command (that is, a blank disk with no filesystem signature) to the container storage root directory, /var/lib/containers.

  1. Create a file, e.g., find-secondary-device, with the following content:
#!/bin/bash
set -uo pipefail

# blkid exits with status 2 when a device has no identifiable
# metadata, i.e. it is a blank disk with no filesystem signature.
for device in <device_type_glob>; do
  /usr/sbin/blkid "${device}" &> /dev/null
  if [ $? -eq 2 ]; then
    echo "secondary device found ${device}"
    echo "creating filesystem for containers mount"
    mkfs.xfs -L var-lib-cont -f "${device}" &> /dev/null
    udevadm settle
    touch /etc/var-lib-containers-mount
    exit
  fi
done
echo "Couldn't find secondary block device!" >&2
exit 77

Replace <device_type_glob> with the shell glob for your block device type:

  • For SCSI or SATA drives, use /dev/sd*
  • For virtual drives, use /dev/vd*
  • For NVMe drives, use /dev/nvme*[0-9]*n*

Encode the script with base64 and place the entire encoded string in the MachineConfig definition below, replacing <encoded_etc_find_secondary_device_script>.

You can run base64 -w0 find-secondary-device to generate the base64-encoded string.
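As a sketch, the encoding can be checked by decoding the string back and comparing it to the original file. The printf line recreates a two-line stand-in for find-secondary-device so the commands run anywhere; with the real script from the previous step, skip it.

```shell
# Stand-in content (skip this line with the real script).
printf '#!/bin/bash\nset -uo pipefail\n' > find-secondary-device

# -w0 produces a single unwrapped line, suitable for the
# Ignition data: URL in the MachineConfig.
ENCODED=$(base64 -w0 find-secondary-device)

# Round-trip check: decoding must reproduce the file exactly.
echo "${ENCODED}" | base64 -d | diff - find-secondary-device
```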

  2. Create the MachineConfig definition file with the following content, replacing the placeholders accordingly. You might also need to adjust .spec.config.ignition.version to match the version supported by your cluster.
 $ cat mc.second.disk.var.lib.containers.yaml

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: <ROLE>
  name: 98-var-lib-containers
spec:
  config:
    ignition:
      version: 3.4.0
    storage:
      files:
        - path: /etc/find-secondary-device
          mode: 0755
          contents:
            source: data:text/plain;charset=utf-8;base64,<encoded_etc_find_secondary_device_script>
    systemd:
      units:
        - name: find-secondary-device.service
          enabled: true
          contents: |
            [Unit]
            Description=Find secondary device
            DefaultDependencies=false
            After=systemd-udev-settle.service
            Before=local-fs-pre.target
            ConditionPathExists=!/etc/var-lib-containers-mount

            [Service]
            RemainAfterExit=yes
            ExecStart=/etc/find-secondary-device

            RestartForceExitStatus=77

            [Install]
            WantedBy=multi-user.target
        - name: var-lib-containers.mount
          enabled: true
          contents: |
            [Unit]
            Description=Mount /var/lib/containers
            Before=local-fs.target

            [Mount]
            What=/dev/disk/by-label/var-lib-cont
            Where=/var/lib/containers
            Type=xfs
            TimeoutSec=120s

            [Install]
            RequiredBy=local-fs.target
        - name: restorecon-var-lib-containers.service
          enabled: true
          contents: |
            [Unit]
            Description=Restore recursive SELinux security contexts
            DefaultDependencies=no
            After=var-lib-containers.mount
            Before=crio.service

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStart=/sbin/restorecon -R /var/lib/containers/
            TimeoutSec=0

            [Install]
            WantedBy=multi-user.target graphical.target
  3. Create the new MachineConfig:
 $ oc create -f mc.second.disk.var.lib.containers.yaml
machineconfig.machineconfiguration.openshift.io/98-var-lib-containers created

Once the new MachineConfig is rendered, the applicable nodes will begin to be updated and rebooted.
On reboot, a new XFS filesystem will be created on the specified disk with the var-lib-cont label, and the disk will be mounted to /var/lib/containers.

  4. Once the rollout finishes and you have confirmed that the mount is working as expected, edit the same MachineConfig and disable restorecon-var-lib-containers.service so that restorecon does not run on every node reboot:
 $ oc edit mc/98-var-lib-containers
[omitted]
        - name: restorecon-var-lib-containers.service
          enabled: false --> change this to false so the service is disabled by systemd.
          contents: |
            [Unit]
            Description=Restore recursive SELinux security contexts
            DefaultDependencies=no
            After=var-lib-containers.mount
            Before=crio.service

            [Service]
            Type=oneshot
            RemainAfterExit=yes
            ExecStart=/sbin/restorecon -R /var/lib/containers/
            TimeoutSec=0

            [Install]
            WantedBy=multi-user.target graphical.target

The nodes will reboot once again with the partition mounted on /var/lib/containers, but this time without running restorecon, since all SELinux contexts were already set during the first run.

Root Cause

While this process can be used to mount additional disks to a RHEL CoreOS node, support is limited to no more than one additional disk/partition. Please see Understanding OpenShift File System Monitoring (eviction conditions) for more information.

Also note that this procedure cannot be used to move parts of the root filesystem, such as /var, to another disk/partition on an already-installed node. Provisioning a node with a separate /var filesystem is a separate process from what is described here.

Components

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.