Login using ssh not working after RHOCP 4.13 upgrade

Solution Verified - Updated

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4.12 (preparing to upgrade to 4.13+)
    • 4.13
    • 4.14

Issue

  • After upgrading from OpenShift 4.12 to 4.13 it was not possible to ssh into the nodes.

  • Accessing a node using SSH fails with the following message after the OpenShift upgrade:

    $ ssh -i sshkey core@10.0.0.1
    core@10.0.0.1: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
    

Resolution

Several different causes can lead to this issue after upgrading to OpenShift 4.13:

  • If the file /etc/ssh/sshd_config was customized, preferably move the customization to a new file within /etc/ssh/sshd_config.d/ and revert the changes made to /etc/ssh/sshd_config. Otherwise, ensure that the following line is present at the beginning of the file:

    Include /etc/ssh/sshd_config.d/*.conf
    
  • If the issue appeared after applying compliance remediations, it has been reported to Red Hat Engineering, tracked in bug OCPBUGS-18331, and fixed in the OpenShift Compliance operator by errata RHBA-2024:1830.

    • A workaround for older versions of the operator is to create an additional MachineConfig resource that overwrites the sshd configuration generated by the OpenShift Compliance operator and adds the following line. For details, check How to add Include statement in the /etc/ssh/sshd_config file.

      ```
      Include /etc/ssh/sshd_config.d/*.conf
      ```
      
      >**Note:** The created `MachineConfig` should be removed after upgrading to a version of the operator that includes the fix.
      
  • If the sshd configuration was not intentionally altered and is correct, make sure the issue is not caused by the use of an RSA key, as explained in the solution Failing ssh access to nodes using RSA key after RHOCP 4 upgrade.
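
The check described in the first bullet can be scripted. The following is a minimal sketch, not an official tool: the function name `include_is_first` is hypothetical, and it assumes the Include line is written exactly as shown above (sshd itself accepts keywords case-insensitively and with varying whitespace, which this strict comparison does not handle).

```shell
# Hypothetical helper: succeeds only when the first effective directive
# (ignoring blank lines and comment lines) of the given sshd_config file
# is the Include line for the sshd_config.d drop-in directory.
include_is_first() {
    # Drop blank lines and comments, then keep the first remaining line.
    first=$(grep -Ev '^[[:space:]]*(#|$)' "$1" | head -n 1)
    case "$first" in
        # Quoted pattern, so the asterisk is compared literally.
        "Include /etc/ssh/sshd_config.d/*.conf") return 0 ;;
        *) return 1 ;;
    esac
}
```

On a node (for example inside `oc debug node/<node>` after `chroot /host`), it could be called as `include_is_first /etc/ssh/sshd_config && echo "Include line is first"`.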

Recommendation for preparing the upgrade from 4.12 to 4.13+


If the `/etc/ssh/sshd_config` file was already modified, do the following before upgrading the 4.12 cluster:

  • Create an additional MachineConfig resource that populates the mandatory directory and include file

  • Create the /etc/ssh/sshd_config.d directory

    $ mkdir /etc/ssh/sshd_config.d
    
  • Create an empty *.conf file in that directory so that the Include glob matches at least one file

    $ touch /etc/ssh/sshd_config.d/empty_include.conf
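
The MachineConfig resource mentioned above could look like the following. This is a sketch under stated assumptions, not a verified manifest: the resource name `99-worker-sshd-config-d` and the output file name are hypothetical, only the worker pool is targeted, and it assumes the Machine Config Operator creates the missing parent directory when it writes the file. The script only writes the manifest to disk so that it can be reviewed first.

```shell
# Write a MachineConfig manifest for the worker pool. The resource name
# and output file name are hypothetical; a second manifest with role
# "master" would be needed for control plane nodes.
MANIFEST="${MANIFEST:-99-worker-sshd-config-d.yaml}"

cat > "$MANIFEST" <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-sshd-config-d
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        # Empty file so the Include glob matches at least one *.conf
        - path: /etc/ssh/sshd_config.d/empty_include.conf
          mode: 420          # decimal for octal 0644
          overwrite: true
          contents:
            source: "data:,"  # empty data URL, i.e. zero-length content
EOF

echo "Wrote $MANIFEST; review it, then apply with: oc apply -f $MANIFEST"
```

Applying a MachineConfig normally triggers a rolling update of the affected pool, so it is best applied in a maintenance window before the upgrade.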
    

Root Cause

In OpenShift 4.13 the location for ssh keys changed, as reported in the release notes. Starting with that version, the following sshd configuration is present by default for retrieving user keys:

AuthorizedKeysCommand /usr/libexec/ssh-key-dir %u

In OpenShift 4.12 the /etc/ssh/sshd_config.d directory is absent. Adding the Include line on 4.12 without first creating that directory leads to a segmentation fault when connecting to the ssh service. An empty .conf file is also mandatory, as the segmentation fault still occurs without one.

Diagnostic Steps

Access the nodes using oc debug and confirm that the line Include /etc/ssh/sshd_config.d/*.conf is missing from the sshd configuration (empty output for a node means the line is absent):

$ for NODE in $(oc get nodes -o custom-columns=:metadata.name); do echo ---- $NODE ----; oc debug -q node/${NODE} -- chroot /host /bin/bash -c "grep Include /etc/ssh/sshd_config"; echo; done
---- master-0 ----

---- master-1 ----

---- master-2 ----

[...]
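
On clusters with many nodes, it can be easier to print only the affected nodes. The following is a minimal sketch built on the same `oc get nodes` and `oc debug` commands used above; the function name `nodes_missing_include` is hypothetical. It treats any empty result as "line missing", so a node that cannot be reached by `oc debug` is also reported.

```shell
# Hypothetical helper: prints the name of every node whose
# /etc/ssh/sshd_config has no Include line for sshd_config.d.
nodes_missing_include() {
    for node in $(oc get nodes -o name); do
        # Empty output means grep found no matching line on that node
        # (or the debug session itself failed; stderr is discarded).
        out=$(oc debug -q "$node" -- chroot /host \
            grep '^Include /etc/ssh/sshd_config.d/' /etc/ssh/sshd_config 2>/dev/null)
        if [ -z "$out" ]; then
            echo "${node#node/}"   # strip the "node/" prefix
        fi
    done
}
```

Calling `nodes_missing_include` with a logged-in `oc` session lists one node name per line; no output means every node returned an Include line.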