How to start hosted-engine VM if the host certificates are expired


Environment

  • Red Hat Virtualization 4.4.

Issue

  • The VDSM certificates on all hosted-engine hosts have expired. The hosted-engine VM is down and cannot be started with hosted-engine --vm-start because of the expired certificates.

Resolution

  • The CA of the RHV environment is the RHV Manager, which is down. To renew the host certificates, the CA certificate and key must therefore be copied to the host first.

Option 1: Obtain CA keys and certificates from the engine-backup.

  • engine-backup collects all of the engine's PKI files. If a recent engine-backup of the environment is available, the CA files can be extracted from it:
# tar -xvf <engine-backup>
# tar -xvf files
  • The extracted etc/pki/ovirt-engine/ directory contains the CA files. Copy this directory to /root/ on the host, so that it is available as /root/ovirt-engine/ for renewing the host certificates.
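The two tar commands above can be wrapped in a small function that extracts into a scratch directory and copies only the CA directory to the host. This is a minimal sketch, assuming the archive layout implied by those commands (an inner tarball named "files" holding etc/pki/ovirt-engine/); the function name extract_engine_pki and the example backup file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: extract the engine CA directory from an engine-backup archive.
# Assumption: the archive contains an inner tarball named "files" that holds
# etc/pki/ovirt-engine/, as the tar commands in this article imply.

extract_engine_pki() {
    local backup dest workdir
    backup="$1"              # engine-backup archive
    dest="${2:-/root}"       # ovirt-engine/ is copied into this directory
    workdir="$(mktemp -d)" || return 1

    # Unpack the outer archive, then the inner "files" tarball.
    # GNU tar detects the compression of the outer archive when reading.
    tar -xf "$backup" -C "$workdir" || return 1
    tar -xf "$workdir/files" -C "$workdir" || return 1

    # Copy the CA directory so it ends up as "$dest/ovirt-engine/".
    cp -a "$workdir/etc/pki/ovirt-engine" "$dest/" || return 1
    rm -rf "$workdir"
}

# Example (hypothetical backup file name):
#   extract_engine_pki /root/engine-backup.tar.gz /root
```

Working in a temporary directory keeps the database dump and other backup contents out of /root/; only the PKI files needed for signing are copied.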

Option 2: Copy out certificates from the hosted-engine disk.

  • If no engine-backup is available, the CA files can be copied directly from the hosted-engine VM disk to the host filesystem.
  • Find the hosted-engine disk:
# egrep "vm_disk_vol_id|sdUUID|vm_disk_id" /etc/ovirt-hosted-engine/hosted-engine.conf
vm_disk_id=7ac7f073-e5bc-4f6a-b483-68a16fd9fe25
vm_disk_vol_id=d82c83a7-0957-469d-8e62-d537407c8993
sdUUID=1bfb1005-b98d-4592-9c1d-5c04292584ed
  • If the storage domain is block-based (iSCSI/FC), activate the LV:
# lvchange --config 'devices {filter = ["a|/dev/mapper/*|","r|.*|"]}' -ay 1bfb1005-b98d-4592-9c1d-5c04292584ed/d82c83a7-0957-469d-8e62-d537407c8993
  • Copy the files out to /root/ on the host using virt-copy-out.
Tell libguestfs not to use libvirt, since the libvirtd service might be down because of the expired certificates:

# export LIBGUESTFS_BACKEND=direct

Copy out the files

Block-based storage domain:
# virt-copy-out -a /dev/1bfb1005-b98d-4592-9c1d-5c04292584ed/d82c83a7-0957-469d-8e62-d537407c8993 /etc/pki/ovirt-engine/ /root/
If libvirt prompts for credentials, use the VDSM libvirt account:
username: vdsm@ovirt
password: shibboleth

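File-based storage domain (replace SPUUID with the storage pool UUID, or NFS_share with the storage domain's mount directory under /rhev/data-center/mnt/):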

# virt-copy-out -a /rhev/data-center/SPUUID/1bfb1005-b98d-4592-9c1d-5c04292584ed/images/7ac7f073-e5bc-4f6a-b483-68a16fd9fe25/d82c83a7-0957-469d-8e62-d537407c8993 /etc/pki/ovirt-engine/ /root/

Or

# virt-copy-out -a /rhev/data-center/mnt/NFS_share/1bfb1005-b98d-4592-9c1d-5c04292584ed/images/7ac7f073-e5bc-4f6a-b483-68a16fd9fe25/d82c83a7-0957-469d-8e62-d537407c8993 /etc/pki/ovirt-engine/ /root/
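The image paths used above are built from the three IDs in hosted-engine.conf. The sketch below assembles them from the file instead of copying the UUIDs by hand; the function names (he_conf_value, he_block_path, he_file_path, he_copy_out_pki) are hypothetical, and for file-based domains the caller still has to supply the SPUUID or mount-point root, just as in the commands above.

```shell
#!/usr/bin/env bash
# Sketch: build the hosted-engine disk path from hosted-engine.conf and copy
# /etc/pki/ovirt-engine/ out of it. Function names are illustrative only.

HE_CONF="${HE_CONF:-/etc/ovirt-hosted-engine/hosted-engine.conf}"

# Print the value of KEY from a key=value file (first match only).
he_conf_value() {
    local key conf
    key="$1"
    conf="${2:-$HE_CONF}"
    sed -n "s/^${key}=//p" "$conf" | head -n 1
}

# Block domain (iSCSI/FC): the LV is /dev/<sdUUID>/<vm_disk_vol_id>.
he_block_path() {
    local conf="${1:-$HE_CONF}"
    printf '/dev/%s/%s\n' "$(he_conf_value sdUUID "$conf")" \
        "$(he_conf_value vm_disk_vol_id "$conf")"
}

# File domain: <root>/<sdUUID>/images/<vm_disk_id>/<vm_disk_vol_id>, where
# <root> is /rhev/data-center/<SPUUID> or /rhev/data-center/mnt/<mount dir>.
he_file_path() {
    local root conf
    root="$1"
    conf="${2:-$HE_CONF}"
    printf '%s/%s/images/%s/%s\n' "$root" \
        "$(he_conf_value sdUUID "$conf")" \
        "$(he_conf_value vm_disk_id "$conf")" \
        "$(he_conf_value vm_disk_vol_id "$conf")"
}

# Copy the engine PKI directory out of IMAGE into DEST without using libvirt.
he_copy_out_pki() {
    local image dest
    image="$1"
    dest="${2:-/root/}"
    LIBGUESTFS_BACKEND=direct virt-copy-out -a "$image" /etc/pki/ovirt-engine/ "$dest"
}

# Example for a block domain, after activating the LV as shown above:
#   he_copy_out_pki "$(he_block_path)" /root/
```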

Renew the host certificates.

  • Create a CSR for the host using its existing VDSM private key:
# cd /root/ovirt-engine/
# openssl req -new  -key /etc/pki/vdsm/keys/vdsmkey.pem -out /tmp/test_host_vdsm.csr -passin "pass:mypass" -passout "pass:mypass" -batch -subj "/"
  • Find the subject of the old certificate.
# openssl x509 -in  /etc/pki/vdsm/certs/vdsmcert.pem -noout  -subject
  • Sign the CSR with the engine CA. Replace the -subj value below with the subject printed by the previous command.
# cd /root/ovirt-engine/

# openssl ca -batch -policy policy_match -config openssl.conf -cert ca.pem -keyfile  private/ca.pem -days +1825 -in  /tmp/test_host_vdsm.csr -out /tmp/test_host_vdsm.cer -startdate "$(date --utc --date "now -1 days" +"%y%m%d%H%M%SZ")" -subj "/C=US/O=Test/CN=test.redhat.com" -utf8
  • Back up the current VDSM PKI directory, then install the signed certificate:
# tar cfJ /tmp/vdsm_pki.tar.xz /etc/pki/vdsm/
# cp /tmp/test_host_vdsm.cer /etc/pki/vdsm/certs/vdsmcert.pem
# cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-spice/server-cert.pem
# cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-vnc/server-cert.pem
# cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/libvirt/clientcert.pem
  • Restart the vdsmd and libvirtd services:
# systemctl restart vdsmd
# systemctl restart libvirtd
  • Wait about 5 minutes, then try to start the VM:
# hosted-engine --vm-start
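The -subj argument of the signing step must be in /C=.../O=.../CN=... form, while recent OpenSSL releases print the subject as "subject=C = US, O = Test, CN = ...". The sketch below converts one form into the other so the value can be taken straight from the old certificate. It assumes no subject component contains ", " or " = "; subject_to_subj and cert_subj are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: derive the -subj argument for "openssl ca" from the old VDSM
# certificate. Assumption: no RDN value contains ", " or " = ".

# Convert "openssl x509 -subject" output to the /C=../O=../CN=.. form.
subject_to_subj() {
    local s
    s="${1#subject=}"   # drop the "subject=" label
    s="${s# }"          # older OpenSSL prints "subject= /C=..."
    case "$s" in
        /*) printf '%s\n' "$s" ;;                                     # already slash form
        *)  printf '/%s\n' "$s" | sed -e 's/ = /=/g' -e 's|, |/|g' ;; # "C = US, O = Test"
    esac
}

# Print the -subj value for an existing (possibly expired) certificate file.
cert_subj() {
    subject_to_subj "$(openssl x509 -in "$1" -noout -subject)"
}

# Example: the signing command with the subject taken from the old certificate.
#   cd /root/ovirt-engine/
#   openssl ca -batch -policy policy_match -config openssl.conf -cert ca.pem \
#       -keyfile private/ca.pem -days +1825 -in /tmp/test_host_vdsm.csr \
#       -out /tmp/test_host_vdsm.cer \
#       -startdate "$(date --utc --date "now -1 days" +"%y%m%d%H%M%SZ")" \
#       -subj "$(cert_subj /etc/pki/vdsm/certs/vdsmcert.pem)" -utf8
```

Reading the subject from the expired certificate works because openssl x509 only parses the file; it does not check validity dates.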

Diagnostic Steps

  • hosted-engine --vm-start failed with the following error:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 214, in <module>
    args.command(args)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 42, in func
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 91, in checkVmStatus
    cli = ohautil.connect_vdsm_json_rpc()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
    __vdsm_json_rpc_connect(logger, timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
    timeout=VDSM_MAX_RETRY * VDSM_DELAY
RuntimeError: Couldn't  connect to VDSM within 60 seconds
  • Communication between the HA services and VDSM fails because of the expired certificates; VDSM logs an SSL handshake error:
2022-04-14 13:04:52,660+0200 INFO  (Reactor thread) [ProtocolDetector.AcceptorImpl] Accepted connection from ::1:50478 (protocoldetector:61)
2022-04-14 13:04:52,665+0200 ERROR (Reactor thread) [ProtocolDetector.SSLHandshakeDispatcher] ssl handshake: SSLError, address: ::1 (sslutils:269)
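To confirm that expired certificates are the cause, or to verify the result after renewal, the expiry date of each certificate touched in the renewal steps can be checked with openssl x509 -enddate and -checkend. This is a sketch; the function name check_cert_expiry is hypothetical, and the certificate list is taken from the copy commands in the renewal steps.

```shell
#!/usr/bin/env bash
# Sketch: report whether the host certificates used by VDSM and libvirt
# have expired. Uses only "openssl x509 -enddate" and "-checkend".

VDSM_CERTS="/etc/pki/vdsm/certs/vdsmcert.pem
/etc/pki/vdsm/libvirt-spice/server-cert.pem
/etc/pki/vdsm/libvirt-vnc/server-cert.pem
/etc/pki/libvirt/clientcert.pem"

# Print one status line per certificate; return non-zero if any certificate
# is expired or cannot be read.
check_cert_expiry() {
    local cert end rc
    rc=0
    for cert in "$@"; do
        if ! end="$(openssl x509 -noout -enddate -in "$cert" 2>/dev/null)"; then
            printf '%s: unreadable\n' "$cert"
            rc=1
        elif openssl x509 -noout -checkend 0 -in "$cert" >/dev/null; then
            printf '%s: valid (%s)\n' "$cert" "$end"
        else
            printf '%s: EXPIRED (%s)\n' "$cert" "$end"
            rc=1
        fi
    done
    return "$rc"
}

# Example (run on the host; the unquoted variable splits on newlines):
#   check_cert_expiry $VDSM_CERTS
```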

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.