OpenShift Virtualization: VM operations fail after upgrading to 4.18 from 4.17.27 or later
Environment
- Red Hat OpenShift Container Platform (RHOCP)
- 4.18
- Red Hat OpenShift Virtualization Operator
- After upgrading the OpenShift Virtualization Operator from 4.17.27 to 4.18.13 or earlier
Issue
- Management operations for pre-existing, running Virtual Machines (VMs) fail.
- Actions such as live migration or a graceful VM shutdown are unresponsive.
- The virt-handler logs on the worker nodes repeatedly show the error:
unable to create virt-launcher client connection: can not add ghost record when entry already exists with differing socket file location.
Resolution
This has been fixed in OpenShift Virtualization 4.18.17, which is now available in the stable channel.
Until the fix is applied, a workaround is to manually correct the ghost records of the affected VMs.
- Get the affected VMI UUID:
# oc get vmi rhel9-blue-crawdad-23 -n test-cnv -o yaml | yq '.metadata.uid'
eb0fb0aa-96d9-449d-8e24-cf94eb1ab73b
- Correct the ghost record on the node where the VM is running:
# sed -i 's|"socketFile":"/pods|"socketFile":"//pods|g' /var/run/kubevirt-private/ghost-records/eb0fb0aa-96d9-449d-8e24-cf94eb1ab73b
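Before editing records in place, the substitution can be verified on a copy. The following is a minimal dry run against a fabricated record under /tmp (the file name and record contents are illustrative, not taken from a real node):

```shell
# Dry run: apply the same sed substitution to a fabricated ghost record
# in /tmp and confirm only the socketFile prefix changes.
mkdir -p /tmp/ghost-records-test
printf '{"name":"test-vm","socketFile":"/pods/abc/sockets/launcher-sock"}' \
  > /tmp/ghost-records-test/sample-record
sed -i 's|"socketFile":"/pods|"socketFile":"//pods|g' /tmp/ghost-records-test/sample-record
cat /tmp/ghost-records-test/sample-record
# -> {"name":"test-vm","socketFile":"//pods/abc/sockets/launcher-sock"}
```

Once the result looks correct, the same substitution can be run against the real record under /var/run/kubevirt-private/ghost-records/.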
If multiple VMs are affected, the sed commands to correct the ghost records can be generated with the following command:
# oc logs <VIRT-HANDLER POD OF NODE HERE> | grep "can not add ghost record when entry already exists with differing socket file location" | egrep -o '[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}' | awk -F ' ' '{print "sed -i \047s|\"socketFile\":\"/pods|\"socketFile\":\"//pods|g\047 /var/run/kubevirt-private/ghost-records/"$1}' | sort | uniq
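The behavior of this pipeline can be checked offline against a captured sample log line. The line below is a shortened, illustrative stand-in for real virt-handler output; on a cluster the input comes from `oc logs`:

```shell
# Shortened, illustrative stand-in for a virt-handler log line.
sample='{"reason":"can not add ghost record when entry already exists with differing socket file location","uid":"eb0fb0aa-96d9-449d-8e24-cf94eb1ab73b"}'
echo "$sample" \
  | grep "can not add ghost record when entry already exists with differing socket file location" \
  | grep -oE '[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}' \
  | awk '{print "sed -i \047s|\"socketFile\":\"/pods|\"socketFile\":\"//pods|g\047 /var/run/kubevirt-private/ghost-records/"$1}' \
  | sort -u
```

This emits one ready-to-run sed command per affected VMI UID; the generated commands must then be executed on the node that hosts the affected VMs.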
- Restart the virt-handler pod running on this node:
# oc delete pod <virt-handler-pod-name> -n openshift-cnv
Root Cause
The issue is caused by a mismatch in how the virt-launcher socket file path is formatted in state files (called "ghost records") between the two OpenShift Virtualization versions. A patch (KubeVirt PR 15522) was backported to version 4.17.27 that caused the socket path in the ghost record to be saved with a single leading slash (e.g., /pods/...). The same patch was not included in any stable 4.18 release, although it is available in the 4.18.16 candidate version. The virt-handler in version 4.18.13 discovers the active socket path on the host with a double leading slash (e.g., //pods/...). After the upgrade, the new 4.18.13 virt-handler starts and reads the ghost records left by the old 4.17.27 virt-handler. It then compares the single-slash path from the old record (/pods/...) with the double-slash path it discovered (//pods/...). Because the path strings do not match exactly, the validation check fails, and virt-handler is unable to establish a connection to manage these VMs.
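The failing check is an exact string comparison, which can be illustrated in isolation. The paths below are made up, and the real comparison happens inside virt-handler rather than in a shell:

```shell
# Illustration only: both paths resolve to the same file on disk, but an
# exact string comparison treats them as different.
recorded='/pods/a71ec310/sockets/launcher-sock'    # written by 4.17.27
discovered='//pods/a71ec310/sockets/launcher-sock' # discovered by 4.18.13
if [ "$recorded" = "$discovered" ]; then
  echo "match"
else
  echo "mismatch"
fi
```

This prints mismatch, mirroring the validation failure, even though the kernel would resolve both strings to the same socket.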
Jira issue CNV-69894 tracks the availability of the fix in the stable channel for 4.18.
Note: Attempting to avoid the issue by not upgrading the OpenShift Virtualization Operator to 4.17.27, for example by staying on operator version 4.17.21 before the 4.18 upgrade, does not work: the upgrade path requires moving to 4.17.27 before an upgrade to 4.18 is allowed.
Note: The versions above refer specifically to the OpenShift Virtualization Operator, not to the RHOCP cluster version. Be careful not to confuse the two.
Diagnostic Steps
Inspect the virt-handler logs on any worker node hosting a pre-existing VM. Confirm the presence of the recurring error:
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"error","msg":"Synchronizing the VirtualMachineInstance failed.","name":"rhel9-blue-crawdad-23","namespace":"nijin-cnv","pos":"vm.go:2154","reason":"unable to create virt-launcher client connection: can not add ghost record when entry already exists with differing socket file location","timestamp":"2025-09-26T13:29:29.544392Z","uid":"eb0fb0aa-96d9-449d-8e24-cf94eb1ab73b"}
{"component":"virt-handler","kind":"VirtualMachineInstance","level":"error","msg":"Updating the VirtualMachineInstance status failed.","name":"rhel9-blue-crawdad-23","namespace":"nijin-cnv","pos":"vm.go:2160","reason":"can not add ghost record when entry already exists with differing socket file location","timestamp":"2025-09-26T13:29:29.549923Z","uid":"eb0fb0aa-96d9-449d-8e24-cf94eb1ab73b"}
The virtual machine's virt-launcher pod logs the following errors:
{"component":"virt-launcher","level":"info","msg":"failed to dial notify socket: /var/run/kubevirt/domain-notify-pipe.sock","pos":"client.go:149","reason":"context deadline exceeded","timestamp":"2025-09-26T13:30:42.153100Z"}
{"component":"virt-launcher","level":"error","msg":"Failed to connect to notify server","pos":"client.go:209","reason":"context deadline exceeded","timestamp":"2025-09-26T13:30:42.153257Z"}
{"component":"virt-launcher","level":"info","msg":"failed to dial notify socket: /var/run/kubevirt/domain-notify-pipe.sock","pos":"client.go:149","reason":"context deadline exceeded","timestamp":"2025-09-26T13:30:44.154055Z"}
On a worker node, view the content of a ghost record for an affected VM. The file is located at /var/run/kubevirt-private/ghost-records/<vmi-uid>.
# cat /var/run/kubevirt-private/ghost-records/e82e6991-51f0-4f37-9957-780f505b4705
{"name":"test-vm", ... ,"socketFile":"/pods/a71ec310-1273-4e24-8713-3f6605bd6448/..."}
Note that the socketFile path starts with a single slash (/pods).
Start a new VM on the upgraded cluster and inspect its ghost record.
# cat /var/run/kubevirt-private/ghost-records/f0ea7779-a114-4a37-a6c0-ef09458dd6f2
{"name":"new-vm", ... ,"socketFile":"//pods/f3060015-570e-4e61-947a-5da46c0fc9fa/..."}
Note that the socketFile path for the new VM starts with a double slash (//pods). This difference confirms the formatting mismatch causing the issue.
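To spot which ghost records on a node still carry the old single-slash prefix, `grep -L` (print files that do not match) can be used. The sketch below runs against fabricated records in /tmp; on a real node the same pattern would be applied to /var/run/kubevirt-private/ghost-records/*:

```shell
# Fabricated records for illustration: one old-style (single slash) and
# one new-style (double slash).
mkdir -p /tmp/ghost-records
printf '{"name":"old-vm","socketFile":"/pods/aaa/launcher-sock"}'  > /tmp/ghost-records/old-vm
printf '{"name":"new-vm","socketFile":"//pods/bbb/launcher-sock"}' > /tmp/ghost-records/new-vm
# -L prints files that do NOT contain the double-slash form, i.e. the
# records that still need the sed correction.
grep -L '"socketFile":"//pods' /tmp/ghost-records/*
# -> /tmp/ghost-records/old-vm
```

Only the record still holding the single-slash path is listed, so the output doubles as a worklist for the workaround in the Resolution section.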
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.