[Xen HVM] kernel-4.18.0-553.50.1.el8_10 panic at 'xen_time_init()' during boot.
Environment
- Standalone Xen + Intel CPU
- AWS legacy instances
- kernel-4.18.0-553.50.1.el8_10
Issue
- The kernel-4.18.0-553.50.1.el8_10 kernel fails to boot and panics in the xen_time_init() function, specifically on standalone Xen with Intel CPUs and on legacy AWS instances.
- The issue started after updating the system from 4.18.0-553.47.1.el8_10 to 4.18.0-553.50.1.el8_10 as part of the CVE-2024-53241 update.
[ 0.262012] invalid opcode: 0000 [#1] SMP PTI
[ 0.263000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-553.50.1.el8_10.x86_64 #1
[ 0.263000] Hardware name: Xen HVM domU, BIOS 4.13 12/03/2021
[ 0.263000] RIP: 0010:xen_time_init+0x66/0x1f2
..
[ 0.263000] Call Trace:
[ 0.263000] native_smp_prepare_cpus+0xc4/0x166
[ 0.263000] xen_hvm_smp_prepare_cpus+0xc/0x65
[ 0.263000] kernel_init_freeable+0xd3/0x23a
[ 0.263000] kernel_init+0xa/0x10a
[ 0.263000] ret_from_fork+0x1f/0x40
[ 0.263000] Modules linked in:
[ 0.487004] ---[ end trace 2762b0c1555c06ae ]---
[ 0.490001] RIP: 0010:xen_time_init+0x66/0x1f2
Resolution
- The issue has been resolved with the erratum RHBA-2025:4337. Apply the erratum by updating the kernel to kernel-4.18.0-553.51.1.el8_10.
Temporary Workaround
- Boot the machine with an older kernel.
Root Cause
- The following commit fixes the issue.
commit 0e3031681b579e4ed982ff9304425d3b60663066
x86/xen: use the whole RCX when picking the right hypercall function
Upstream Status: RHEL-only
RHEL8 commit d479f7e0929b ("x86/xen: use new hypercall functions instead
of hypercall page") had a RHEL-only implementation for picking the right
hypercall function. 'x86_vendor' is 'u8' in 'struct cpuinfo_x86' and thus
inline assembler optimizes the constraint and only uses 'CL' part but full
'RCX' is used in the comparison.
And apparently the upper part of 'RCX' is not always zero. Unfortunately,
the code was only tested on AWS Xen where both 'vmcall' and 'vmmcall'
instructions are tolerated.
Fixes: d479f7e0929b ("x86/xen: use new hypercall functions instead of hypercall page")
-#define __HYPERCALL_ENTRY(x) "a" (x), "c" (boot_cpu_data.x86_vendor)
+#define __HYPERCALL_ENTRY(x) "a" (x), "c" ((u64)boot_cpu_data.x86_vendor)
Diagnostic Steps
- The kernel panics with an invalid opcode error at xen_time_init+0x66.
[ 0.262012] invalid opcode: 0000 [#1] SMP PTI
[ 0.263000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-553.50.1.el8_10.x86_64 #1
[ 0.263000] Hardware name: Xen HVM domU, BIOS 4.13 12/03/2021
[ 0.263000] RIP: 0010:xen_time_init+0x66/0x1f2
[ 0.263000] Code: c7 c0 20 c1 01 00 48 8b 14 d5 80 68 7c bc 8a 0d 7d 5b bc ff 48 63 34 02 31 d2 b8 18 00 00 00 48 83 f9 00 75 05 0f 01 c1 eb 03 <0f> 01 d9 85 c0 75 17 48 c7 c7 88 ac 6d bc e8 47 18 d7 fd 48 c7 05
- Here is the source code at the location of the panic.
# eu-addr2line -e ./usr/lib/debug/lib/modules/4.18.0-553.50.1.el8_10.x86_64/vmlinux xen_time_init+0x66
/usr/src/debug/kernel-4.18.0-553.50.1.el8_10/linux-4.18.0-553.50.1.el8_10.x86_64/./arch/x86/include/asm/xen/hypercall.h:431:9
./arch/x86/xen/time.c
*************************
468 static void __init xen_time_init(void)
469 {
480 if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(cpu),
481 NULL) == 0) {
arch/x86/include/asm/xen/hypercall.h
**************************************
429 HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
430 {
431 return _hypercall3(int, vcpu_op, cmd, vcpuid, extra_args);
432 }
188 #define _hypercall3(type, name, a1, a2, a3) \
95 #define __HYPERCALL \
96 "cmpq $"__XEN_XSTR(X86_VENDOR_INTEL)", %%rcx \n"\
97 "jne 1f \n" \
98 "vmcall \n" \
99 "jmp 2f \n" \
100 "1: vmmcall \n" \
101 "2: \n"
102
103 #define __HYPERCALL_ENTRY(x) "a" (x), "c" (boot_cpu_data.x86_vendor) <<----
- Disassemble the code at the location of the panic.
$ echo "Code: c7 c0 20 c1 01 00 48 8b 14 d5 80 68 7c bc 8a 0d 7d 5b bc ff 48 63 34 02 31 d2 b8 18 00 00 00 48 83 f9 00 75 05 0f 01 c1 eb 03 <0f> 01 d9 85 c0 75 17 48 c7 c7 88 ac 6d bc e8 47 18 d7 fd 48 c7 05" |scripts/decodecode
e: 8a 0d 7d 5b bc ff mov -0x43a483(%rip),%cl # 0xffffffffffbc5b91
14: 48 63 34 02 movslq (%rdx,%rax,1),%rsi
18: 31 d2 xor %edx,%edx
1a: b8 18 00 00 00 mov $0x18,%eax
1f: 48 83 f9 00 cmp $0x0,%rcx <<---
23: 75 05 jne 0x2a
25: 0f 01 c1 vmcall
28: eb 03 jmp 0x2d
2a:* 0f 01 d9 vmmcall <-- trapping instruction
2d: 85 c0 test %eax,%eax
2f: 75 17 jne 0x48
31: 48 c7 c7 88 ac 6d bc mov $0xffffffffbc6dac88,%rdi
38: e8 47 18 d7 fd call 0xfffffffffdd71884
3d: 48 rex.W
3e: c7 .byte 0xc7
3f: 05 .byte 0x5