[Xen HVM] kernel-4.18.0-553.50.1.el8_10 panic at 'xen_time_init()' during boot.

Solution Verified - Updated

Environment

  • Standalone Xen + Intel CPU
  • AWS legacy instances
  • kernel-4.18.0-553.50.1.el8_10

Issue

  • The kernel-4.18.0-553.50.1.el8_10 fails to boot and panics in the xen_time_init() function, specifically on standalone Xen with Intel CPUs and on legacy AWS instances.
  • The issue appears after updating the system from 4.18.0-553.47.1.el8_10 to 4.18.0-553.50.1.el8_10, an update that addresses CVE-2024-53241.
[    0.262012] invalid opcode: 0000 [#1] SMP PTI
[    0.263000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-553.50.1.el8_10.x86_64 #1
[    0.263000] Hardware name: Xen HVM domU, BIOS 4.13 12/03/2021
[    0.263000] RIP: 0010:xen_time_init+0x66/0x1f2
..
[    0.263000] Call Trace:
[    0.263000]  native_smp_prepare_cpus+0xc4/0x166
[    0.263000]  xen_hvm_smp_prepare_cpus+0xc/0x65
[    0.263000]  kernel_init_freeable+0xd3/0x23a
[    0.263000]  kernel_init+0xa/0x10a
[    0.263000]  ret_from_fork+0x1f/0x40
[    0.263000] Modules linked in:
[    0.487004] ---[ end trace 2762b0c1555c06ae ]---
[    0.490001] RIP: 0010:xen_time_init+0x66/0x1f2

Resolution

Temporary Workaround

  • Boot the machine with an older kernel, for example 4.18.0-553.47.1.el8_10.

Root Cause

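  • The following RHEL-only commit fixes the issue. The vendor ID was loaded into CL only, while the comparison read the full RCX, so stale upper bits in RCX could steer an Intel CPU to the vmmcall instruction, which it rejects with an invalid opcode exception.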
commit 0e3031681b579e4ed982ff9304425d3b60663066
x86/xen: use the whole RCX when picking the right hypercall function
    
    Upstream Status: RHEL-only
    
    RHEL8 commit d479f7e0929b ("x86/xen: use new hypercall functions instead
    of hypercall page") had a RHEL-only implementation for picking the right
    hypercall function. 'x86_vendor' is 'u8' in 'struct cpuinfo_x86' and thus
    inline assembler optimizes the constraint and only uses 'CL' part but full
    'RCX' is used in the comparison.

    And apparently the upper part of 'RCX' is not always zero. Unfortunately,
    the code was only tested on AWS Xen where both 'vmcall' and 'vmmcall'
    instructions are tolerated.
    
    Fixes: d479f7e0929b ("x86/xen: use new hypercall functions instead of hypercall page")

-#define __HYPERCALL_ENTRY(x)   "a" (x), "c" (boot_cpu_data.x86_vendor)
+#define __HYPERCALL_ENTRY(x)   "a" (x), "c" ((u64)boot_cpu_data.x86_vendor)
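
The effect described in the commit message can be modelled in user space. The sketch below is only an illustration, not kernel code: the helper names (load_cl, load_rcx, pick_hypercall, check_model) and the stale register value are hypothetical, and X86_VENDOR_INTEL is taken to be 0, matching the "cmp $0x0,%rcx" seen in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

#define X86_VENDOR_INTEL 0 /* assumed value, as compared in the disassembly */

enum hcall { HCALL_VMCALL, HCALL_VMMCALL };

/* Model of "mov mem,%cl": only the low 8 bits of RCX are replaced. */
uint64_t load_cl(uint64_t rcx, uint8_t vendor)
{
    return (rcx & ~(uint64_t)0xff) | vendor;
}

/* Model of the fixed "(u64)" constraint: the whole register is written. */
uint64_t load_rcx(uint8_t vendor)
{
    return (uint64_t)vendor;
}

/* Model of "cmpq $X86_VENDOR_INTEL, %rcx; jne 1f; vmcall; ... 1: vmmcall". */
enum hcall pick_hypercall(uint64_t rcx)
{
    return rcx == X86_VENDOR_INTEL ? HCALL_VMCALL : HCALL_VMMCALL;
}

/* Self-check: leftover upper bits change the result only for the CL load. */
void check_model(void)
{
    uint64_t stale = 0xdeadbeef00000000ULL; /* arbitrary leftover bits */

    assert(pick_hypercall(load_cl(stale, X86_VENDOR_INTEL)) == HCALL_VMMCALL);
    assert(pick_hypercall(load_cl(0, X86_VENDOR_INTEL)) == HCALL_VMCALL);
    assert(pick_hypercall(load_rcx(X86_VENDOR_INTEL)) == HCALL_VMCALL);
}
```

Calling check_model() from a small test program exercises all three cases. With a clean register both variants select vmcall, which may explain why the problem does not show up on every boot path.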

Diagnostic Steps

  • The kernel panics with an invalid opcode error at xen_time_init+0x66.
[    0.262012] invalid opcode: 0000 [#1] SMP PTI
[    0.263000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-553.50.1.el8_10.x86_64 #1
[    0.263000] Hardware name: Xen HVM domU, BIOS 4.13 12/03/2021
[    0.263000] RIP: 0010:xen_time_init+0x66/0x1f2
[    0.263000] Code: c7 c0 20 c1 01 00 48 8b 14 d5 80 68 7c bc 8a 0d 7d 5b bc ff 48 63 34 02 31 d2 b8 18 00 00 00 48 83 f9 00 75 05 0f 01 c1 eb 03 <0f> 01 d9 85 c0 75 17 48 c7 c7 88 ac 6d bc e8 47 18 d7 fd 48 c7 05
  • The corresponding source code where the panic occurs:
# eu-addr2line -e ./usr/lib/debug/lib/modules/4.18.0-553.50.1.el8_10.x86_64/vmlinux xen_time_init+0x66
/usr/src/debug/kernel-4.18.0-553.50.1.el8_10/linux-4.18.0-553.50.1.el8_10.x86_64/./arch/x86/include/asm/xen/hypercall.h:431:9

./arch/x86/xen/time.c
*************************
   468  static void __init xen_time_init(void)
   469  {

   480          if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, xen_vcpu_nr(cpu),
   481                                 NULL) == 0) {

arch/x86/include/asm/xen/hypercall.h
**************************************
   429  HYPERVISOR_vcpu_op(int cmd, int vcpuid, void *extra_args)
   430  {
   431          return _hypercall3(int, vcpu_op, cmd, vcpuid, extra_args);
   432  }

   188  #define _hypercall3(type, name, a1, a2, a3)                             \

    95  #define __HYPERCALL                                     \
    96          "cmpq $"__XEN_XSTR(X86_VENDOR_INTEL)", %%rcx \n"\
    97          "jne 1f \n"                                     \
    98          "vmcall \n"                                     \
    99          "jmp 2f \n"                                     \
   100          "1: vmmcall \n"                                 \
   101          "2: \n"
   102  
   103  #define __HYPERCALL_ENTRY(x)    "a" (x), "c" (boot_cpu_data.x86_vendor)   <<----
  • Disassemble the code at the point of the panic.
$ echo "Code: c7 c0 20 c1 01 00 48 8b 14 d5 80 68 7c bc 8a 0d 7d 5b bc ff 48 63 34 02 31 d2 b8 18 00 00 00 48 83 f9 00 75 05 0f 01 c1 eb 03 <0f> 01 d9 85 c0 75 17 48 c7 c7 88 ac 6d bc e8 47 18 d7 fd 48 c7 05" |scripts/decodecode 

   e:    8a 0d 7d 5b bc ff        mov    -0x43a483(%rip),%cl        # 0xffffffffffbc5b91
  14:    48 63 34 02              movslq (%rdx,%rax,1),%rsi
  18:    31 d2                    xor    %edx,%edx
  1a:    b8 18 00 00 00           mov    $0x18,%eax
  1f:    48 83 f9 00              cmp    $0x0,%rcx   <<---
  23:    75 05                    jne    0x2a
  25:    0f 01 c1                 vmcall
  28:    eb 03                    jmp    0x2d
  2a:*    0f 01 d9                 vmmcall        <-- trapping instruction
  2d:    85 c0                    test   %eax,%eax
  2f:    75 17                    jne    0x48
  31:    48 c7 c7 88 ac 6d bc     mov    $0xffffffffbc6dac88,%rdi
  38:    e8 47 18 d7 fd           call   0xfffffffffdd71884
  3d:    48                       rex.W
  3e:    c7                       .byte 0xc7
  3f:    05                       .byte 0x5
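
When scripts/decodecode is not at hand, the trapping instruction can also be identified directly from the "Code:" line, because the faulting byte is wrapped in angle brackets. The following is a minimal sketch under that assumption. The function name classify_trap and the enum are hypothetical, and it recognises only the two encodings visible in the disassembly above: 0f 01 c1 (vmcall) and 0f 01 d9 (vmmcall).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum trap_insn { TRAP_UNKNOWN, TRAP_VMCALL, TRAP_VMMCALL };

/*
 * Classify the instruction that starts at the <xx> marker of an oops
 * "Code:" line. Returns TRAP_UNKNOWN when there is no marker, fewer
 * than three bytes follow it, or the bytes are neither encoding.
 */
enum trap_insn classify_trap(const char *code_line)
{
    unsigned char op[3];
    const char *p;
    char *end;
    int i;

    assert(code_line != NULL);
    p = strchr(code_line, '<');      /* start of the faulting byte */
    if (p == NULL)
        return TRAP_UNKNOWN;
    p++;                             /* skip '<' */

    for (i = 0; i < 3; i++) {
        unsigned long v = strtoul(p, &end, 16);

        if (end == p || v > 0xff)
            return TRAP_UNKNOWN;     /* ran out of bytes */
        op[i] = (unsigned char)v;
        p = end;
        if (*p == '>')               /* closing marker after first byte */
            p++;
    }

    if (op[0] != 0x0f || op[1] != 0x01)
        return TRAP_UNKNOWN;
    if (op[2] == 0xc1)
        return TRAP_VMCALL;
    if (op[2] == 0xd9)
        return TRAP_VMMCALL;
    return TRAP_UNKNOWN;
}
```

Passing the "Code:" line from this oops to classify_trap() yields TRAP_VMMCALL, which on an Intel CPU is consistent with the invalid opcode report.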

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.