Implications of KASLR on vmcore analysis with 'crash'
Introduction
Starting with version 7.5, RHEL kernels feature KASLR (Kernel Address Space Layout Randomization) enabled by default. KASLR is a security feature that lets the kernel relocate itself to a random location on each boot, which makes it significantly harder to write exploits that depend on knowing where kernel code and data reside.
As a side effect, debugging tools like crash may encounter some trouble trying to open vmcores from KASLR-enabled kernels.
Understanding KASLR impact on vmcore analysis
Before KASLR was introduced, the kernel was usually located at well-known physical and virtual addresses. Thanks to this, vmcore analysis tools like `crash` were able, if needed, to locate specific data, such as the `linux_banner`, at specific offsets. Additionally, the symbol tables found in the kernel's debugging information packages, which link symbols with virtual addresses, could be used directly to look up the actual data structures contained in a vmcore.
On KASLR-enabled kernels, however, both the physical location of the kernel in the computer's RAM and the virtual address base offset change between boots. This means that data like the `linux_banner` is no longer located at a well-known offset, and the objects pointed to by symbols in the kernel's debugging information won't be found at the expected virtual addresses.
The introduction of vmcoreinfo
To overcome the difficulties in core dump analysis introduced by KASLR, a specialized data section named `vmcoreinfo` was added to vmcores. This section is written by the `crashkernel` while collecting the dump and contains, among other information, the kernel's physical base (the physical offset to which the kernel was relocated on boot) and the KASLR offset (the difference between the original base virtual address and the base virtual address after the relocation).
This information allows utilities like crash to calculate the offsets needed to find both data located at well-known physical addresses and symbols from the kernel's debugging information.
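The arithmetic behind these two values can be illustrated with a minimal sketch. It assumes the x86_64 layout, in which kernel text is mapped starting at `__START_KERNEL_map` (0xffffffff80000000); that constant and the function names are assumptions made for illustration and are not taken from crash's source. The KASLR offset, physical base and `_stext` address are the ones that appear in the sample `help -D` output shown later in this article.

```python
# Assumption: x86_64 base of the kernel text mapping (__START_KERNEL_map).
START_KERNEL_MAP = 0xffffffff80000000

# Values from the sample dump in this article.
KERNELOFFSET = 0x16e00000         # KERNELOFFSET in vmcoreinfo
PHYS_BASE = 0x3c800000            # phys_base in the dump header
RUNTIME_STEXT = 0xffffffff97e00000  # SYMBOL(_stext) in vmcoreinfo


def relocated_address(static_vaddr, kaslr_offset):
    """Address of a symbol in the running kernel, given its address in
    the debugging information and the KASLR offset."""
    return static_vaddr + kaslr_offset


def kernel_text_phys(vaddr, phys_base):
    """Physical address of a (relocated) kernel text/data virtual address."""
    return vaddr - START_KERNEL_MAP + phys_base


# The address _stext would have in the debugging information is the
# runtime address minus the KASLR offset.
static_stext = RUNTIME_STEXT - KERNELOFFSET
print(hex(static_stext))                                   # 0xffffffff81000000
print(hex(relocated_address(static_stext, KERNELOFFSET)))  # 0xffffffff97e00000
print(hex(kernel_text_phys(RUNTIME_STEXT, PHYS_BASE)))     # 0x54600000
```

Without the two values from `vmcoreinfo`, a tool would look for `_stext` at its static address and at the wrong physical location, which is exactly the failure described above.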
Determining if crash has found the vmcoreinfo section
When `crash` finds a `vmcoreinfo` section, it prints a message that debugging symbols are being patched to accommodate the fact that the kernel has been relocated. This message looks like this, with different *XXX* and *YYY* values:
WARNING: kernel relocated [XXXMB]: patching YYY gdb minimal_symbol values
On the other hand, crash may print a message like the following one and/or fail to locate some symbols when it opens a vmcore from a KASLR-enabled kernel that lacks vmcoreinfo, and either the crash version predates the introduction of KASLR offset calculation (see Analyzing VM dumps without vmcoreinfo) or the dump is missing the state of the vCPU registers:
WARNING: cannot determine physical base address: defaulting to 0
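When many dumps have to be triaged, the two messages above can be recognized in captured crash output. The following is only a sketch: the patterns are built from the messages quoted in this article, while the function name and the returned labels are hypothetical.

```python
import re

# Pattern built from the "kernel relocated" message quoted above;
# the two groups capture the XXX and YYY values.
RELOCATED_RE = re.compile(
    r"WARNING: kernel relocated \[(\d+)MB\]: "
    r"patching (\d+) gdb minimal_symbol values"
)
NO_PHYS_BASE = "WARNING: cannot determine physical base address: defaulting to 0"


def classify_crash_output(text):
    """Return (label, relocated_mb, patched_symbols) for crash's output.

    The labels are hypothetical names used only in this sketch.
    """
    match = RELOCATED_RE.search(text)
    if match:
        return ("relocated", int(match.group(1)), int(match.group(2)))
    if NO_PHYS_BASE in text:
        return ("no_phys_base", None, None)
    return ("unknown", None, None)
```

A result of `"no_phys_base"` suggests checking the crash version and the dump format, as discussed in the section about VM dumps.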
Finally, the contents of the vmcoreinfo section can be dumped using the help -D command of crash:
crash> help -D
diskdump_data:
filename: vmcore
flags: c6 (KDUMP_CMPRS_LOCAL|ERROR_EXCLUDED|LZO_SUPPORTED|SNAPPY_SUPPORTED)
dfd: 3
ofp: 7f0530779400
machine_type: 62 (EM_X86_64)
(...)
sub_header_kdump: 24e7ff0
phys_base: 3c800000
dump_level: 31 (0x1f) (DUMP_EXCLUDE_ZERO|DUMP_EXCLUDE_CACHE|DUMP_EXCLUDE_CACHE_PRI|DUMP_EXCLUDE_USER_DATA|DUMP_EXCLUDE_FREE)
split: 0
start_pfn: (unused)
end_pfn: (unused)
offset_vmcoreinfo: 4936 (0x1348)
size_vmcoreinfo: 1767 (0x6e7)
OSRELEASE=3.10.0-830.el7.x86_64
PAGESIZE=4096
SYMBOL(init_uts_ns)=ffffffff98a16280
SYMBOL(node_online_map)=ffffffff98b439c0
SYMBOL(swapper_pg_dir)=ffffffff98a0e000
SYMBOL(_stext)=ffffffff97e00000
SYMBOL(vmap_area_list)=ffffffff98a91050
SYMBOL(mem_section)=ffffffff98fd8580
LENGTH(mem_section)=4096
SIZE(mem_section)=32
OFFSET(mem_section.section_mem_map)=0
SIZE(page)=64
SIZE(pglist_data)=157056
SIZE(zone)=2048
SIZE(free_area)=104
SIZE(list_head)=16
SIZE(nodemask_t)=128
OFFSET(page.flags)=0
OFFSET(page._count)=28
OFFSET(page.mapping)=8
OFFSET(page.lru)=32
OFFSET(page._mapcount)=24
OFFSET(page.private)=48
OFFSET(pglist_data.node_zones)=0
OFFSET(pglist_data.nr_zones)=156736
OFFSET(pglist_data.node_start_pfn)=156744
OFFSET(pglist_data.node_spanned_pages)=156760
OFFSET(pglist_data.node_id)=156768
OFFSET(zone.free_area)=144
OFFSET(zone.vm_stat)=1496
OFFSET(zone.spanned_pages)=1896
OFFSET(free_area.free_list)=0
OFFSET(list_head.next)=0
OFFSET(list_head.prev)=8
OFFSET(vmap_area.va_start)=0
OFFSET(vmap_area.list)=48
LENGTH(zone.free_area)=11
SYMBOL(log_buf)=ffffffff98a436e0
SYMBOL(log_buf_len)=ffffffff98a436dc
SYMBOL(log_first_idx)=ffffffff98ebd8e8
SYMBOL(log_next_idx)=ffffffff98ebd8d8
SIZE(log)=16
OFFSET(log.ts_nsec)=0
OFFSET(log.len)=8
OFFSET(log.text_len)=10
OFFSET(log.dict_len)=12
LENGTH(free_area.free_list)=6
NUMBER(NR_FREE_PAGES)=0
NUMBER(PG_lru)=5
NUMBER(PG_private)=11
NUMBER(PG_swapcache)=16
NUMBER(PG_slab)=7
NUMBER(PG_hwpoison)=23
NUMBER(PG_head_mask)=16384
NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128
SYMBOL(free_huge_page)=ffffffff97fe0510
NUMBER(phys_base)=1015021568
SYMBOL(init_level4_pgt)=ffffffff98a0e000
SYMBOL(node_data)=ffffffff98b3e6c0
LENGTH(node_data)=1024
KERNELOFFSET=16e00000
NUMBER(KERNEL_IMAGE_SIZE)=1073741824
CRASHTIME=1523005087
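The `KEY=value` lines in this output are straightforward to process. As a minimal sketch (the function name is hypothetical), the following parses them and extracts the two values that matter for KASLR. Judging by the sample above, `KERNELOFFSET` and the `SYMBOL()` entries are hexadecimal without a `0x` prefix, while `NUMBER()` entries are decimal.

```python
def parse_vmcoreinfo(text):
    """Collect KEY=value pairs, ignoring lines without '='
    (such as the diskdump header fields)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value
    return info


# A few lines copied from the sample output above.
sample = """\
phys_base: 3c800000
OSRELEASE=3.10.0-830.el7.x86_64
SYMBOL(_stext)=ffffffff97e00000
NUMBER(phys_base)=1015021568
KERNELOFFSET=16e00000
"""

info = parse_vmcoreinfo(sample)
kaslr_offset = int(info["KERNELOFFSET"], 16)   # hexadecimal
phys_base = int(info["NUMBER(phys_base)"])     # decimal
print(hex(kaslr_offset), hex(phys_base))       # 0x16e00000 0x3c800000
```

Note that the decimal `NUMBER(phys_base)` value matches the hexadecimal `phys_base` field in the dump header.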
KASLR and VM dumps
As described above, the `vmcoreinfo` data section was introduced to help vmcore analysis tools locate physical objects in the dump and properly resolve debugging symbols. Because this section is written by the `crashkernel` when a panic is triggered in the system, `vmcoreinfo` is not present by default in vmcores extracted from VM dumps, where the `crashkernel` is not involved.
QEMU's vmcoreinfo device
Upcoming QEMU versions will support a specialized device, also named `vmcoreinfo`, which is presented to the VM. If the Guest's kernel has successfully written the section to this device, QEMU will automatically include it in the VM dump. This device is not enabled by default and must be explicitly specified in QEMU's command-line arguments.
libvirt will also support this device in its domain definition format, but it will not enable it by default either. This implies that either the user or the management software of a layered product must explicitly add it to the domain's definition for the device to be actually present in the VM.
This section will be updated when the aforementioned versions are publicly available.
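Until then, the check that a management tool would need can be sketched as follows. The sketch assumes that the device is exposed in the libvirt domain XML as a `<vmcoreinfo state='on'/>` element under `<features>`, and on the QEMU command line as `-device vmcoreinfo`. Both should be verified against the documentation of the versions actually in use, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_vmcoreinfo(domain_xml):
    """True if the domain definition enables the vmcoreinfo device
    (assumed syntax: <features><vmcoreinfo state='on'/></features>)."""
    root = ET.fromstring(domain_xml)
    elem = root.find("./features/vmcoreinfo")
    return elem is not None and elem.get("state") == "on"


def enable_vmcoreinfo(domain_xml):
    """Return a copy of the domain XML with the device enabled."""
    root = ET.fromstring(domain_xml)
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    elem = features.find("vmcoreinfo")
    if elem is None:
        elem = ET.SubElement(features, "vmcoreinfo")
    elem.set("state", "on")
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed domain definition.
domain = "<domain type='kvm'><name>guest</name><features><acpi/></features></domain>"
print(has_vmcoreinfo(domain))                     # False
print(has_vmcoreinfo(enable_vmcoreinfo(domain)))  # True
```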
Analyzing VM dumps without vmcoreinfo
Recently, `crash` has gained the ability to calculate both the physical base and the KASLR offset from a vmcore, even if the `vmcoreinfo` section is missing. This is done using a technique developed by Takao Indoh (Fujitsu), which was introduced with the following commits:
- https://github.com/crash-utility/crash/commit/907196e93dc94df104df21ba51a42a5de9277958
- https://github.com/crash-utility/crash/commit/5d172b230cf46e7e1344b517746d868c9a8e2fd0
Please note: These are not yet available in the current downstream version of crash, specifically crash-7.2.0-6.el7.x86_64.
For this technique to work, the VM dump must include the state of the vCPU registers at the moment of taking the dump. The following dump formats are known to work:
- QEMU netdump/diskdump (both ELF and compressed formats)
  - Example: `virsh dump --memory-only <domain_name> <dump_file>`
- VMware VMSS (including snapshots with separated vmem files)
And these are known not to work:
- vmcores extracted from QEMU core dumps
  - Example: `gcore [-a] $PID` - QEMU's core dump doesn't include the state of the vCPU registers.
  - If you want to collect both QEMU and Guest states, please consider running `gcore` first (without '-a', we just want QEMU's internal mappings) and then `virsh dump --memory-only`. The order is important, as `virsh dump` will send an IPI to the vCPUs, potentially altering their respective states.
- VMware VMSS files converted to vmcore using `vmss2core` - vCPU Control Registers (CR) are not included in the conversion
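The recommended collection order (QEMU core first, Guest dump second) can be captured in a small wrapper. This is a sketch under stated assumptions: the function names, the domain name, and the output paths are hypothetical, and it relies on `gcore` accepting `-o <prefix>` and on `virsh dump` taking the destination file after the domain name.

```python
import subprocess


def collection_commands(domain, qemu_pid, core_prefix, dump_path):
    """Commands in the order they should run: gcore first (without -a),
    then virsh dump, which may alter the vCPU states."""
    return [
        ["gcore", "-o", core_prefix, str(qemu_pid)],
        ["virsh", "dump", "--memory-only", domain, dump_path],
    ]


def collect(domain, qemu_pid, core_prefix, dump_path, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for argv in collection_commands(domain, qemu_pid, core_prefix, dump_path):
        if dry_run:
            print(" ".join(argv))
        else:
            # Stop at the first failure so a failed gcore is noticed
            # before virsh dump touches the vCPUs.
            subprocess.run(argv, check=True)


# Hypothetical domain, PID and paths; prints the two commands only.
collect("guest1", 1234, "/var/tmp/qemu-core", "/var/tmp/guest1.dump")
```

Only the file produced by `virsh dump` is suitable for `crash`; the `gcore` output is useful for debugging QEMU itself.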
Disabling KASLR on the Guest
If none of the supported VM dump formats can be used, and a panic can't be generated from inside the VM either, there's still the option of disabling KASLR from inside the Guest to fall back to the pre-7.5 behavior.
This can be done by adding the nokaslr option to the kernel's command line. The recommended way to do this is by editing /etc/sysconfig/grub and regenerating /boot/grub2/grub.cfg:
- Original `/etc/sysconfig/grub` (KASLR-enabled)
# cat /etc/sysconfig/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
- Modified `/etc/sysconfig/grub` (KASLR-disabled)
# cat /etc/sysconfig/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet nokaslr"
GRUB_DISABLE_RECOVERY="true"
- Regenerating `/boot/grub2/grub.cfg`
# grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-830.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-830.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-91d6af6286ab4d3dbc2fde5fe96ac63c
Found initrd image: /boot/initramfs-0-rescue-91d6af6286ab4d3dbc2fde5fe96ac63c.img
done
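For Guests that are provisioned automatically, the edit to `GRUB_CMDLINE_LINUX` and the verification after reboot can be scripted. The following is a minimal sketch with hypothetical function names. It only transforms text, so reading and writing `/etc/sysconfig/grub` and running `grub2-mkconfig` are left to the caller.

```python
import re


def add_nokaslr(grub_config_text):
    """Append nokaslr to GRUB_CMDLINE_LINUX unless it is already there."""
    def patch(match):
        options = match.group(2).split()
        if "nokaslr" not in options:
            options.append("nokaslr")
        return match.group(1) + " ".join(options) + match.group(3)

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX=")(.*)(")$',
        patch,
        grub_config_text,
        flags=re.MULTILINE,
    )


def kaslr_disabled(cmdline):
    """True if nokaslr appears as an option in a kernel command line."""
    return "nokaslr" in cmdline.split()


# The line below is taken from the original configuration shown above.
original = 'GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet"'
print(add_nokaslr(original))
```

After rebooting the Guest, passing the contents of `/proc/cmdline` to `kaslr_disabled()` confirms whether the option took effect.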