NULL pointer dereference while using Wine, amdgpu

Summary:

While running 007 First Light through a script package, the kernel intermittently (~80% of attempts) oopses with a NULL pointer dereference in amdgpu_hmm_invalidate_gfx. The game fails to load and the system remains significantly unstable until reboot. The issue has occurred both at launch and at close. It took me some time to start suspecting the kernel rather than the app, but this doesn’t make sense as an app failure.

System Information:

❯ uname -r
7.1.1-2-cachyos
❯ lspci -nnk | grep -A5 VGA
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 [Radeon RX 9070/9070 XT/9070 GRE] [1002:7550] (rev c0)
Subsystem: Sapphire Technology Limited Device [1da2:e489]
Kernel driver in use: amdgpu
Kernel modules: amdgpu

❯ pacman -Qi mesa
Installed From : cachyos
Name : mesa
Version : 3:26.1.2-1

❯ pacman -Qi linux-firmware
Installed From : cachyos
Name : linux-firmware
Version : 1:20260622-1

❯ wine --version
wine-11.11

Steps to Reproduce:

  1. Boot system
  2. Launch script (calls gamescope, bubblewrap, dwarfs) and get unlucky.
  3. Launch fails; ps and pgrep notably hang, along with other instability. The system then takes ~4 min to shut down after running reboot now.

Additional observations:

The script normally mounts a DwarFS image, but config changes have since resulted in a full extract with no mounting at all. DwarFS is likely not involved, given that the issue persists.

Relevant section of log, via journalctl -k -b -1 | grep -A40 -B15 amdgpu_hmm_invalidate_gfx:

Jun 28 20:55:58 cachyos-x8664 kernel: [UFW BLOCK] IN=enp8s0 OUT= MAC=a8:a1:59:99:b2:63:48:b4:23:a6:8e:c8:08:00 SRC=192.168.86.24 DST=192.168.86.39 LEN=324 TOS=0x00 PREC=0x00 TTL=64 ID=49055 DF PROTO=UDP SPT=1900 DPT=37950 LEN=304
Jun 28 20:56:18 cachyos-x8664 kernel: umip_printk: 111 callbacks suppressed
Jun 28 20:56:18 cachyos-x8664 kernel: umip: 007FirstLight.e[52697] ip:14803d94a sp:105bd0: SGDT instruction cannot be used by applications.
Jun 28 20:56:18 cachyos-x8664 kernel: umip: 007FirstLight.e[52697] ip:14803d94a sp:105bd0: For now, expensive software emulation returns the result.
Jun 28 20:56:18 cachyos-x8664 kernel: umip: 007FirstLight.e[52697] ip:14aa51363 sp:105bb0: SGDT instruction cannot be used by applications.
Jun 28 20:56:18 cachyos-x8664 kernel: umip: 007FirstLight.e[52697] ip:14aa51363 sp:105bb0: For now, expensive software emulation returns the result.
Jun 28 20:56:18 cachyos-x8664 kernel: umip: 007FirstLight.e[52697] ip:14d3e4c20 sp:105ba0: SGDT instruction cannot be used by applications.
Jun 28 20:56:18 cachyos-x8664 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Jun 28 20:56:18 cachyos-x8664 kernel: #PF: supervisor read access in kernel mode
Jun 28 20:56:18 cachyos-x8664 kernel: #PF: error_code(0x0000) - not-present page
Jun 28 20:56:18 cachyos-x8664 kernel: PGD 88b55a067 P4D 88b55a067 PUD 9f7667067 PMD 9f7665067 PTE 0
Jun 28 20:56:18 cachyos-x8664 kernel: Oops: Oops: 0000 [#1] SMP NOPTI
Jun 28 20:56:18 cachyos-x8664 kernel: CPU: 6 UID: 1000 PID: 52843 Comm: 007FirstLight.e Tainted: G OE 7.1.1-2-cachyos #1 PREEMPT(full) 2e307dde255357a39e9ab0cfb397fcc9f1cae78a
Jun 28 20:56:18 cachyos-x8664 kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Jun 28 20:56:18 cachyos-x8664 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Taichi, BIOS P4.30 04/14/2021
Jun 28 20:56:18 cachyos-x8664 kernel: RIP: 0010:amdgpu_hmm_invalidate_gfx+0x22/0xb0 [amdgpu]
Jun 28 20:56:18 cachyos-x8664 kernel: Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 8b 5e 18 f6 c3 01 74 7a 49 89 d6 48 8b 47 f0 <48> 8b 00 4c 8b a8 70 02 00 00 4c 8d bf a0 fd ff ff 41 bc a0 e5 04
Jun 28 20:56:18 cachyos-x8664 kernel: RSP: 0018:ffffd00ed5a2b930 EFLAGS: 00010202
Jun 28 20:56:18 cachyos-x8664 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Jun 28 20:56:18 cachyos-x8664 kernel: RDX: 0000000000000007 RSI: ffffd00ed5a2b9b0 RDI: ffff8eadc934ae60
Jun 28 20:56:18 cachyos-x8664 kernel: RBP: ffffd00ed5a2b968 R08: 0000000000000200 R09: 0000000000000008
Jun 28 20:56:18 cachyos-x8664 kernel: R10: 0000000000000000 R11: ffffffffc0d31ab0 R12: ffff8eadc934ae60
Jun 28 20:56:18 cachyos-x8664 kernel: R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000007
Jun 28 20:56:18 cachyos-x8664 kernel: FS: 0000000101caf6c0(0000) GS:ffff8eccf1607000(0000) knlGS:00000000054c4000
Jun 28 20:56:18 cachyos-x8664 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 28 20:56:18 cachyos-x8664 kernel: CR2: 0000000000000000 CR3: 00000006b5af8000 CR4: 0000000000350ef0
Jun 28 20:56:18 cachyos-x8664 kernel: Call Trace:
Jun 28 20:56:18 cachyos-x8664 kernel: <TASK>
Jun 28 20:56:18 cachyos-x8664 kernel: __mmu_notifier_invalidate_range_start+0x1af/0x1d0
Jun 28 20:56:18 cachyos-x8664 kernel: __split_huge_pmd+0x145/0x1c0
Jun 28 20:56:18 cachyos-x8664 kernel: vma_adjust_trans_huge+0xf0/0x100
Jun 28 20:56:18 cachyos-x8664 kernel: __split_vma+0x281/0x4b0
Jun 28 20:56:18 cachyos-x8664 kernel: vma_modify+0x77d/0x1070
Jun 28 20:56:18 cachyos-x8664 kernel: ? schedule+0x55/0x180
Jun 28 20:56:18 cachyos-x8664 kernel: vma_modify_flags+0xbf/0x110
Jun 28 20:56:18 cachyos-x8664 kernel: mprotect_fixup+0x121/0x380
Jun 28 20:56:18 cachyos-x8664 kernel: do_mprotect_pkey+0x30d/0x510
Jun 28 20:56:18 cachyos-x8664 kernel: __x64_sys_mprotect+0x22/0x30
Jun 28 20:56:18 cachyos-x8664 kernel: do_syscall_64+0xa6/0x3e0
Jun 28 20:56:18 cachyos-x8664 kernel: ? srso_return_thunk+0x5/0x5f
Jun 28 20:56:18 cachyos-x8664 kernel: ? do_syscall_64+0xe4/0x3e0
Jun 28 20:56:18 cachyos-x8664 kernel: ? srso_return_thunk+0x5/0x5f
Jun 28 20:56:18 cachyos-x8664 kernel: ? srso_return_thunk+0x5/0x5f
Jun 28 20:56:18 cachyos-x8664 kernel: ? srso_return_thunk+0x5/0x5f
Jun 28 20:56:18 cachyos-x8664 kernel: ? __x64_sys_rt_sigprocmask+0x14c/0x1a0
Jun 28 20:56:18 cachyos-x8664 kernel: ? srso_return_thunk+0x5/0x5f
Jun 28 20:56:18 cachyos-x8664 kernel: ? do_syscall_64+0xe4/0x3e0
Jun 28 20:56:18 cachyos-x8664 kernel: ? srso_return_thunk+0x5/0x5f
Jun 28 20:56:18 cachyos-x8664 kernel: ? do_syscall_64+0x65/0x3e0
Jun 28 20:56:18 cachyos-x8664 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jun 28 20:56:18 cachyos-x8664 kernel: RIP: 0033:0x7f8809545e3b
Jun 28 20:56:18 cachyos-x8664 kernel: Code: fd 75 cc 0f b6 04 2f 3c 3d 77 ca 49 0f a3 c4 73 c4 48 83 c4 08 48 89 f8 5b 5d 41 5c 41 5d c3 f3 0f 1e fa b8 0a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd 1e 10 00 f7 d8 64 89 01 48
Jun 28 20:56:18 cachyos-x8664 kernel: RSP: 002b:0000000101cae6e8 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
Jun 28 20:56:18 cachyos-x8664 kernel: RAX: ffffffffffffffda RBX: 0000000000003000 RCX: 00007f8809545e3b
Jun 28 20:56:18 cachyos-x8664 kernel: RDX: 0000000000000007 RSI: 0000000000100000 RDI: 0000000060780000
Jun 28 20:56:18 cachyos-x8664 kernel: RBP: 0000000101cae870 R08: 0000000060780000 R09: 0000000000000003
Jun 28 20:56:18 cachyos-x8664 kernel: R10: 0000000060780000 R11: 0000000000000206 R12: 0000000000100000
Jun 28 20:56:18 cachyos-x8664 kernel: R13: 000000005fd4f710 R14: 000000005fd4f718 R15: 0000000000000000
Jun 28 20:56:18 cachyos-x8664 kernel: </TASK>
Jun 28 20:56:18 cachyos-x8664 kernel: Modules linked in: snd_seq_dummy rfcomm snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE nft_chain_nat nf_nat bridge stp llc v4l2loopback(OE) algif_hash algif_skcipher af_alg bnep vfat fat amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_atihdmi snd_hda_codec_generic snd_hda_codec_hdmi kvm_amd uvcvideo snd_hda_intel snd_usb_audio snd_hda_codec uvc btusb snd_usbmidi_lib videobuf2_vmalloc snd_hda_core kvm videobuf2_memops iwlwifi btmtk snd_ump snd_intel_dspcfg videobuf2_v4l2 btbcm snd_rawmidi snd_intel_sdw_acpi videobuf2_common btintel snd_seq_device snd_hwdep irqbypass btrtl snd_pcm aesni_intel videodev gf128mul snd_timer cfg80211 mousedev ee1004 joydev razermouse(OE) bluetooth mc igb aead snd ptp ccp rfkill rapl i2c_piix4 soundcore wmi_bmof mxm_wmi pcspkr pps_core dca i2c_smbus k10temp mac_hid ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_comment xt_multiport nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack
Jun 28 20:56:18 cachyos-x8664 kernel: nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat x_tables nf_tables nfnetlink dm_mod ntsync i2c_dev pkcs8_key_parser crypto_user zram 842_decompress 842_compress lz4hc_compress lz4_compress amdgpu drm_panel_backlight_quirks drm_buddy drm_suballoc_helper video i2c_algo_bit drm_exec gpu_sched amdxcp drm_ttm_helper ttm nvme drm_display_helper nvme_core cec nvme_keyring nvme_auth wmi
Jun 28 20:56:18 cachyos-x8664 kernel: CR2: 0000000000000000
Jun 28 20:56:18 cachyos-x8664 kernel: ---[ end trace 0000000000000000 ]---
Jun 28 20:56:18 cachyos-x8664 kernel: RIP: 0010:amdgpu_hmm_invalidate_gfx+0x22/0xb0 [amdgpu]
Jun 28 20:56:18 cachyos-x8664 kernel: Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 8b 5e 18 f6 c3 01 74 7a 49 89 d6 48 8b 47 f0 <48> 8b 00 4c 8b a8 70 02 00 00 4c 8d bf a0 fd ff ff 41 bc a0 e5 04
Jun 28 20:56:18 cachyos-x8664 kernel: RSP: 0018:ffffd00ed5a2b930 EFLAGS: 00010202
Jun 28 20:56:18 cachyos-x8664 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Jun 28 20:56:18 cachyos-x8664 kernel: RDX: 0000000000000007 RSI: ffffd00ed5a2b9b0 RDI: ffff8eadc934ae60
Jun 28 20:56:18 cachyos-x8664 kernel: RBP: ffffd00ed5a2b968 R08: 0000000000000200 R09: 0000000000000008
Jun 28 20:56:18 cachyos-x8664 kernel: R10: 0000000000000000 R11: ffffffffc0d31ab0 R12: ffff8eadc934ae60
Jun 28 20:56:18 cachyos-x8664 kernel: R13: 0000000000000000 R14: 0000000000000007 R15: 0000000000000007
Jun 28 20:56:18 cachyos-x8664 kernel: FS: 0000000101caf6c0(0000) GS:ffff8eccf1607000(0000) knlGS:00000000054c4000
Jun 28 20:56:18 cachyos-x8664 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 28 20:56:18 cachyos-x8664 kernel: CR2: 0000000000000000 CR3: 00000006b5af8000 CR4: 0000000000350ef0
Jun 28 20:56:18 cachyos-x8664 kernel: note: 007FirstLight.e[52843] exited with irqs disabled

I’m seeing the same issue running the same game on 7.1.1-2 and 7.1.2-2. I can’t even run ps, it just hangs, and the game process can’t be killed, so I had to reboot each time.

Reinstalling the 7.0 kernel (from the archive) fixed it for me:

# example using cachyos-v4
sudo pacman -U linux-cachyos-7.0.12-1-x86_64_v4.pkg.tar.zst linux-cachyos-headers-7.0.12-1-x86_64_v4.pkg.tar.zst

Not sure if it helps, but below is an unverified AI-generated report pointing to upstream amdgpu_hmm.c#n70.

AI Investigation Report

Traceback Log

Jun 29 04:24:33 kernel: Sched_ext: lavd_1.1.1_x86_64_unknown_linux_gnu (enabled+all), task: runnable_at=+0ms
Jun 29 04:24:33 kernel: RIP: 0010:amdgpu_hmm_invalidate_gfx+0x39/0xb0 [amdgpu]
Jun 29 04:24:33 kernel: Code: 55 41 54 53 8b 5e 18 f6 c3 01 74 7a 48 8b 47 f0 41 bc a0 e5 04 00 4c 03 a7 58 ff ff ff 49 89 d6 4c 8d bf a0 fd ff ff 48 89 fd <48> 8b 00 4c 89 e7 4c 8b a8 70 02 00 00 e8 b5 e8 ec f2 4c 89 75 50
Jun 29 04:24:33 kernel: RSP: 0018:ffffd3ca4836f9a0 EFLAGS: 00010286
Jun 29 04:24:33 kernel: RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
Jun 29 04:24:33 kernel: RDX: 0000000000000005 RSI: ffffd3ca4836fa20 RDI: ffff8dca1da3de60
Jun 29 04:24:33 kernel: RBP: ffff8dca1da3de60 R08: 0000000000000002 R09: 000000000005d930
Jun 29 04:24:33 kernel: R10: 0000000000000000 R11: ffffffffc0b377f0 R12: ffff8dc9d9c5d330
Jun 29 04:24:33 kernel: R13: 0000000000000000 R14: 0000000000000005 R15: ffff8dca1da3dc00
Jun 29 04:24:33 kernel: FS:  00000001027ff6c0(0000) GS:ffff8dd7302c3000(0000) knlGS:000000005c3a0000
Jun 29 04:24:33 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 29 04:24:33 kernel: CR2: 0000000000000000 CR3: 00000002cead5000 CR4: 0000000000f50ef0
...
Jun 29 04:24:33 kernel: Call Trace:
Jun 29 04:24:33 kernel:  <TASK>
Jun 29 04:24:33 kernel:  __mmu_notifier_invalidate_range_start+0x1af/0x1d0
Jun 29 04:24:33 kernel:  __split_huge_pmd+0x142/0x1c0
Jun 29 04:24:33 kernel:  vma_adjust_trans_huge+0xf0/0x100
Jun 29 04:24:33 kernel:  __split_vma+0x281/0x4b0
Jun 29 04:24:33 kernel:  vma_modify+0x77d/0x1060
Jun 29 04:24:33 kernel:  vma_modify_flags+0xbf/0x110
Jun 29 04:24:33 kernel:  mprotect_fixup+0x121/0x370
Jun 29 04:24:33 kernel:  do_mprotect_pkey+0x317/0x510
Jun 29 04:24:33 kernel:  __x64_sys_mprotect+0x22/0x30
Jun 29 04:24:33 kernel:  do_syscall_64+0xa6/0x3e0
...
Jun 29 04:24:33 kernel: RIP: 0033:0x7f31ead4345b
...
Jun 29 04:24:33 kernel: note: 007FirstLight.e[18449] exited with irqs disabled

Technical Analysis

1. The Assembly Failure

The crash occurs at amdgpu_hmm_invalidate_gfx+0x39/0xb0 in this trace (+0x22/0xb0 in the original post's 7.1.1-2 trace; the faulting instruction is the same).
The CPU instructions leading to the page fault:

Code: ... 48 8b 47 f0 ... 48 89 fd <48> 8b 00 ...
  1. 48 8b 47 f0 (mov -0x10(%rdi), %rax): RDI holds mni, the notifier embedded in the bo structure, so this loads the bo->vm_bo pointer from 0x10 bytes before the notifier.
  2. 48 89 fd (mov %rdi, %rbp): Compiler register setup.
  3. <48> 8b 00 (mov (%rax), %rax): The CPU attempts to dereference the loaded pointer (bo->vm_bo) to fetch vm. Because the pointer loaded in step 1 is 0000000000000000 (NULL, see RAX in the register dump), the dereference fails, triggering a page fault at CR2: 0000000000000000.
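
To make the negative offset in step 1 less mysterious, here is a minimal userspace sketch of the container_of() pattern. The struct names and layout (mock_bo, mock_notifier) are made up for illustration and are not the real amdgpu_bo layout; the only thing carried over from the trace is that the notifier sits 0x10 bytes after vm_bo, which assumes 8-byte pointers as on x86-64.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature types: only the relative layout matters here. */
struct mock_notifier {
	unsigned long seq;
};

struct mock_bo {
	void *vm_bo;                   /* offset 0x00 on x86-64 */
	void *pad;                     /* offset 0x08 on x86-64 */
	struct mock_notifier notifier; /* offset 0x10 on x86-64 */
};

/* Byte distance from the embedded notifier back to the vm_bo field. */
long vm_bo_delta(void)
{
	return (long)offsetof(struct mock_bo, vm_bo) -
	       (long)offsetof(struct mock_bo, notifier);
}

/*
 * Models the load "mov -0x10(%rdi), %rax": the callback only receives
 * a pointer to the embedded notifier and walks back to the owning
 * object to read its vm_bo field.
 */
void *load_vm_bo(struct mock_notifier *mni)
{
	struct mock_bo *bo;

	assert(mni != NULL);
	bo = container_of(mni, struct mock_bo, notifier);
	return bo->vm_bo;
}
```

With this layout vm_bo_delta() gives -16, i.e. -0x10, and load_vm_bo() on an object whose vm_bo was never set returns NULL. That NULL is the value step 3 then tries to read through.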

2. The Source Code Bug

In drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c (at tag cachyos-7.1.2-2):

static bool amdgpu_hmm_invalidate_gfx(struct mmu_interval_notifier *mni,
				      const struct mmu_notifier_range *range,
				      unsigned long cur_seq)
{
	struct amdgpu_bo *bo = container_of(mni, struct amdgpu_bo, notifier);
	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
	struct amdgpu_bo *vm_root = bo->vm_bo->vm->root.bo; // <--- NULL Pointer Dereference
	long r;

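If bo->vm_bo is NULL when the notifier fires (the report suggests this can happen during HMM page migration or compaction cycles, though both traces here reach it via mprotect() splitting a transparent huge page), dereferencing it to obtain vm triggers the NULL pointer dereference immediately.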
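
For reference, both call traces enter the notifier from userspace through mprotect -> __split_vma -> vma_adjust_trans_huge -> __split_huge_pmd, and the user registers in the first trace show mprotect(0x60780000, 0x100000, 7). Below is a rough sketch of that userspace side only. It is not a reproducer: my assumption is that the crash additionally needs an amdgpu userptr BO with an interval notifier registered over the range, which this code does not set up. The function name and the 2 MiB huge page size are also assumptions, and it uses PROT_READ rather than the RWX seen in the trace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_SZ (2UL << 20) /* assumed x86-64 THP size: 2 MiB */

/*
 * Map anonymous memory, hint for transparent huge pages, then change
 * protection on a 1 MiB slice inside one 2 MiB-aligned block.
 * Returns 0 on success, -1 on failure.
 */
int split_thp_with_mprotect(void)
{
	size_t len = 2 * HUGE_SZ;
	char *raw, *aligned;
	int rc;

	raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return -1;

	/* First 2 MiB-aligned address inside the 4 MiB mapping. */
	aligned = (char *)(((uintptr_t)raw + HUGE_SZ - 1) &
			   ~(uintptr_t)(HUGE_SZ - 1));

#ifdef MADV_HUGEPAGE
	/* Hint only: the kernel may or may not use a huge page here. */
	madvise(aligned, HUGE_SZ, MADV_HUGEPAGE);
#endif
	memset(aligned, 0xa5, HUGE_SZ); /* fault the pages in */

	/*
	 * Changing protection on half of the block makes the kernel split
	 * the VMA, and split the huge PMD if one was actually installed.
	 */
	rc = mprotect(aligned + HUGE_SZ / 2, HUGE_SZ / 2, PROT_READ);

	munmap(raw, len);
	return rc;
}
```

On a healthy kernel this just returns 0. The point is only that an ordinary mprotect() on part of a huge-page-backed range is enough to run every registered MMU notifier for it, which would explain why the game can hit this at launch and at close.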


Proposed Fix

A NULL check could be placed before resolving vm_root. If bo->vm_bo is NULL, there is presumably no GPU virtual memory space to invalidate, so the notifier callback could return true immediately. Like the rest of this report, the patch below is unverified.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
index 5bfa5a84b09cb..0b76e2515a8c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
@@ -75,9 +75,13 @@ static bool amdgpu_hmm_invalidate_gfx(struct mmu_interval_notifier *mni,
 {
 	struct amdgpu_bo *bo = container_of(mni, struct amdgpu_bo, notifier);
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-	struct amdgpu_bo *vm_root = bo->vm_bo->vm->root.bo;
+	struct amdgpu_bo *vm_root;
 	long r;

+	if (!bo->vm_bo)
+		return true;
+
+	vm_root = bo->vm_bo->vm->root.bo;
 	if (!mmu_notifier_range_blockable(range))
 		return false;

Several launches and no crashes later, the rollback appears to have worked. Underlying kernel issue still needs fixing, but at least it’s workable. Thanks.