Video driver crashing constantly

Zeekz · April 3, 2025, 8:27am

I need a bit of help troubleshooting this. It’s not a CachyOS issue, and it started happening yesterday.

It happens on ALL Steam games with an AMD GPU (7800 XT). The only error I can see after running with Proton DEBUG = 1 is this one:

0:00:00.034816391 4855 0x74bdf80b25c0 ERROR WINE wg_source.c:423:check_decoding_support: pad <avdec_vc1-0:sink>

I used a few rollbacks to restore points from before the latest kwin and mesa updates, but it is still crashing, so it is probably unrelated to those updates.

The games crash ONLY in Fullscreen/Windowed Borderless but not in Windowed. I tried running with OpenGL and it still crashes in that scenario. I switched between all sorts of Proton versions, with the same result.

Removing my 'LD_PRELOAD="" gamemoderun %command%' launch options didn’t make any difference.

Since it happens to all games, I figured it’s either a Wayland crash or a GPU driver crash, but since the games work fine in Windowed I’m confused as to which is the more likely one.

Any other logs I can check and start from there?

P.S. I ran a non-Steam game in fullscreen without an issue, so it’s something in Steam.

naim · April 3, 2025, 10:05am

Is this game also running via Wine/Proton? If yes, are you running Steam Native or Steam Runtime? Please try the one you weren’t using. Thanks.

Zeekz · April 3, 2025, 10:22am

I believe it’s running some sort of a wrapper and it’s not a proton/wine game. I’m running Steam Runtime, I did try Native and did not make a difference.

The black screens are only “fixable” with Windowed, and to get rid of them I need to alt-tab so the focus shifts to another window, like an open browser. At that point the game is in the background and I can safely close it.

At this point I’m not sure if the title is correct, as this does not appear to be a driver crash but some sort of screen blackout that gets fixed with an alt-tab. The screen does go blank every 4-5s though, then the picture comes back, and then it goes black again. The Proton debug log has a bunch of warnings; I don’t know if those would be useful.

naim · April 3, 2025, 2:06pm

Can you share those logs? Also logs from PROTON_LOG=1

Zeekz · April 3, 2025, 2:43pm

Had to run it a few times and stop quickly or the log gets too big to be pasted.

This is what the Proton Log looks like. I’ve been playing on these loading variables for the last month:

LD_PRELOAD="" gamemoderun PROTON_LOG=1 %command%

minus the log of course. The preload fixed my time bomb stutters after 30 minutes, while the gamemoderun fixes my high ping bug (happens in some games but this fixes it).

I did several updates yesterday but never restarted until I went to bed, and when I woke up today nothing could behave normally in Fullscreen or Windowed Borderless on Steam.

This is the system error log I get from dmesg

The GPU is not undervolted/overclocked, I’ve been running it fine, and it still runs fine on anything that’s not Steam, so I’m pretty sure this is not a hardware issue.

Also, the longer I run a game that does this, the more of these lines I get:

728.811:0120:0124:trace:unwind:dump_unwind_info unwind info at 00006FFFFB1DAB14 flags 0 prolog 0x1a bytes function 00006FFFFA666BC0-00006FFFFA666D07
728.811:0120:0124:trace:unwind:dump_unwind_info frame register rbp offset 0x30(%rsp)
728.811:0120:0124:trace:unwind:dump_unwind_info 0x1a: movq %rdi,0x48(%rsp)
728.811:0120:0124:trace:unwind:dump_unwind_info 0x16: movq %rbx,0x40(%rsp)
728.811:0120:0124:trace:unwind:dump_unwind_info 0xb: leaq 0x30(%rsp),rbp
728.811:0120:0124:trace:unwind:dump_unwind_info 0x6: subq $0x30,%rsp
728.811:0120:0124:trace:unwind:dump_unwind_info 0x2: pushq %rbp
728.811:0120:0124:trace:unwind:RtlVirtualUnwind type 0 rip 6ffff765f5ec rsp 10e220
728.811:0120:0124:trace:unwind:dump_unwind_info **** func cf5e0-cf5f9
728.811:0120:0124:trace:unwind:dump_unwind_info unwind info at 00006FFFFAFF5F70 flags 0 prolog 0x4 bytes function 00006FFFF765F5E0-00006FFFF765F5F9
728.811:0120:0124:trace:unwind:dump_unwind_info 0x4: subq $0x28,%rsp
728.811:0120:0124:trace:unwind:RtlVirtualUnwind type 0 rip 6ffff77fb905 rsp 10e250
728.811:0120:0124:trace:unwind:dump_unwind_info **** func 26b830-26b958
728.811:0120:0124:trace:unwind:dump_unwind_info unwind info at 00006FFFFB1F9C3C flags 0 prolog 0x21 bytes function 00006FFFF77FB830-00006FFFF77FB958

It’s this unwind, just at different memory addresses, over and over and over. A good 10s produces something like 27,000 log lines.
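Since these repeated unwind traces are what make the log too big to paste, one option is to strip them before sharing. A minimal sketch, assuming the Proton log is a plain text file and that every noisy line carries the `trace:unwind:` channel tag seen above; the function names and any file names are hypothetical:

```python
import re
from pathlib import Path

# Matches Wine trace lines from the unwind channel, e.g.
# "728.811:0120:0124:trace:unwind:dump_unwind_info ..."
UNWIND_TRACE = re.compile(r"^\d+\.\d+:[0-9a-f]+:[0-9a-f]+:trace:unwind:")


def strip_unwind_traces(lines):
    """Return (kept_lines, dropped_count) with unwind trace lines removed."""
    kept = []
    dropped = 0
    for line in lines:
        if UNWIND_TRACE.match(line):
            dropped += 1
        else:
            kept.append(line)
    return kept, dropped


def filter_log(src, dst):
    """Write a copy of the log at src to dst without unwind traces.

    Returns the number of lines that were dropped.
    """
    text = Path(src).read_text(errors="replace")
    kept, dropped = strip_unwind_traces(text.splitlines())
    Path(dst).write_text("\n".join(kept) + "\n")
    return dropped
```

For example, `filter_log("proton.log", "proton.filtered.log")` (both file names made up here) would write the shorter copy and return how many lines were removed. Errors and warnings from other channels are left in place.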

Zeekz · April 4, 2025, 10:47am

After trying literally anything I can think of, I just went ahead, formatted my cachyos partition and reinstalled it. After setting up everything back to how it was, things worked normally.

I don’t know what happened, but now I’ve left the default snapper settings of up to 50 restore points (previously I had set them to 3, because who needs so many restore points lol).

Zeekz · April 11, 2025, 9:17am

After a power outage last night while I was in a game, the same thing returned. This time I had good snapper backups, but sadly they did not help. Could the issue be a corrupted file?

I tried reinstalling the headers, the drivers, Steam, Proton, anything that has to do with software. From what I’ve read it could be the driver getting into a ‘bad state’?

After the outage, this starts to show up:

[ 887.595000] amdgpu: ATOM BIOS: 113-EXT90455-10> 0

[ 887.609395] amdgpu 0000:09:00.0: amdgpu: CP RS64 enable
[ 887.613893] amdgpu 0000:09:00.0: vgaarb: deactivate vga console
[ 887.613895] amdgpu 0000:09:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 887.613904] amdgpu 0000:09:00.0: amdgpu: MODE1 reset
[ 887.613906] amdgpu 0000:09:00.0: amdgpu: GPU mode1 reset
[ 887.613974] amdgpu 0000:09:00.0: amdgpu: GPU smu mode1 reset
[ 888.119224] amdgpu 0000:09:00.0: amdgpu: MEM ECC is not presented.
[ 888.119229] amdgpu 0000:09:00.0: amdgpu: SRAM ECC is not presented.
[ 888.119236] amdgpu 0000:09:00.0: amdgpu: DF poison setting is inconsistent(1:0:0:0)!
[ 888.119238] amdgpu 0000:09:00.0: amdgpu: Poison setting is inconsistent in DF/UMC(0:1)!
[ 888.119255] amdgpu 0000:09:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 888.119257] amdgpu 0000:09:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[ 888.119454] [drm] amdgpu: 16368M of VRAM memory ready
[ 888.119457] [drm] amdgpu: 15995M of GTT memory ready.
[ 888.199482] amdgpu 0000:09:00.0: amdgpu: reserve 0xa700000 from 0x83e0000000 for PSP TMR
[ 888.461680] amdgpu 0000:09:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 888.461683] amdgpu 0000:09:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 888.461712] amdgpu 0000:09:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00505300 (80.83.0)
[ 888.461715] amdgpu 0000:09:00.0: amdgpu: SMU driver if version not matched
[ 888.559473] amdgpu 0000:09:00.0: amdgpu: SMU is initialized successfully!
[ 888.572216] snd_hda_intel 0000:09:00.1: bound 0000:09:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[ 888.674667] amdgpu: HMM registered 16368MB device memory
[ 888.675964] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 888.675976] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 888.676021] amdgpu: Virtual CRAT table created for GPU
[ 888.676206] amdgpu: Topology: Add dGPU node [0x747e:0x1002]
[ 888.676209] kfd kfd: amdgpu: added device 1002:747e
[ 888.676221] amdgpu 0000:09:00.0: amdgpu: SE 3, SH per SE 2, CU per SH 10, active_cu_number 60
[ 888.676225] amdgpu 0000:09:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 888.676226] amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 888.676228] amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 888.676229] amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 888.676230] amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 888.676232] amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 888.676233] amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 888.676234] amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 888.676236] amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 888.676237] amdgpu 0000:09:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 888.676238] amdgpu 0000:09:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 888.676240] amdgpu 0000:09:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 888.676241] amdgpu 0000:09:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[ 888.676242] amdgpu 0000:09:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
[ 888.676244] amdgpu 0000:09:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[ 888.677356] amdgpu 0000:09:00.0: amdgpu: Using BACO for runtime pm
[ 888.677970] amdgpu 0000:09:00.0: [drm] Registered 4 planes with drm panic
[ 888.677971] [drm] Initialized amdgpu 3.61.0 for 0000:09:00.0 on minor 0
[ 888.684042] fbcon: amdgpudrmfb (fb0) is primary device
[ 888.798574] amdgpu 0000:09:00.0: [drm] fb0: amdgpudrmfb frame buffer device

However, I’ve never checked for errors before, so I have no idea whether this is an actual problem or just informational output. The issue is exactly as before: in Fullscreen or Borderless Window I start getting black screens which stay for up to 10s, then the picture comes back, and then it goes black again. In Windowed this does not happen.
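One rough way to separate lines worth a closer look from routine initialization messages is a keyword filter over the dmesg text. A minimal sketch, assuming the bracketed timestamp format shown above; the keyword list and the function name are my own assumptions, not an official list of amdgpu error strings, and a match is only a hint, not proof of a fault:

```python
import re

# Hypothetical keyword list: phrases that are often worth a second look.
SUSPECT = ("reset", "not matched", "inconsistent", "timeout", "failed", "error")

# Matches dmesg lines such as "[ 887.613904] amdgpu 0000:09:00.0: ..."
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def flag_amdgpu_lines(dmesg_text):
    """Return (timestamp, message) pairs for amdgpu lines containing a suspect keyword."""
    flagged = []
    for line in dmesg_text.splitlines():
        m = STAMP.match(line)
        if not m:
            continue
        msg = m.group(2)
        if "amdgpu" not in msg:
            continue
        lowered = msg.lower()
        if any(keyword in lowered for keyword in SUSPECT):
            flagged.append((float(m.group(1)), msg))
    return flagged
```

Run against the output above, this would pick out the MODE1 reset lines, the two "poison setting is inconsistent" lines, and "SMU driver if version not matched", while skipping lines like the ring assignments. Whether any of those flagged lines relates to the black screens still has to be checked separately.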

Timewise, the machine was stable 10 minutes before; then the power went off and it became like this. None of the restore points helped, and it still acts the same way.

P.S. After searching for a while, I found suggestions that it could be Adaptive Sync, which would explain why the restores didn’t work. What do I have to reinstall if it breaks again? For now my “fix” is reinstalling CachyOS over and over.