[Tutorial] Mitigate GFX Crash/Lockup (apparent freeze) with amdgpu

There are numerous bug reports and even more possibly-related forum posts and anecdotes floating around about a surprisingly common problem with amdgpu systems.

Symptoms include an apparent ‘freeze’, though often only the graphics output has stopped updating: audio keeps playing, and a TTY can still be reached with some patience.

It is often triggered by H.264 video, fullscreen video in browsers such as Firefox, or fullscreen games.


The logs will show messages like the following:

Pageflip timed out! This is a bug in the amdgpu kernel driver

Or

flip_done timed out
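
To check whether a saved log contains either of the messages above, something like the following could be used. This is only a minimal sketch: the function name and the sample lines are made up for illustration, and the pattern only knows the two message fragments quoted in this post.

```python
import re

# The two message fragments quoted above; everything else is ignored.
SIGNATURES = re.compile(r"Pageflip timed out|flip_done timed out")


def find_flip_timeouts(log_text: str) -> list[str]:
    """Return the log lines that contain one of the timeout messages."""
    return [line for line in log_text.splitlines() if SIGNATURES.search(line)]


# Illustrative sample text, not output from a real system.
sample_log = """\
kernel: usb 1-1: new high-speed USB device
kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
kernel: amdgpu: [drm] ERROR flip_done timed out
"""

for hit in find_flip_timeouts(sample_log):
    print(hit)
```

On a systemd system the text to scan could come from `journalctl -b` (current boot) or `journalctl -b -1` (previous boot, useful after a hard reset).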


Kernel

First, it is worth mentioning that kernel 6.6 (linux66) is often comparatively reliable without any further settings, but it is not a permanent solution, nor is it even an option for some systems.




Power and related Boot Parameters

The following sections contain kernel or boot parameters that might be applied.

PSR

It is quite possible that these issues are related to PSR (“Panel Self Refresh”), a power-saving feature. The amdgpu.dcdebugmask parameter can be used to control PSR and related options; its value is a bitmask built from these flags:

enum DC_DEBUG_MASK {
	DC_DISABLE_PIPE_SPLIT = 0x1,
	DC_DISABLE_STUTTER = 0x2,
	DC_DISABLE_DSC = 0x4,
	DC_DISABLE_CLOCK_GATING = 0x8,
	DC_DISABLE_PSR = 0x10,
	DC_FORCE_SUBVP_MCLK_SWITCH = 0x20,
	DC_DISABLE_MPO = 0x40,
	DC_ENABLE_DPIA_TRACE = 0x80,
	DC_ENABLE_DML2 = 0x100,
	DC_DISABLE_PSR_SU = 0x200,
	DC_DISABLE_REPLAY = 0x400,
	DC_DISABLE_IPS = 0x800,

(some definitions: Core Driver Infrastructure — The Linux Kernel documentation)

In practice that means values like these:

DC_DISABLE_PSR_SU

amdgpu.dcdebugmask=0x200

OR

DC_DISABLE_PSR_SU and DC_DISABLE_REPLAY

amdgpu.dcdebugmask=0x600

OR

DC_DISABLE_PSR (automatically also SU)

amdgpu.dcdebugmask=0x10

OR

DC_DISABLE_PSR (automatically also SU) & DC_DISABLE_STUTTER

amdgpu.dcdebugmask=0x12

Any one of these, listed in roughly increasing order of severity, may be enough to work around the issue.
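
The mask values above are just the enum flags ORed together. A small sketch of that arithmetic follows; the flag values are copied from the enum quoted earlier, while the helper name `dcdebugmask` is made up for this example.

```python
# Flag values copied from the DC_DEBUG_MASK enum quoted above.
DC_DEBUG_MASK = {
    "DC_DISABLE_STUTTER": 0x2,
    "DC_DISABLE_PSR": 0x10,
    "DC_DISABLE_PSR_SU": 0x200,
    "DC_DISABLE_REPLAY": 0x400,
}


def dcdebugmask(*flags: str) -> str:
    """OR the named flags together and format them as a kernel parameter."""
    value = 0
    for name in flags:
        value |= DC_DEBUG_MASK[name]
    return f"amdgpu.dcdebugmask={value:#x}"


print(dcdebugmask("DC_DISABLE_PSR_SU", "DC_DISABLE_REPLAY"))  # amdgpu.dcdebugmask=0x600
print(dcdebugmask("DC_DISABLE_PSR", "DC_DISABLE_STUTTER"))    # amdgpu.dcdebugmask=0x12
```

The same approach works for any other combination of flags from the enum, as long as the values are taken from the kernel version actually in use.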
 

Dynamic Power Management

DPM dynamically changes clock speeds and voltage based on GPU load and is usually desirable but it can be disabled to work around certain issues at the cost of energy efficiency and possibly performance.

amdgpu.dpm=0

 

dGPU Power Management

Users of dedicated GPUs may find that the following option is useful, again at the cost of power-saving.

amdgpu.runpm=0
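
After rebooting it is worth confirming that the parameters actually reached the kernel. The sketch below pulls the amdgpu options out of a kernel command line; the function name and the sample command line are hypothetical, and on a real system the text would come from /proc/cmdline.

```python
def amdgpu_params(cmdline: str) -> dict[str, str]:
    """Extract amdgpu.<name>=<value> options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token.split("=", 1)
            params[key] = value
    return params


# Made-up command line for illustration only.
sample = "root=/dev/sda2 rw quiet amdgpu.dcdebugmask=0x10 amdgpu.runpm=0"
print(amdgpu_params(sample))
```

To check the running system, the call would be `amdgpu_params(open("/proc/cmdline").read())`.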



Direct Scan Out

Finally, especially for integrated GPU users for whom none of the above stopped the amdgpu freeze, another possible ‘fix’ is to disable “Direct Scan-out”. This is not optimal, as direct scan-out is meant to increase performance and decrease latency. However, it is the only thing that worked for me, and it is preferable to a full graphics lockup. It is controlled via a global environment variable (set in /etc/environment, for example), and the variable name depends on the compositor.

For kwin:

KWIN_DRM_NO_DIRECT_SCANOUT=1

For wlroots-based Wayland compositors:

WLR_SCENE_DISABLE_DIRECT_SCANOUT=1
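
After editing /etc/environment and logging in again, a quick way to see whether the variable made it into the session is to look at the environment of a terminal started from that session. A rough sketch, with a made-up helper name; it only shows what the current process inherited, not what the compositor itself read.

```python
import os

# Variable names taken from the text above.
SCANOUT_VARS = ("KWIN_DRM_NO_DIRECT_SCANOUT", "WLR_SCENE_DISABLE_DIRECT_SCANOUT")


def scanout_overrides(env) -> dict[str, str]:
    """Return the direct scan-out variables that are set in the given mapping."""
    return {name: env[name] for name in SCANOUT_VARS if name in env}


print(scanout_overrides(os.environ) or "no direct scan-out override set in this session")
```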


Hope that was helpful to someone else out there. <3


More Information

See Also Env Vars

Environment Variables - Arch Wiki

See Also Boot Options

Kernel Parameters - Arch Wiki

Thank you for this. It’s the most clear and concise summary of all the advice I’ve found online about this while trying to troubleshoot my issue (on a Ryzen AI Max+ 395 with the 8060S).

In case others like me find this post — unfortunately, this didn’t work for me (and I’ve tried a bunch of other stuff too), but hopefully it helps others.

One thing I haven’t been brave enough to test yet is the amdgpu.ppfeaturesmask setting. If that works, I’ll post an update here too.

Thanks for the fixes cscs.

Unfortunately, after trying everything, none of them worked for me. Freezes would still happen randomly while playing games, sometimes within a couple of minutes and sometimes after a couple of hours.

However, what did work for me was Cachy’s latest release candidate of the 6.16 kernel. I’ve been running it for a week and haven’t crashed since. I believe 6.16 will be released as stable this Sunday (or the next), but for anyone too impatient to wait I’d say give it a shot.

This issue report was where I got the idea to try 6.16.

And here is the patch pointed out in the issue’s thread that seems to have fixed the root cause, at least for me: drm/amdgpu/vcn3: read back register after written · torvalds/linux@b7a4842 · GitHub

Hope this helps anyone else!

I will give 6.16 a try. The most recent 6.15 was by far the worst, while 6.12.36-2 LTS has been stable.

Not fixed in the 6.16 RC: full system freeze after 5 minutes. Back to 6.12.36-2 LTS.

I have the same problem. Thanks, OP, for laying out the possible fixes.

I have relatively new, high-performance hardware (Ryzen 395+, recent MediaTek Wi-Fi), and it seems an older kernel is not an option for me (the system won’t boot on anything below 6.14). Hoping for a solution.

Also, it’s not CachyOS-related; I had the same problems with PikaOS.
It’s not related to the CachyOS kernel patches either; the same thing happens on a regular, unpatched Linux kernel and on LTS.

Did you try any of the workarounds above?

Yeah, sure. I’ve tried what feels like thousands of combinations of advice from around the internet, including switching to Xorg, other distros and so on. This problem has been driving me crazy these last few days.

Nothing interesting in the logs except these:

Aug 03 19:32:18 cachyos-faex9 kwin_wayland[1135]: kwin_wayland_drm: Pageflip timed out! This is a bug in the amdgpu kernel driver
Aug 03 19:32:18 cachyos-faex9 kwin_wayland[1135]: kwin_wayland_drm: Please report this at https://gitlab.freedesktop.org/drm/amd/issues
Aug 03 19:32:18 cachyos-faex9 kwin_wayland[1135]: kwin_wayland_drm: With the output of 'sudo dmesg' and 'journalctl --user-unit plasma-kwin_wayland --boot 0'
Aug 03 19:32:18 cachyos-faex9 kernel: amdgpu 0000:c5:00.0: [drm] ERROR [CRTC:86:crtc-0] flip_done timed out

I get a complete freeze of the system (Ctrl+Alt+F3 to switch to another terminal doesn’t work), but an interesting detail is that music keeps playing if it was already playing.

For now I am checking whether it is related to browser hardware acceleration, as some user on the internet suggested, and otherwise waiting for a fix for those flip_done timeouts.

So... it’s pretty weird that you could get Wayland bug output while using X11.

But the comment also seems vague.

Did you actually try any of the specific steps written in the guide above? Any you skipped?

Sure, I tried all of those params, and in different combinations. None of the advice you can find on the internet about this problem works for me.

I think the problem is not related to CachyOS at all, so I’ve decided to keep that conversation on the amdgpu forum.

here is my post there.

News: fiddling with hardware acceleration in browsers did not help, but I now know I can revive the DE by unplugging and replugging the HDMI cable and then pressing Ctrl+Alt+F1. Not very convenient, but much better than a hard reboot.

I also tried dozens of combinations of kernels, vulkan-radeon, mesa… Also tried PikaOS and it locked up on boot.

My system has been most stable under Xorg. It’s strange that yours still crashes. Have you tried disabling page flipping in the xorg.conf? (I use GNOME rather than KDE, which also requires disabling Wayland in the GDM config file, in case that’s relevant).

Yes. Now that you mention it, I am not sure whether that fixed the problem in the PikaOS GNOME environment, but probably so (I did not use it much, because I prefer CachyOS and Plasma and wanted to solve the problem here).

In my current environment I use Wayland, so there are no Xorg configs for me here.

I also think my cheap KVM with its strange EDID emulation might be the main cause of this, but I am not sure; I have not tested that yet.

fixed it for me on 6.17.something, it’s been two weeks. i did not try the other ones since i did not figure out how to change them lol

Yeah, it started again since updating to 6.18

i am having a very similar issue but with an integrated intel gpu (i5-8500). my sound is looping the last second and i find nothing in the journalctl, maybe because the system already hung at that point.

mine also seems to be triggered by youtube but most of it seems to be vp9 codec (or at least youtube is always playing when my system locks up, not fullscreen though).

i switched to x11 for about a week now and the freezes stopped (fingers crossed). i am going to test for another week to make sure but i had at least one freeze per week on wayland.

i am wondering if i should report this somewhere?

That sounds like it could be a different problem.

But you might report it to upstream intel and/or KDE and/or freedesktop.

thanks for the reply! i am not familiar with the forum structure here, could you point me to the correct forum for “upstream intel and/or KDE”, please?

These might be some;




As to the topic issue…

I am glad to report that in my case one of the 6.18 releases fixed it.
Sorry I cannot be more exact except to note that early 6.18 releases suffered the same problems.
But at least as of 6.18.5-2 none of the previously noted workarounds were required any longer.

This is a great guide!

I have two new laptops with AMD iGPUs. Adding “amdgpu.dcdebugmask=0x12” to the boot options solved the freezes for me. Smooth sailing for two weeks straight now.

Here’s a post in the Manjaro forum talking about freezes/pageflip timeouts:

It looks like it’s a hard-to-reproduce race condition, highly dependent on timing (and therefore highly dependent on hardware and usage), which explains why it’s been so hard to fix so far: [PATCH 1/1] drm/amd/display: complete cursor vblank events immediately