Sudden graphical artifacting inside games (AMD RX 9070XT)

Hello,

(Context) I have been using CachyOS for almost a year now with no issues. For half of that year I’ve had an RX 9070XT, which I bought partly to upgrade from my very old Nvidia card and mostly to get better Linux support. It worked much better than said Nvidia card… until today.

Any game with lots of 3D models shows heavy artifacting. It starts as soon as there’s a certain amount of stuff on the screen and gets worse from there. An example of this can be seen here:

Visual artifacting appearing during battlefield setup in Totally Accurate Battle Simulator

Some things I already know:

  • It’s not the graphics card itself: on Windows it runs the same games perfectly fine under more extreme circumstances. It can also run MSI Kombustor’s Vulkan FurMark at 4K without any artifacting being measured.
  • It isn’t related to a system upgrade; I only upgraded my system after this started happening.
  • It probably isn’t related to the Proton version: I tried older Proton versions, Proton-CachyOS, and Proton-GE, all to no avail.
  • It’s not related to Steam Recording.
  • It happens in most 3D games. Hitman: WOA doesn’t have artifacting outside of 2D menus. I also had a one-frame artifact of what I think is a shader in Deltarune Chapter 4 (I can’t embed it, since I’ve already used up my one embed).
  • The artifacts themselves don’t leave anything in journalctl, but Hitman: WOA crashes outright and does log errors. Two of these crash logs are below. (Note: the Proton log files are rather unusable, since they are filled with three million lines of fixme:amdxc:AmdExtD3DFactory_CreateInterface unknown guid.)
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:367)
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:  Process hitman3.exe pid 13017 thread vkd3d_queue pid 13064
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:   in page starting at address 0x000045101074d000 from client 10
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701031
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:          Faulty UTCL2 client ID: TCP (0x8)
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:          MORE_FAULTS: 0x1
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:          WALKER_ERROR: 0x0
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:          PERMISSION_FAULTS: 0x3
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:          MAPPING_ERROR: 0x0
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:          RW: 0x0
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:367)
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:  Process hitman3.exe pid 13017 thread vkd3d_queue pid 13064
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:   in page starting at address 0x0000451010751000 from client 10
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:367)
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:  Process hitman3.exe pid 13017 thread vkd3d_queue pid 13064
Jun 24 19:44:14 cachyos-64 kernel: amdgpu 0000:03:00.0:   in page starting at address 0x0000451010755000 from client 10
Jun 24 19:44:15 cachyos-64 steam[1959]: Audio source [System Pulse]: Signal levels: -28.8dB, -25.6dB
Jun 24 19:44:15 cachyos-64 steam[1959]: Audio mix: start=7980686511, returned=2881186
Jun 24 19:44:15 cachyos-64 steam[1959]: Audio source [System Pulse]: init=7980711233, adjustment=0, through=2881666, last_start=2881186, mixed=2880480, drop_before=0, drop_after=0
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: Dumping IP State
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: Dumping IP State Completed
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 timeout, signaled seq=1067706, emitted seq=1067708
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0:  Process hitman3.exe pid 13017 thread vkd3d_queue pid 13064
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: Starting gfx_0.0.0 ring reset
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: Ring gfx_0.0.0 reset succeeded
Jun 24 19:44:16 cachyos-64 kernel: amdgpu 0000:03:00.0: [drm] device wedged, but no recovery needed
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649602] Udev event detected
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638] Udev_Event_Detail
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_subsystem:  drm
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_action:     change
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_connector:  (null)
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_devname:    /dev/dri/card1
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_devmode:    (null)
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_hotplug:    (null)
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_major:      226
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    prop_minor:      1
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    sysname:         card1
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    syspath:         /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/drm/card1
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649638]    attr_name:       (null)
Jun 24 19:44:16 cachyos-64 org_kde_powerdevil[1580]: [  1871][8012.649658] (dw_watch_display_connections) Time since last return from sleep = 813474394416 ns = 813474 ms
Jun 24 19:44:16 cachyos-64 steam[1959]: radv/amdgpu: The CS has been cancelled because the context is lost. This context is innocent.
Jun 24 19:44:16 cachyos-64 steam[1959]: radv: GPUVM fault detected at address 0x45101074d000.
Jun 24 19:44:16 cachyos-64 steam[1959]: GCVM_L2_PROTECTION_FAULT_STATUS: 0x701031
Jun 24 19:44:16 cachyos-64 steam[1959]:          CLIENT_ID: (TCP) 0x8
Jun 24 19:44:16 cachyos-64 steam[1959]:          MORE_FAULTS: 1
Jun 24 19:44:16 cachyos-64 steam[1959]:          WALKER_ERROR: 0
Jun 24 19:44:16 cachyos-64 steam[1959]:          PERMISSION_FAULTS: 3
Jun 24 19:44:16 cachyos-64 steam[1959]:          MAPPING_ERROR: 0
Jun 24 19:44:16 cachyos-64 steam[1959]:          RW: 0
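For anyone wondering where the GCVM_L2_PROTECTION_FAULT_STATUS fields come from, here is a rough Python sketch that splits the raw value 0x00701031 into the fields the kernel printed. The bit positions are an assumption: they are taken from the GFX10-era register headers in the Linux kernel (gc_*_sh_mask.h) and assumed to carry over to this card. As a sanity check, the decoded VMID of 7 matches the vmid:7 in the page fault lines.

```python
# Decode an amdgpu GCVM_L2_PROTECTION_FAULT_STATUS value into its fields.
# ASSUMPTION: bit layout from the GFX10-era gc_*_sh_mask.h kernel headers,
# assumed (not verified) to be the same on the RX 9070 XT.
FIELDS = {
    # name: (shift, mask applied after shifting)
    "MORE_FAULTS": (0, 0x1),
    "WALKER_ERROR": (1, 0x7),
    "PERMISSION_FAULTS": (4, 0xF),
    "MAPPING_ERROR": (8, 0x1),
    "CID": (9, 0x7F),  # client ID; the kernel names 0x8 "TCP"
    "RW": (18, 0x1),
    "VMID": (20, 0xF),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Return each field as (status >> shift) & mask."""
    return {name: (status >> shift) & mask for name, (shift, mask) in FIELDS.items()}


if __name__ == "__main__":
    # Raw value copied from the kernel log above.
    for name, value in decode_fault_status(0x00701031).items():
        print(f"{name}: {value:#x}")
```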

--------------------------------------------------------------------------------

Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: Dumping IP State
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: Dumping IP State Completed
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: [drm] AMDGPU device coredump file has been created
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: ring gfx_0.0.0 timeout, signaled seq=1209420, emitted seq=1209423
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0:  Process hitman3.exe pid 16142 thread vkd3d_queue pid 16192
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: Starting gfx_0.0.0 ring reset
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: Ring gfx_0.0.0 reset succeeded
Jun 24 19:49:02 cachyos-64 kernel: amdgpu 0000:03:00.0: [drm] device wedged, but no recovery needed
Jun 24 19:49:02 cachyos-64 steam[1959]: radv/amdgpu: The CS has been cancelled because the context is lost. This context is innocent.
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864185] Udev event detected
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217] Udev_Event_Detail
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_subsystem:  drm
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_action:     change
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_connector:  (null)
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_devname:    /dev/dri/card1
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_devmode:    (null)
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_hotplug:    (null)
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_major:      226
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    prop_minor:      1
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    sysname:         card1
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    syspath:         /sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/0000:02:00.0/0000:03:00.0/drm/card1
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864217]    attr_name:       (null)
Jun 24 19:49:02 cachyos-64 org_kde_powerdevil[1580]: [  1871][8298.864235] (dw_watch_display_connections) Time since last return from sleep = 1099688972153 ns = 1099689 ms
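Since the Proton logs are buried under the fixme:amdxc spam, a small filter can make them readable again. This is a minimal Python sketch, assuming the log was captured with PROTON_LOG=1 (which writes steam-APPID.log to the home directory); the script name and the exact noise string it matches are my own choices, not anything Proton provides.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical filter string, copied from the spammy line described above.
NOISE = "fixme:amdxc:AmdExtD3DFactory_CreateInterface"


def filter_log(lines: Iterable[str], noise: str = NOISE) -> Iterator[str]:
    """Yield only the lines that do not contain the noise marker.

    Uses a substring check because Proton prefixes each line with a thread ID.
    """
    for line in lines:
        if noise not in line:
            yield line


# Usage: python filter_proton_log.py ~/steam-APPID.log > filtered.log
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        sys.stdout.writelines(filter_log(log))
```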

I’m somewhat unsure whether this belongs on these forums, but I’m even less sure it belongs with Steam or Proton. If I figure out more about the issue, I may move it to a more fitting forum or issue tracker.

Any ideas on how to alleviate this, or at the very least what is going on here?

Thanks for any help!

(Title edited after the issue was solved, because I don’t think the “system changes” part was relevant; only the graphics card and the artifacting were.)

One thing I can recommend is clearing both the Mesa and Steam shader caches, especially Steam’s, since a corrupted shader cache can persist across updates.

In Steam, pick a game to test with and disable shader pre-caching. Or disable it globally, your choice.

Mesa documents that the shader cache normally lives in: ~/.cache/mesa_shader_cache

It can also be disabled for testing with: MESA_SHADER_CACHE_DISABLE=true

Mesa’s documentation also lists RADV_DEBUG=nocache for disabling the RADV shader cache; in Steam launch options that becomes: RADV_DEBUG=nocache %command%
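If you’d rather script the cache clearing, here is a minimal Python sketch. It assumes the default Mesa location mentioned above (respecting XDG_CACHE_HOME) and a native Steam install whose per-game caches live under ~/.local/share/Steam/steamapps/shadercache/APPID; Flatpak installs, extra library folders, and a MESA_SHADER_CACHE_DIR override are not handled. The function names are hypothetical, and it only reports what it would delete unless dry_run=False.

```python
import os
import shutil
from pathlib import Path


def mesa_cache_dir() -> Path:
    """Default Mesa shader cache location, honouring XDG_CACHE_HOME if set."""
    # ASSUMPTION: a MESA_SHADER_CACHE_DIR override is not handled here.
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "mesa_shader_cache"


def steam_cache_dir(appid: int) -> Path:
    """Steam's per-game shader cache for one app ID."""
    # ASSUMPTION: native Steam install; other setups keep this elsewhere.
    return Path.home() / ".local" / "share" / "Steam" / "steamapps" / "shadercache" / str(appid)


def clear_caches(paths: list[Path], dry_run: bool = True) -> list[Path]:
    """Return the cache directories that exist; delete them unless dry_run."""
    found = [p for p in paths if p.is_dir()]
    if not dry_run:
        for p in found:
            shutil.rmtree(p)
    return found
```

Usage would look like clear_caches([mesa_cache_dir(), steam_cache_dir(APPID)], dry_run=False), where APPID is the game’s Steam app ID; it’s probably best to close Steam first.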

I’m sure this is a Steam problem, rather than anything else. Anyway, please post the results.

Have a nice day! :grin:

Thanks for the response.

I went straight for the nuclear option and tried everything mentioned in your post: clearing the Mesa shader cache, disabling it both ways you mentioned, and disabling pre-caching in the Steam settings. No luck, though; the behavior is identical in every game I have checked.

Have you tried the other kernels CachyOS provides? Maybe booting into an LTS kernel brings better results.

I’m also wondering whether the Linux driver pushes the card too far. Try running MangoHud and comparing the clocks and voltage with Windows, both at idle and under load.

I fed my ‘little’ helper your post and it suggested disabling Delta Color Compression with RADV_DEBUG=nodcc,nodisplaydcc %command% in the Steam launch options. Beyond that, it had no clue what could be happening.
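One thing to watch when combining the RADV_DEBUG suggestions in this thread: several flags for one variable go into a single comma-separated value, because writing RADV_DEBUG twice means the later assignment wins. A tiny Python sketch (the helper name is hypothetical) that builds such a launch-options string:

```python
def launch_options(env: dict[str, list[str]]) -> str:
    """Build a Steam launch-options string of the form VAR=a,b ... %command%.

    Multiple flags for the same variable are comma-joined, so nothing is
    silently overridden by a second assignment of the same variable.
    """
    assignments = [f"{name}={','.join(flags)}" for name, flags in env.items()]
    return " ".join(assignments + ["%command%"])


if __name__ == "__main__":
    print(launch_options({"RADV_DEBUG": ["nocache", "nodcc", "nodisplaydcc"]}))
```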

I would grab LACT and check whether the clocks and power are within spec for the card. I vaguely recall some cards not reporting their info to the kernel correctly, which let them run wildly out of spec and cause issues.
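Without LACT, the current shader clock can also be read from amdgpu’s sysfs. Below is a minimal Python sketch that parses pp_dpm_sclk, assuming its usual "index: clockMhz" lines with a "*" marking the active level; the card1 path is taken from the logs above and may differ on other systems.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMPTION: card1 (from the journal output above); adjust as needed.
SCLK_PATH = Path("/sys/class/drm/card1/device/pp_dpm_sclk")
# Matches lines such as "1: 2400Mhz *"; the trailing "*" marks the active level.
LINE_RE = re.compile(r"^\s*(\w+):\s*(\d+)\s*mhz\s*(\*)?", re.IGNORECASE)


def parse_dpm(text: str) -> tuple[list[int], Optional[int]]:
    """Return all listed clock levels in MHz and the active one, if marked."""
    levels: list[int] = []
    current: Optional[int] = None
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        mhz = int(match.group(2))
        levels.append(mhz)
        if match.group(3):
            current = mhz
    return levels, current


if __name__ == "__main__" and SCLK_PATH.exists():
    levels, current = parse_dpm(SCLK_PATH.read_text())
    print(f"levels: {levels} MHz, current: {current} MHz")
```

Comparing the highest level against the card’s rated boost clock is a quick way to spot the out-of-spec behavior described above.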

Install LACT and enable overclocking (we’re not actually overclocking, we’re downclocking, LMAO).

Since you’re using a 9070XT like mine, the easiest way is:

  1. Change the performance level to “Highest Clocks”; the usual culprit is “Automatic”.
  2. If it still happens after that, change the GPU Clock Offset to -500.

The rest is up to you, including whether you want to undervolt.


Another note: to avoid forcing your GPU to always use the highest clocks, create a separate profile for this instead of overwriting the default one. That way you can keep Automatic mode for your other apps and switch to the highest-clock profile when running 3D apps, to prevent the artifacts or crashes.
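For reference, a rough Python sketch of the same per-session switch done directly through sysfs. It assumes that LACT’s “Highest Clocks” and “Automatic” correspond to the kernel’s power_dpm_force_performance_level values high and auto, and that the card is card1 as in the logs; the helper name is hypothetical, and only a subset of the kernel’s accepted levels is allowed here.

```python
from pathlib import Path

# ASSUMPTION: LACT's "Highest Clocks" maps to "high" and "Automatic" to "auto";
# card1 is taken from the journal output above.
LEVEL_PATH = Path("/sys/class/drm/card1/device/power_dpm_force_performance_level")
VALID_LEVELS = {"auto", "low", "high", "manual"}  # subset of what the kernel accepts


def set_performance_level(level: str, path: Path = LEVEL_PATH) -> str:
    """Write a performance level and return what the file reports afterwards.

    Writing to the real sysfs file requires root.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    path.write_text(level)
    return path.read_text().strip()
```

The idea mirrors the separate-profile advice: call set_performance_level("high") before starting a 3D game and set_performance_level("auto") afterwards. A sysfs write like this does not survive a reboot.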

Yep, that worked! I don’t see any artifacting at all anymore. I didn’t have to change the offset or enable overclocking, though.

It does seem to strain my memory a little at idle (48°C instead of 44°C), but that’s vastly preferable to having an essentially borked graphics card.

Thanks for the help!

This is not good general advice. For some cards, selecting Highest Clocks prevents the card from reaching its boost clocks; see LACT issue #1067, “'Highest Clocks' prevents boost clocks” (ilya-zlobintsev/LACT on GitHub).

Well, this is indeed only for RDNA4, which I believe always overshoots its max boost clock. Mine, for example, is rated for a maximum of only 3060 MHz, but left on Automatic it often reaches 3200+ MHz and crashes the system. This has been going on since early January, I think. Hence the proper approach, even for now, is a manual OC with a -500 offset as a starting point. Automatic leaves it borked most of the time (a 100% system crash on my end), while Highest Clocks limits the clock to “non-boost” but is a safer way to avoid the crashes and artifacts.
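To put those numbers in perspective, a quick back-of-the-envelope check in Python. The 3060 MHz and 3200 MHz figures are the ones quoted in the post above; treating 3200 as the observed peak is a simplification, since the post says 3200+. It works out to roughly 4.6% above the rated maximum.

```python
def over_spec_percent(observed_mhz: float, rated_mhz: float) -> float:
    """How far an observed clock sits above the rated maximum, in percent.

    A negative result means the clock stayed below the rating.
    """
    if rated_mhz <= 0:
        raise ValueError("rated clock must be positive")
    return (observed_mhz - rated_mhz) / rated_mhz * 100.0


if __name__ == "__main__":
    # Figures from the post: rated 3060 MHz, observed 3200+ MHz on Automatic.
    print(f"{over_spec_percent(3200, 3060):.1f}% over the rated maximum")
```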

Odd, but a win is a win. I’d also advise looking into undervolting. If that card is anything like my 6800XT, there’s room to gain: I was able to easily cut 100-120 mV and shave several degrees off with little performance impact. Biscuits’ -80 in their screenshot seems like a sane value to try.