Amdgpu issue with linux-cachyos kernel 6.12.x [SOLVED: NOT AN ISSUE]

Hello, first of all I want to thank the CachyOS team for the OS. You guys are great, thanks for your work!

I noticed in a game that my GPU (Gigabyte AMD Radeon RX 7900 XT) suddenly dropped in FPS after some time. This surprised me, so I enabled the GPU core clock stats in MangoHud.

What I saw then unsettled me somewhat: the GPU core clock was peaking at 2750-2950 MHz, which would also explain the throttling.

Interestingly, I used LACT for the settings and the maximum I set was 2500 MHz.

I first thought that there was a problem with LACT or with the AMD driver, so I uninstalled both completely: LACT via the GUI in the CachyOS package installer, and the AMD driver via chwd (sudo chwd --remove amd).

I also deleted /etc/modprobe.d/99-amdgpu-overdrive.conf
I also checked /etc/default/grub to be sure that overclocking is completely off.
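
For anyone who wants to double-check the same thing on a running system rather than only in the config files, here is a minimal sketch. It assumes the loaded amdgpu module exposes its feature mask at /sys/module/amdgpu/parameters/ppfeaturemask and that overdrive corresponds to bit 0x4000 (PP_OVERDRIVE_MASK in the kernel headers). The function names are made up for illustration.

```python
from pathlib import Path

# Assumption: overdrive is bit 14 (0x4000) of amdgpu.ppfeaturemask.
PP_OVERDRIVE_MASK = 0x4000


def overdrive_enabled(mask: int) -> bool:
    """Return True if the overdrive bit is set in the given feature mask."""
    return bool(mask & PP_OVERDRIVE_MASK)


def read_ppfeaturemask(path="/sys/module/amdgpu/parameters/ppfeaturemask"):
    """Read the active mask from sysfs; return None if the file is missing."""
    p = Path(path)
    if not p.exists():
        return None
    # int(..., 0) accepts both "0xfff7bfff" and plain decimal output.
    return int(p.read_text().strip(), 0)


if __name__ == "__main__":
    mask = read_ppfeaturemask()
    if mask is None:
        print("amdgpu module parameter not found")
    else:
        state = "enabled" if overdrive_enabled(mask) else "disabled"
        print(f"ppfeaturemask={mask:#010x}, overdrive {state}")
```

This only reports what the loaded module is using, so it reflects the kernel command line and modprobe options together. Config files may need an initramfs rebuild and a reboot before a change shows up here.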

Then I reinstalled the drivers via chwd; unfortunately it did not help.
I have tried several kernels from the CachyOS team: from 6.12.x onwards, the problem is present in all of them.
The CachyOS LTS kernel as well as core/linux 6.12.1.arch1-1 are free of the problem.

I did a quick search for the problem and came across similar AMD GPU behavior here: "by default the kernel sets a maximum gpu clock that exceeds the manufacturers specifications, causing hardware crashes" (#3131) · Issues · drm / amd · GitLab

Unfortunately I was not able to find a solution there.
It should be mentioned that in LACT I used the function ported from the kernel for AMD GPUs that allows controlling zero RPM mode.

My current workaround looks like this: I use the CachyOS LTS and 6.12.1.arch1-1 kernels to avoid possible damage to the hardware.

Am I doing something wrong, or is this a currently known problem?

Many thanks in advance!

Best regards,
Mark

Are you sure that 2750-2950 MHz is above the limits defined by the hardware spec? I suggest looking it up instead of “avoiding possible damage”.

Were the overclocking flags enabled? IIRC you need them if you want to set these limits.

This is due to "drm/amd/pm: fix and simplify workload handling" · CachyOS/linux@df826f5 · GitHub; without it you get degraded performance OOTB. You can argue that users who want more performance can use corectrl or LACT, but that is not the default, and users potentially need to enable overdrive, which taints the kernel: "drm/amd: Taint the kernel when enabling overdrive" · CachyOS/linux@73c55eb · GitHub.

If you are unhappy with the behaviour (clocks exceeding the hardware limits), report it upstream. This will be the new default behaviour.

AMD Radeon RX 7900 XT Review - Efficiency & Clock Speeds | TechPowerUp: as it turns out, the peak clocks you’re seeing are still within spec. This is supported by the fact that the GPU is still throttling. If the clocks were over spec, your GPU would hold them until it damaged your hardware.

Many thanks for the answer! If you look at the specification on the manufacturer’s website, it looks different:

Boost Clock* : up to 2535 MHz (Reference card: 2400 MHz)
Game Clock* : up to 2175 MHz (Reference card: 2000 MHz)
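
To put the numbers from this thread side by side, here is a small sketch that computes how far the observed peaks (2750-2950 MHz) sit above the advertised boost clock (2535 MHz) and above the 2500 MHz maximum set in LACT. The function name is made up for illustration, and an advertised boost clock is not necessarily the same thing as the hardware's hard limit, which is the point raised in the earlier reply.

```python
def overshoot_pct(observed_mhz: float, reference_mhz: float) -> float:
    """Percentage by which observed_mhz exceeds reference_mhz."""
    return (observed_mhz - reference_mhz) / reference_mhz * 100.0


ADVERTISED_BOOST_MHZ = 2535        # manufacturer's "Boost Clock: up to"
LACT_LIMIT_MHZ = 2500              # maximum configured in LACT
OBSERVED_PEAKS_MHZ = (2750, 2950)  # peaks seen in MangoHud

for peak in OBSERVED_PEAKS_MHZ:
    vs_boost = overshoot_pct(peak, ADVERTISED_BOOST_MHZ)
    vs_lact = overshoot_pct(peak, LACT_LIMIT_MHZ)
    print(f"{peak} MHz: {vs_boost:+.1f}% vs boost, {vs_lact:+.1f}% vs LACT limit")
```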

Until now, the frequencies set in LACT were always applied. That changed when the zero RPM control function appeared.
The overclocking flags, as previously mentioned, are set in:
/etc/modprobe.d/99-amdgpu-overdrive.conf

I also see warnings during boot like:
amdgpu: Overdrive is enabled, please disable it before reporting any bugs.

I also checked the values in pp_od_clk_voltage
under /sys/devices/pci0000:00/0000:00:03.1/0000:07:00.0/0000:08:00.0/0000:09:00.0

It looks like this:

❯ cat pp_od_clk_voltage
OD_SCLK:
0: 500Mhz
1: 2451Mhz
OD_MCLK:
0: 97Mhz
1: 1250MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       5000Mhz
MCLK:      97Mhz       1500Mhz
VDDGFX_OFFSET:    -450mv          0mv
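
For comparing this table against the clocks seen in MangoHud, the output above can be parsed with a short script. This is only a sketch: it assumes the layout shown in this post (OD_* section headers, values with Mhz/mV suffixes), and the function names and the sample-string approach are made up for illustration. On a real system the text would come from the pp_od_clk_voltage file instead.

```python
import re

# Sample copied from the output in this post.
SAMPLE = """\
OD_SCLK:
0: 500Mhz
1: 2451Mhz
OD_MCLK:
0: 97Mhz
1: 1250MHz
OD_VDDGFX_OFFSET:
0mV
OD_RANGE:
SCLK:     500Mhz       5000Mhz
MCLK:      97Mhz       1500Mhz
VDDGFX_OFFSET:    -450mv          0mv
"""


def parse_od_table(text):
    """Parse pp_od_clk_voltage text into {section: values}.

    OD_RANGE becomes {name: (min, max)}; other sections become lists.
    """
    table, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"(OD_\w+):", line)
        if header:
            section = header.group(1)
            table[section] = {} if section == "OD_RANGE" else []
            continue
        # Pick up numbers that carry a unit, ignoring the "0:" / "1:" indices.
        numbers = [int(n) for n in re.findall(r"(-?\d+)\s*(?:mhz|mv)", line, re.I)]
        if section == "OD_RANGE":
            table[section][line.split(":", 1)[0]] = tuple(numbers)
        elif section is not None and numbers:
            table[section].append(numbers[-1])
    return table


def exceeds_configured_sclk(observed_mhz, table):
    """True if an observed core clock is above the configured OD_SCLK maximum."""
    return observed_mhz > max(table["OD_SCLK"])


if __name__ == "__main__":
    od = parse_od_table(SAMPLE)
    print(od["OD_SCLK"], od["OD_RANGE"]["SCLK"])
    print(exceeds_configured_sclk(2950, od))
```

With the table above, the configured core clock maximum is 2451 MHz, so the observed 2750-2950 MHz peaks would be flagged as exceeding it.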

With the other kernels I mentioned, everything works as it should.

But no matter which maximum frequencies I enter under the CachyOS 6.12.x kernel or which profile I use (for example BOOTUP_DEFAULT or POWER_SAVING), it does not work: the maximum frequencies are always applied.

Interesting: if I turn the “zero rpm mode” option on again, everything works as it should, and the GPU adopts the frequencies set in LACT.