GPU at 90°C!? This can't be right. Assistance needed understanding Nvidia drivers and tools in CachyOS

I am hoping these are general noob questions. I'm very new to Linux and just lost on this issue. I absolutely cannot figure out Nvidia drivers on Linux. I see almost zero feature parity (or I just don't know where to look). Where are the G-Sync settings? Where are the DLDSR settings? RTX Digital Vibrance? RTX Super Resolution video playback, etc.?
I have GWE and NVIDIA X Server Settings, no overclock, and I use the GUI in these tools to make adjustments. However, some adjustments need to be made every time I boot: fan curves need to be re-applied, OpenGL needs to be set to Performance instead of Quality, and the GWE power limit needs to be set again. It does remember my resolution and refresh rate.

A single system with 2 independent PCIe SSDs: one for Windows (PCIe 4.0) and one for CachyOS Linux (PCIe 3.0).

Boot into Windows and play games for about an hour:
The 3090 Ti is undervolted, so it runs beautifully: 67°C, pulling about 370 watts, clocks boosting to 2070 MHz.

Making zero changes to anything in the system, I reboot and load CachyOS for comparison:
The 3090 Ti is power capped at 375 watts via GWE. The wattage stays within the capped limit, but the GPU is hot as Hades and clock speeds only boost to about 1900 MHz. After about 35 minutes of gaming, GOverlay reports that temperature throttling is active and the readout says 90°C! This can't be right. Even if I completely unleash the GPU in Windows and allow it to consume all 480 watts, the temperature doesn't go above 78°C (still on the hot side, which is why I undervolt, but a considerable gap between the two OSes).

FPS is also considerably lower in Linux than in Windows, I'd say about a 10% difference on average. If the temperature is truly that high, the reduced FPS makes sense, but again, I have never seen this GPU pass 78°C while gaming in Windows for the entirety of my ownership.

Helldivers 2 at native 1440p/Ultra settings gets about 100-110 FPS in Windows, but only about 90-95 in Linux. The same behavior shows up from game to game: The Finals, The Witcher 3, Cyberpunk.

Let's skip the technical stuff for now (if we can) and start with the basic questions. I'll post a debug report after work today if we have to get into the nuts and bolts.

Can someone please explain NVIDIA drivers in Linux to me like I am 5 years old? I have read the documentation on the Arch wiki, but there are a lot of terms I am just not familiar with: X11, Wayland, Mesa, Booster, DRM kernel mode setting, nvidia-xconfig, nvidia-settings, cachyos-settings… I'm so lost on which application, or combination thereof, will give me the ability to undervolt and power limit the card to resolve this issue. Or are the temperature sensors simply not reading correctly?

Hello!

First, it’s best to go over your hardware specifications. Otherwise it becomes difficult for people to help.

Run the following command in the terminal and show the response here:

inxi -bGi

Second, are you using Plasma Desktop? It is the choice selected by default during installation, and it starts in the new Wayland “graphics mode”. This combination, Wayland + Nvidia, is very new and may not work for everyone. For me it doesn't work very well.

I personally use the “older” way, Xorg. I’ve been using Linux since 1998 and have never, EVER, had any problems with Nvidia.

So stay calm, and let’s find out what happens to your system. You yourself stated that these are new terms for you, so there is a lot to get familiar with.


I talked to ptr1337 yesterday about this topic.

There is a small regression with the new Plasma 6.1. Neither ptr1337 nor I have the issue, but a fellow member of the community who has a 4090 like me does. So if you say you have somewhat lower FPS right now, that might be due to this issue. Ptr1337 mentioned about 15% lost performance.

As far as I know, they are on the case, and this is all on the KDE team to fix.

So give it a few days. An update for it will probably come by then :slight_smile:

I'm running a test ISO of CachyOS at the moment, the next release. I installed it 2 days ago, so I have not looked into performance yet. I was planning on benchmarking 25 games yesterday to compare performance over the past 2 months, but I'm waiting a few days myself to see if the update gets pushed, so I don't have to do the work twice.


Hey, maybe GWE is not working properly, as it's not maintained anymore. Can you provide the output of this command:

nvidia-smi -q -d POWER

You can also try changing the power limit via the terminal; maybe that will help. For that you can do this:

sudo nvidia-smi -i <gpu_id> -pl <power_limit>

Replace <power_limit> with the desired power limit in watts and <gpu_id> with the ID of your GPU (usually 0 if you have one GPU).

For example, to set the power limit of GPU 0 to 375 W:

sudo nvidia-smi -i 0 -pl 375
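
If you want a guard against typing a value the card will refuse, here is a rough sketch that asks the driver for its allowed range first and only then applies the cap. The helper names `in_range` and `set_power_limit` are made up for this example, and it is written for sh/bash, so save it as a script rather than pasting it into fish.

```shell
#!/bin/sh
# in_range WANT MIN MAX: succeeds (exit status 0) when MIN <= WANT <= MAX.
# awk does the comparison because the driver reports decimals like 100.00.
in_range() {
    awk -v w="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !((w + 0 >= lo + 0) && (w + 0 <= hi + 0)) }'
}

# set_power_limit GPU_INDEX WATTS (hypothetical helper, not an nvidia tool)
set_power_limit() {
    gpu="$1"
    want="$2"
    # Query the allowed range as bare numbers, e.g. "100.00, 450.00".
    range=$(nvidia-smi -i "$gpu" \
        --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits) || return 1
    min=${range%%,*}
    max=${range##*,}
    max=${max# }
    if in_range "$want" "$min" "$max"; then
        sudo nvidia-smi -i "$gpu" -pl "$want"
    else
        echo "Requested ${want} W is outside the allowed ${min}-${max} W range" >&2
        return 1
    fi
}
```

After sourcing the script, `set_power_limit 0 375` would do the same thing as the example command above, just with the range check in front of it.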

And we don't have RTX Digital Vibrance or RTX Super Resolution on Linux, because Nvidia only provides them on Windows.


LOL, I love the “calm down”, it's like you knew :joy:. Here is the output from inxi -bGi:

System:
Host: DARTHLINUX Kernel: 6.9.5-2-cachyos arch: x86_64 bits: 64
Desktop: KDE Plasma v: 6.1.0 Distro: CachyOS
Machine:
Type: Desktop System: Gigabyte product: Z690 AORUS ELITE AX DDR4 v: N/A
serial:
Mobo: Gigabyte model: Z690 AORUS ELITE AX DDR4
serial: UEFI: American Megatrends LLC. v: F29
date: 12/14/2023
Battery:
ID-1: hidpp_battery_0 charge: 84% condition: N/A
CPU:
Info: 16-core (8-mt/8-st) 13th Gen Intel Core i7-13700K [MST AMCP]
speed (MHz): avg: 1306 min/max: 800/5300:5400:4200
Graphics:
Device-1: Intel Raptor Lake-S GT1 [UHD Graphics 770] driver: i915 v: kernel
Device-2: NVIDIA GA102 [GeForce RTX 3090 Ti] driver: nvidia v: 555.52.04
Display: x11 server: X.Org v: 21.1.13 with: Xwayland v: 24.1.0 driver: X:
loaded: nvidia gpu: nvidia,nvidia-nvswitch resolution: 2560x1440
API: EGL v: 1.5 drivers: iris,nvidia,swrast
platforms: gbm,x11,surfaceless,device
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 555.52.04
renderer: NVIDIA GeForce RTX 3090 Ti/PCIe/SSE2
API: Vulkan v: 1.3.285 drivers: nvidia surfaces: xcb,xlib
Network:
Device-1: Realtek RTL8125 2.5GbE driver: r8169
IF: enp5s0 state: up speed: 1000 Mbps duplex: full mac: xxxxxxx
IP v4: X.X.X.X\24 type: dynamic noprefixroute scope: global
IP v6: xxxxxxxx64
type: dynamic noprefixroute scope: global
IP v6: fxxxxxxxxx64 type: noprefixroute scope: link
Device-2: Intel Wi-Fi 6E AX210/AX1675 2x2 [Typhoon Peak] driver: iwlwifi
IF: wlan0 state: down mac: xxxxxxx
WAN IP: xxxxxxx
Drives:
Local Storage: total: raw: 8.65 TiB usable: 1.38 TiB used: 3.18 TiB (231.1%)
Info:
Memory: total: 32 GiB note: est. available: 31.11 GiB used: 3.09 GiB (9.9%)
Processes: 511 Uptime: 3m Shell: fish inxi: 3.3.34

Here is the output of nvidia-smi -q -d POWER:

==============NVSMI LOG==============

Timestamp : Wed Jun 26 17:32:26 2024
Driver Version : 555.52.04
CUDA Version : 12.5

Attached GPUs : 1
GPU 00000000:01:00.0
GPU Power Readings
Power Draw : 17.27 W
Current Power Limit : 450.00 W
Requested Power Limit : 450.00 W
Default Power Limit : 450.00 W
Min Power Limit : 100.00 W
Max Power Limit : 450.00 W
Power Samples
Duration : 5.23 sec
Number of Samples : 119
Max : 55.69 W
Min : 16.15 W
Avg : 25.17 W
GPU Memory Power Readings
Power Draw : N/A
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A

See, GWE is not working: the current power limit is still 450 W, the card's maximum, so when you play games the GPU runs up to its full load, which is higher than what you allow in Windows. My RTX 3060 runs at 170 W when it is supposed to run at 140 W max, so my GPU also runs hot. Now try changing the power limit via the terminal with the command that I provided. Your output shows 1 attached GPU, and nvidia-smi numbers GPUs from 0, so the GPU ID is 0 and the example command works as written.
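
To check whether a cap actually took effect without reading the whole report, something like the sketch below can pull just the GPU's Current Power Limit out of the `nvidia-smi -q -d POWER` text. `current_limit` is a made-up helper name, and the parsing assumes the field layout shown in the output above (sh/bash, not fish).

```shell
#!/bin/sh
# current_limit: read `nvidia-smi -q -d POWER` output on stdin and print the
# first "Current Power Limit" value that is reported in watts, e.g. 450.00.
# Lines reporting N/A (the module readings) are skipped.
current_limit() {
    awk -F: '/Current Power Limit/ && $2 ~ /W/ {
        gsub(/[ W]/, "", $2)   # drop spaces and the unit
        print $2
        exit
    }'
}
```

Usage would be `nvidia-smi -q -d POWER | current_limit`. If it still prints 450.00 after setting a 375 W cap in GWE, the cap was not applied.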

I issued the command(s) and could see that the power cap is NOW correctly reported by the system at 375 W. It warned me about persistence mode, but I'm pretty sure it's just another command, or an added switch to this command, to make it persistent. I'll hopefully work that one out on my own; it's not too important for me at the moment. What's more important now is that I haven't been able to test it, since my SSD says it's full and none of my games will start…
Where in the name of Linus did 500 GB go? But that's an issue for another thread… I'll test some games and provide feedback as soon as I can work that out. Out of the proverbial frying pan and into the fire.
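
On the persistence question: `nvidia-smi -pm 1` enables persistence mode, but the power limit itself is not stored across reboots, so it has to be re-applied at every boot. One common approach is a small systemd oneshot service. The sketch below only prints such a unit; the unit name `nvidia-power-limit.service`, the helper name `print_unit`, the `/usr/bin/nvidia-smi` path and GPU index 0 are all assumptions to adjust for your system.

```shell
#!/bin/sh
# print_unit WATTS: print a systemd unit that re-applies the cap at boot.
# Assumptions: nvidia-smi lives in /usr/bin and the card is GPU index 0.
print_unit() {
    watts="$1"
    cat <<EOF
[Unit]
Description=Apply NVIDIA power limit (${watts} W)

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
ExecStart=/usr/bin/nvidia-smi -i 0 -pl ${watts}

[Install]
WantedBy=multi-user.target
EOF
}
```

To try it, `print_unit 375 | sudo tee /etc/systemd/system/nvidia-power-limit.service`, then `sudo systemctl daemon-reload` and `sudo systemctl enable --now nvidia-power-limit.service`. I have not tested this on CachyOS specifically, so check `systemctl status nvidia-power-limit.service` after the first reboot.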

Thanks everyone for your feedback and assistance. I'll bump the thread when I get things moving again.

Apparently I buggered up the partitions on that install so badly that I had to wipe and reinstall. Everything seems well now. Running under Wayland, I am still noticing that temps are about 4°C higher in CachyOS on average, which in turn lowers the boost clock, which in turn contributes to about a 10% performance drop compared to running in Windows. Not too much else to dig into on this one.
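
For anyone who wants numbers from the driver itself rather than an overlay, here is a rough sketch that logs temperature, power draw and SM clock once per second with `nvidia-smi`, plus a helper that reports the peak temperature from such a log. `log_gpu` and `max_temp` are made-up names, and the field order in the log is whatever the query lists (sh/bash, not fish).

```shell
#!/bin/sh
# log_gpu: print one CSV line per second until interrupted with Ctrl+C.
# Columns: timestamp, temperature (C), power draw (W), SM clock (MHz).
log_gpu() {
    nvidia-smi --query-gpu=timestamp,temperature.gpu,power.draw,clocks.sm \
        --format=csv,noheader,nounits -l 1
}

# max_temp: read a log produced by log_gpu on stdin and print the highest
# temperature, which is the second comma-separated field.
max_temp() {
    awk -F', ' '(NR == 1) || ($2 + 0 > max) { max = $2 + 0 }
                END { if (NR > 0) print max }'
}
```

Run `log_gpu > gpu.csv` in a terminal while gaming, stop it afterwards, and then `max_temp < gpu.csv` shows the peak. Doing the same session length on both OSes gives a fairer comparison than glancing at an overlay.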


To put a final pin in this one, I made the fan curves more aggressive on my GPU and used PowerMizer to overclock. The GPU temps are now within 2°C of Windows, and the GPU boost clock is also within a tolerable margin of variance. However, I am still noticing about a 10% (give or take) performance hit on Linux vs Windows.
Wolfenstein: Youngblood is a great example. With the settings I use, the game sits nicely at around 130-140 FPS in Windows. In Linux it sits around 110-120. Helldivers 2 is also a good example: in Windows it's about 120 FPS and in Linux it's around 103 FPS. But those are issues for another thread…
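
As a quick sanity check on the "about 10%" figure, the drop can be worked out directly from the FPS numbers quoted above. `pct_drop` is a made-up helper, and taking the midpoint of each quoted range (135 and 115 for Wolfenstein) is my own simplification.

```shell
#!/bin/sh
# pct_drop WINDOWS_FPS LINUX_FPS: percentage lost relative to Windows.
pct_drop() {
    awk -v w="$1" -v l="$2" 'BEGIN { printf "%.1f\n", (w - l) / w * 100 }'
}

pct_drop 120 103   # Helldivers 2: prints 14.2
pct_drop 135 115   # Wolfenstein: Youngblood, range midpoints: prints 14.8
```

So going by these two games, the gap is closer to 14-15% than 10%, although the quoted FPS values are eyeballed averages rather than measured benchmarks.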

Yes, this is what the many YouTubers who regularly benchmark Nvidia vs AMD on Linux vs Windows always find:

  • Nvidia performance is generally ~10% worse on Linux vs Windows.
  • AMD performance is generally ~10% better on Linux vs Windows.

(Of course this can vary game by game, and there are some exceptions.)

→ Nvidia is better on Windows, AMD is better on Linux.

=> You need to complain to Nvidia: file bug reports and flood their support channels. This has been going on for years, so it won't change until they get a lot of complaints from customers. They have to improve their Linux drivers.

Usually, people who game with Nvidia on Linux do it because they prefer the OS (some refuse Windows for privacy reasons, some just prefer Linux, etc.), so they don't care about the ~10% loss with Nvidia.