Nvidia power limit randomly drops to 5 W

Hi, newbie here. This issue has been plaguing my laptop for a while: the GPU power limit somehow drops to 5 W instead of 50 W, which tanks my frame rate to about 10 fps. It first happened while I was playing War Thunder, so I tried other games to see whether it occurs there too, and it does. I have tried switching between the open and closed drivers and even reinstalling the OS. Reinstalling fixed it for a while, but then the problem came back. In the Nvidia settings, the default and max TGP are shown as “N/A”.

Sat Nov 23 12:19:06 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti     Off |   00000000:01:00.0  On |                  N/A |
| N/A   69C    P8              8W /    5W |     902MiB /   4096MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       939      G   /usr/lib/Xorg                                  95MiB |
|    0   N/A  N/A      1826      G   ...local/share/Steam/ubuntu12_32/steam          2MiB |
|    0   N/A  N/A      2133      G   ./steamwebhelper                                3MiB |
|    0   N/A  N/A      2172    C+G   ....local/share/Steam/logs/cef_log.txt          4MiB |
|    0   N/A  N/A      7595    C+G   sober                                         783MiB |
+-----------------------------------------------------------------------------------------+

bug report log: ca25da3

My laptop specs:
CPU: i5-10300H
RAM: 16 GB, 3200 MHz
GPU: GTX 1650 Ti

If I have left out anything you need, please tell me.

Can you share the output of powerprofilesctl and systemctl status nvidia-powerd?

Sorry this took hours; the issue happens randomly.

powerprofilesctl:

performance:
    CpuDriver:  intel_pstate
    Degraded:   no

  balanced:
    CpuDriver:  intel_pstate
    PlatformDriver:     placeholder

  power-saver:
    CpuDriver:  intel_pstate
    PlatformDriver:     placeholder

systemctl status nvidia-powerd:

nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Sat 2024-11-23 16:22:11 WIB; 3h 59min ago
   Duration: 14ms
 Invocation: c5363a21938d445b834205fd60c0f698
    Process: 687 ExecStart=/usr/bin/nvidia-powerd (code=exited, status=1/FAILURE)
   Main PID: 687 (code=exited, status=1/FAILURE)
   Mem peak: 1.7M
        CPU: 4ms

Nov 23 16:22:11 FreakyLenovo systemd[1]: Started nvidia-powerd service.
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: nvidia-powerd version:1.0(build 1)
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: SBIOS support not found for NVPCF GET_SUPPORTED function
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: No matching GPU found
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: Failed to initialize Dynamic Boost
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: Failed to detach GPU id 256
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: Failed to initialize Dynamic Boost
Nov 23 16:22:11 FreakyLenovo /usr/bin/nvidia-powerd[687]: Failed to detach GPU id 256
Nov 23 16:22:11 FreakyLenovo systemd[1]: nvidia-powerd.service: Main process exited, code=exited, status=1/FAILURE
Nov 23 16:22:11 FreakyLenovo systemd[1]: nvidia-powerd.service: Failed with result 'exit-code'.

new bugreport if needed: b2b168e

GTX 1650 isn’t the most modern card…

Chapter 23. Dynamic Boost on Linux says you can check support with nvidia-settings -q DynamicBoostSupport

Nvidia-powerd fails to start - Linux - NVIDIA Developer Forums says “Nvidia-powerd is only for mobile Ampere gpus so it’s useless with your 2080. Please disable and mask the service.”

and Ampere (microarchitecture) - Wikipedia says Ampere is “GeForce 30 series”.
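
If you want to act on that advice, here is a minimal sketch, assuming the journal shows the same messages as in the status output above. `powerd_unsupported` is a hypothetical helper name of mine, not part of any NVIDIA tool, and the commands in the comments are only a suggestion; check the log yourself before running them.

```shell
# powerd_unsupported: hypothetical helper. Succeeds (exit status 0) when the
# log text on stdin contains one of the messages nvidia-powerd printed in this
# thread when it could not find platform support.
powerd_unsupported() {
  grep -q -e 'SBIOS support not found' -e 'No matching GPU found'
}

# Possible usage on the affected machine:
#   if journalctl -b -u nvidia-powerd --no-pager | powerd_unsupported; then
#     sudo systemctl disable --now nvidia-powerd
#     sudo systemctl mask nvidia-powerd
#   fi
```

Masking only stops systemd from starting the service again; it does not touch the driver itself.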

Thank you for the info, I will be testing it out.

From what I read, your card is probably unsupported by nvidia-powerd, for several reasons.
That service exists to raise the GPU power limit when the total system power budget is not exhausted, for example because the CPU is not fully loaded and there is headroom left. The use case is usually laptops (as the articles above say), because their power delivery is limited and pulling too much might damage the power regulators.

But that is not your actual problem. Your problem is the card locking down to 5 W.

Disabling nvidia-powerd doesn’t seem to fix the issue

systemctl status nvidia-powerd
○ nvidia-powerd.service - nvidia-powerd service
     Loaded: loaded (/usr/lib/systemd/system/nvidia-powerd.service; disabled; preset: disabled)
     Active: inactive (dead)

nvidia-smi
Tue Nov 26 17:01:55 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti     Off |   00000000:01:00.0  On |                  N/A |
| N/A   61C    P8              2W /    5W |    2596MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1011      G   /usr/lib/Xorg                                  98MiB |
|    0   N/A  N/A      2005      G   ./steamwebhelper                                2MiB |
|    0   N/A  N/A      2056    C+G   ....local/share/Steam/logs/cef_log.txt          4MiB |
|    0   N/A  N/A      7132    C+G   ...pps/common/War Thunder/linux64/aces       2476MiB |
+-----------------------------------------------------------------------------------------+

Disabling nvidia-powerd would have no effect for you as it was already dead.


What does nvidia-smi -q -d POWER show?

On my machine it shows the following (with my inline comments added):

$ nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Tue Nov 26 11:06:40 2024
Driver Version                            : 565.57.01
CUDA Version                              : 12.7

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 18.64 W   <-- what it's currently pulling when not in D3cold or drained
        Current Power Limit               : 65.00 W   <-- that should be around the default and can be modified by a running nvidia-powerd on a laptop (like mine is)
        Requested Power Limit             : N/A       <-- you can modify the power limit with "sudo nvidia-smi -pl <watt>" - that is, if the platform, the card and the driver support it (which is not the case for me)
        Default Power Limit               : 60.00 W   <-- default after driver initialization on boot
        Min Power Limit                   : 1.00 W    <-- minimum
        Max Power Limit                   : 75.00 W   <-- maximum; is this your 5 W?
    Power Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found
    GPU Memory Power Readings 
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

Many values show “N/A” or “Not Found” on my machine, so don’t worry if some are missing on yours too.
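
If you only want the two values that matter here, a rough sketch like this could pull them out of that output. `power_limits` is a hypothetical helper name, and I am assuming the field labels look the same on your driver version as they do on mine.

```shell
# power_limits: hypothetical helper. Reads `nvidia-smi -q -d POWER` text on
# stdin and prints the first "Current Power Limit" and "Default Power Limit"
# values. Only the first match is kept because the later "Module Power
# Readings" block reuses the same field names.
power_limits() {
  awk -F' *: *' '
    /Current Power Limit/ && cur == "" { cur = $2 }
    /Default Power Limit/ && def == "" { def = $2 }
    END { printf "current=%s default=%s\n", cur, def }
  '
}
```

Run it as `nvidia-smi -q -d POWER | power_limits`. If the current value is far below the default, something has lowered the limit since boot.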

Sorry for the late reply; it took a while for the issue to reappear.

This is the output while the issue is occurring:

nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Wed Nov 27 21:27:33 2024
Driver Version                            : 565.57.01
CUDA Version                              : 12.7

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 5.17 W
        Current Power Limit               : 5.00 W
        Requested Power Limit             : 50.00 W
        Default Power Limit               : 50.00 W
        Min Power Limit                   : 1.00 W
        Max Power Limit                   : 50.00 W
    Power Samples
        Duration                          : Not Found
        Number of Samples                 : Not Found
        Max                               : Not Found
        Min                               : Not Found
        Avg                               : Not Found

Going through your output field by field:

Driver Version (565.57.01): that’s up to date, good.

Current Power Limit (5.00 W): there truly is a power limit in effect, at 5 W.

Requested Power Limit (50.00 W): however, it is requested to be 50 W, which is a normal value.

Default, Min and Max Power Limit (50 W, 1 W, 50 W): these are also normal, so the default, min and max limits are all okay.

Power Draw (5.17 W): the card correctly follows the active 5 W limit by consuming 5.17 W, so that looks consistent.

The question is: where does that power limit come from? The default is 50 W, so something sets it at some point between boot and now.

Are you able to change it to 45 Watt with sudo nvidia-smi -pl 45 ?
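
To find out when the limit changes, one option is to poll it in the background and print only the samples where it sits below the default. This is a sketch, not a tested tool: `flag_capped` is a hypothetical helper name, and I am assuming `power.limit` and `power.default_limit` report numbers on your card (rows where either one is not numeric are skipped).

```shell
# flag_capped: hypothetical helper. Reads CSV rows of
#   timestamp, power.limit, power.default_limit
# (as produced with --format=csv,noheader,nounits) and prints only the rows
# where the active limit is below the default. Non-numeric rows are ignored.
flag_capped() {
  awk -F', *' '
    $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ && ($2 + 0) < ($3 + 0) {
      print $1 ": limit " $2 " W, default " $3 " W"
    }
  '
}

# Possible usage on the affected machine (samples every 5 seconds until Ctrl+C):
#   nvidia-smi --query-gpu=timestamp,power.limit,power.default_limit \
#              --format=csv,noheader,nounits -l 5 | flag_capped
```

The first timestamp it prints can then be compared against the journal (for example suspend/resume, AC unplug or thermal messages) to see what happened around that time.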

I have tried changing the power limit, but it won’t let me:

sudo nvidia-smi -pl 45
Changing power management limit is not supported for GPU: 00000000:01:00.0.
Treating as warning and moving on.
All done.

Ugh, hm. So it’s not possible to change the power limit.
I wonder how it got set to 5 W in the first place.
Maybe the folks at NVIDIA Developer Forums > Linux can help.

Another idea: You could try both open and closed nvidia kernel drivers to see if it makes a difference:

$ pacman -Ss linux-cachyos-nvidia
cachyos-v3/linux-cachyos-nvidia 6.12.1-2
    nvidia module of 565.57.01 driver for the linux-cachyos kernel
cachyos-v3/linux-cachyos-nvidia-open 6.12.1-2 [installed]
    nvidia open modules of 565.57.01 driver for the linux-cachyos kernel
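
To see at a glance which variant is currently installed, a small sketch like this could filter that listing. `installed_nvidia_variant` is a hypothetical helper name, and it only assumes that pacman marks installed packages with `[installed` as in the output above.

```shell
# installed_nvidia_variant: hypothetical helper. Reads `pacman -Ss` output on
# stdin and prints the package name (without the repository prefix) of every
# entry marked as installed.
installed_nvidia_variant() {
  awk '/\[installed/ { n = split($1, parts, "/"); print parts[n] }'
}

# Possible usage:
#   pacman -Ss linux-cachyos-nvidia | installed_nvidia_variant
```

Installing the other variant with `sudo pacman -S <package>` should then switch drivers; I am assuming pacman will offer to replace the conflicting package, so read its prompt before confirming, and reboot afterwards.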