Limine Ollama Performance

I’ve been playing with local AI using Ollama on the rEFInd AI SDK, but recently switched to the Limine bootloader due to its integrated snapshots feature.

After installing ollama-cuda and its dependencies from the Cachy v3 repo, I’m not seeing the expected performance boost. Ollama now seems to rely on my CPU instead of fully utilizing my RTX 3060 Mobile GPU, which makes it quite slow compared to when I was using the AI SDK.

System Specs:

  • CPU: AMD Ryzen 7 5800H (16) @ 4.46 GHz
  • GPU: NVIDIA GeForce RTX 3060 Mobile
  • RAM: 16GB DDR4

I’ve experimented with various models ranging from 4B to 12B, but none have shown a significant performance improvement. Are there any specific settings or configurations I’ve missed in Ollama to optimize GPU usage and enhance performance?

Also, if anyone has recommendations for specific models that work well with these specs, I’d appreciate it.

Cheers!

I am using ollama and openwebui. I just installed ollama with the install script, and it autodetected my NVIDIA card and installed all the dependencies. I'm using systemd-boot, not Limine. Have you tried installing ollama with the install script provided on the ollama website?

Thank you, but I would assume the ollama-cuda package that’s maintained by Cachy would be the best choice? I will give your way a go as well.

It still runs as it is, and uses my GPU, but the rEFInd AI SDK bootloader was far more performant. If I could find out exactly what they package when creating their AI SDK, I might have a better chance of figuring out the required optimizations/packages.
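
For anyone wanting to check how much of a loaded model actually sits in VRAM rather than system RAM, here is a rough Python sketch. It assumes Ollama is listening on its default address (http://localhost:11434) and that the /api/ps endpoint reports "size" and "size_vram" for each running model; the function names are made up for illustration. A fraction well below 1.0 would suggest the model is only partly offloaded to the GPU, which could explain slow generation.

```python
import json
import urllib.request


def gpu_fraction(model_entry):
    """Share of a loaded model that sits in VRAM (0.0 to 1.0).

    Assumes the entry carries "size" (total bytes) and "size_vram"
    (bytes on the GPU), as reported by Ollama's /api/ps endpoint.
    """
    size = model_entry.get("size", 0)
    size_vram = model_entry.get("size_vram", 0)
    if size <= 0:
        return 0.0
    return size_vram / size


def summarize(ps_response):
    """Map each running model name to its rounded VRAM fraction."""
    return {
        m.get("name", "?"): round(gpu_fraction(m), 2)
        for m in ps_response.get("models", [])
    }


def fetch_running_models(base_url="http://localhost:11434"):
    """Ask a local Ollama server which models are loaded.

    The base URL is an assumption (Ollama's default port); adjust it
    if your server listens elsewhere.
    """
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)
```

With a model loaded (for example, right after sending it a prompt), calling summarize(fetch_running_models()) should give one entry per running model. The `ollama ps` command shows similar information in its PROCESSOR column.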

I am using the Cachy NVIDIA packages as well. I meant that the ollama script installed all the required NVIDIA dependencies for ollama, like pytorch etc.