I’ve been playing with local AI using Ollama on the rEFInd AI SDK, but recently switched to the Limine bootloader due to its integrated snapshots feature.
After installing ollama-cuda and its dependencies from the Cachy v3 repo, I’m not seeing the expected performance boost. Ollama now seems to run on my CPU instead of fully using my RTX 3060 Mobile GPU, which makes it noticeably slower than when I was using the AI SDK.
System Specs:
- CPU: AMD Ryzen 7 5800H (16) @ 4.46 GHz
- GPU: NVIDIA GeForce RTX 3060 Mobile
- RAM: 16GB DDR4
I’ve experimented with various models ranging from 4B to 12B, but none have shown a significant performance improvement. Are there any specific settings or configurations I’ve missed in Ollama to optimize GPU usage and improve performance?
Also, if anyone has recommendations for specific models that work well with these specs, I’d appreciate it.
Cheers!