Hello CachyOS forum, how are ya!
I’ve pulled llama3.1:70b but I don’t have enough available memory to use it. I just want to try it out and see how slow it will be on my laptop, and I’m so close to having sufficient memory! Which way is best for me if I don’t buy more RAM: should I increase swap or should I increase zram? Follow-up question: how do I then increase whichever one you suggest?
ollama run llama3.1:70b
Error: model requires more system memory (35.5 GiB) than is available (27.0 GiB)
~
❯ zramctl
NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 zstd 15,5G 1,4M 215,7K 988K 8 [SWAP]
~
❯ free -t
total used free shared buff/cache available
Mem: 16220188 3440208 3296688 1409748 11234780 12779980
Swap: 16220156 1536 16218620
Total: 32440344 3441744 19515308
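(For reference, adding disk-backed swap generally looks something like the sketch below; the file name and 32 GB size are just examples, and on a btrfs root the file has to be created with btrfs filesystem mkswapfile instead of fallocate.)

sudo fallocate -l 32G /swapfile        # reserve 32 GB on disk
sudo chmod 600 /swapfile               # swap files must not be world-readable
sudo mkswap /swapfile                  # format it as swap space
sudo swapon /swapfile                  # enable it immediately
echo '/swapfile none swap defaults 0 0' | sudo tee -a /etc/fstab   # enable it at boot

Afterwards, swapon --show and free -t should report the larger total.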
Hey @Totte
There are several problems going on here, but to make a long story short: currently, no laptop can run Llama 3.1 70B.
So let’s dissect the reasons why:
VRAM is not your laptop’s memory; VRAM is your laptop’s graphics card memory. My laptop has 8 GB and I can NOT run the 70B model. I can run the 8B model, and depending on how much VRAM your laptop has, you should be able to as well.
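If you are not sure how much VRAM your card has, the NVIDIA driver can report it, for example:

nvidia-smi --query-gpu=name,memory.total --format=csv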
You can NOT upgrade VRAM. Laptop memory can be upgraded depending on your model and configuration, but it will be limited in comparison to a desktop computer.
To run LLMs with 70 billion parameters, you should subscribe to an online service or, if money is no object, buy or build a desktop PC with an NVIDIA RTX 4090. According to some online articles, even that card isn’t sufficient, which is why NVIDIA sells special graphics cards with 48 GB of VRAM or more.
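A rough back-of-the-envelope calculation shows why. Assuming the default 4-bit quantized build that ollama downloads:

70 billion parameters x 0.5 bytes per parameter ≈ 35 GB for the weights alone

which lines up with the 35.5 GiB in the error message above; at full 16-bit precision it would be closer to 140 GB, far beyond any single consumer card.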
I hope this helps you understand the requirements a little better.
To give you an idea of the capabilities that powerful LLMs (the ones that won’t run on our home computers) offer, go to https://chatgpt.com and have fun. While there, you can ask all kinds of questions regarding VRAM and RAM and so on.
Thx @vancouver for replying!
My laptop’s GPU has 4 GB of memory and, just like yours, it can run the 8B model.
I understand that it will be slow to run the 70B model on my laptop; I just want to know HOW slow.
Do you mean that even if I increase my swap to an acceptable level (more than 32 GB of total available memory), ollama will complain in a next step and tell me that I cannot run llama because I don’t meet the minimum VRAM requirement?
If ollama doesn’t stop me, I would still like to try 70B just to see HOW slow llama’s responses will be.
I use llama3.1 8B on my laptop and it’s quite fine. I asked llama whether it is possible to run the 70B model on my specs, and it answered:
"It is theoretically possible to run the LLaMA 3.1 70B model on your system, but not
practically feasible.**
Theoretically, you could try to install a deep learning framework like PyTorch or
TensorFlow, download the pre-trained LLaMA 3.1 70B model, and attempt to load it onto
your system. However:
Your NVIDIA Quadro M1200 Mobile GPU is likely to cause significant performance
bottlenecks.
The amount of memory (RAM) allocated for your system is far too low to fit the
model’s weights and activations.
In practice, attempting to run the LLaMA 3.1 70B model on your system would most
likely result in:
Slow performance due to GPU limitations
Memory allocation errors or crashes
So, while it might be technically possible to try, I wouldn’t recommend wasting time
trying to make it work".
The thing is, this is an experiment I’m doing, so the question still stands: what swap size would be recommended for my system when working with llama, and how do I set that up?
With the help of llama I managed to increase the zram (I increased it to 48 GB).
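On CachyOS the zram device is normally managed by systemd’s zram-generator, so the change is roughly the following (the 48 GB value and the zstd algorithm are assumptions to match what is described above; the size is given in megabytes):

# /etc/systemd/zram-generator.conf
[zram0]
zram-size = 48000
compression-algorithm = zstd

followed by sudo systemctl restart systemd-zram-setup@zram0.service (or a reboot) and a check with zramctl. Keep in mind, though, that zram pages still live in physical RAM (only compressed), so a 48 GB zram device on a 16 GB machine can’t really substitute for more RAM the way disk-backed swap can.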
But llama 70b still complained about not having enough system memory available. It looks like it wants me to increase the physically installed RAM. I give up!
Yeah, the problem is also that you need VRAM, since RAM is not shared with VRAM at these sizes (i.e. this is not unified memory like on the MacBook Mx, for example).
For a 70B model you would need way more RAM as well as VRAM.
Yes, and what I noticed when installing another AI was that I could see the computer’s RAM being eaten up until it hit the 16 GB maximum the laptop has, and then the terminal/server crashed, so I can only stick with smaller models.