How to Run LLaMA Models in LM Studio
1. Download and Install LM Studio
- Go to the official LM Studio website: https://lmstudio.ai
- Download the version for your OS (Windows, macOS, or Linux).
- Install it like any regular app.
On Linux, `.AppImage` and `.deb` versions are available; no CLI setup is needed.
2. Launch LM Studio
- Open the app.
- The interface will show an empty list of models.
- Click “Explore models” to browse and download LLMs.
3. Download a LLaMA Model
- In the model search bar, type `llama3` or `llama2`.
- Choose a GGUF-format model. For example:
  - TheBloke/Llama-2-7B-GGUF
  - meta-llama/Llama-3-8B-Instruct-GGUF
- Select a quantized version (smaller = faster):
  - Good balance: Q4_K_M
  - High quality: Q8_0 (uses more RAM)
LM Studio will automatically download and manage the model.
4. Load and Chat with the Model
- Once downloaded, click “Run” or “Load”.
- A chat window will open.
- Start typing your prompts and the model will respond.
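Prefer to drive the model from code instead of the chat window? LM Studio also includes a local server mode that exposes an OpenAI-compatible HTTP API. The sketch below is only illustrative: it assumes you have enabled the server in the app and that it is listening on the common default port 1234, and the model name is a placeholder for whatever identifier your server tab shows.

```python
# Minimal sketch: query a model loaded in LM Studio through its
# OpenAI-compatible local server. Assumes the server is enabled in the app
# and listening on http://localhost:1234 (a common default; yours may differ).
import requests

response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Placeholder: use the model identifier shown in LM Studio's server tab.
        "model": "local-model",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain GGUF quantization in one sentence."},
        ],
        "temperature": 0.7,
        "max_tokens": 200,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```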
Optional Settings
You can adjust the following (a toy sketch of how the sampling settings work follows this list):
- Max Tokens
- Temperature
- Top-K / Top-P
- System Prompt (instruction for the model)
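If you're not sure what these knobs actually do, the toy sketch below shows how temperature, Top-K, and Top-P are typically applied to a model's next-token probabilities. It illustrates the general technique only; it is not LM Studio's internal sampling code.

```python
# Toy illustration of temperature, top-k, and top-p (nucleus) sampling
# applied to a made-up next-token distribution.
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.95):
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-K: keep only the K most likely tokens.
    order = np.argsort(probs)[::-1][:top_k]

    # Top-P: further trim to the smallest set whose cumulative
    # probability reaches top_p.
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    # Renormalize and sample from the surviving tokens.
    kept = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=kept))

# Example: 5 candidate tokens with arbitrary logits.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, -2.0]))
```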
Recommended System Requirements
| Model | RAM Needed | GPU (optional) |
|---|---|---|
| LLaMA 2 7B | 6–8 GB | 6+ GB VRAM |
| LLaMA 3 8B | 8–12 GB | 8+ GB VRAM |
| LLaMA 3 70B | 32–64 GB+ | 24+ GB VRAM |
You don’t need a GPU to use these models. CPU-only works fine, especially with quantized versions like Q4_K_M.
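For a rough sense of how much memory a given model and quantization level will need, a back-of-the-envelope estimate is parameter count × bits per weight ÷ 8, plus some overhead for the KV cache and runtime. The numbers below are approximations, not official figures:

```python
# Back-of-the-envelope memory estimate for a quantized GGUF model.
# bits_per_weight values are approximate: Q4_K_M is roughly 4.5-5 bits,
# Q8_0 roughly 8.5. Actual files and runtime usage vary.
def estimated_gb(params_billions, bits_per_weight, overhead_gb=1.5):
    # Weights take about params * bits / 8 bytes (here expressed in GB,
    # with 1 GB ~ 1e9 bytes), plus a rough allowance for KV cache/runtime.
    return params_billions * bits_per_weight / 8 + overhead_gb

for name, params in [("LLaMA 2 7B", 7), ("LLaMA 3 8B", 8), ("LLaMA 3 70B", 70)]:
    print(f"{name}: ~{estimated_gb(params, 4.5):.0f} GB at Q4_K_M, "
          f"~{estimated_gb(params, 8.5):.0f} GB at Q8_0")
```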
Troubleshooting
| Issue | Solution |
|---|---|
| “Out of memory” error | Use a smaller model or a lower-bit quantization (e.g. Q4_K_M). |
| Model doesn’t respond well | Make sure you’re using an Instruct version of the model. |
| Download fails | Check your internet connection, or download the model manually from Hugging Face (see the sketch below). |
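For the manual-download route, one option is to fetch the GGUF file with the `huggingface_hub` package and then point LM Studio at it. The repository and filename below are examples only; check the model's “Files and versions” page on Hugging Face for the exact GGUF filename you want.

```python
# Manually download a single GGUF file from Hugging Face.
# Requires: pip install huggingface_hub
# The repo_id and filename below are examples; confirm the exact GGUF
# filename on the model's "Files and versions" page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",
    filename="llama-2-7b.Q4_K_M.gguf",
    local_dir="./models",
)
print(f"Saved to {path}")
```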
Bonus: Run LM Studio Offline
Once the model is downloaded, everything works entirely offline — great for privacy and performance.