How to Run LLaMA Models in LM Studio
1. Download and Install LM Studio
- Go to the official LM Studio website: https://lmstudio.ai
- Download the version for your OS (Windows, macOS, or Linux).
- Install it like any regular app.
On Linux, `.AppImage` and `.deb` versions are available; no CLI setup is needed.
2. Launch LM Studio
- Open the app.
- The interface will show an empty list of models.
- Click “Explore models” to browse and download LLMs.
3. Download a LLaMA Model
- In the model search bar, type `llama3` or `llama2`.
- Choose a GGUF-format model. For example:
  - TheBloke/Llama-2-7B-GGUF
  - meta-llama/Llama-3-8B-Instruct-GGUF
- Select a quantized version (smaller = faster):
  - Good balance: Q4_K_M
  - High quality: Q8_0 (uses more RAM)
LM Studio will automatically download and manage the model.
4. Load and Chat with the Model
- Once downloaded, click “Run” or “Load”.
- A chat window will open.
- Start typing your prompts and the model will respond.
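Prefer to drive the model from code instead of the chat window? LM Studio also includes a local server mode that exposes an OpenAI-compatible HTTP API. The sketch below is only illustrative: it assumes you have enabled the server in the app and that it is listening on the common default port 1234, and the model name is a placeholder for whatever identifier your server tab shows.

```python
# Minimal sketch: query a model loaded in LM Studio through its
# OpenAI-compatible local server. Assumes the server is enabled in the app
# and listening on http://localhost:1234 (a common default; yours may differ).
import requests

response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        # Placeholder: use the model identifier shown in LM Studio's server tab.
        "model": "local-model",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain GGUF quantization in one sentence."},
        ],
        "temperature": 0.7,
        "max_tokens": 200,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```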
Optional Settings
You can adjust the following (a toy sketch of how the sampling settings work follows this list):
- Max Tokens
- Temperature
- Top-K / Top-P
- System Prompt (instruction for the model)
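If you're not sure what these knobs actually do, the toy sketch below shows how temperature, Top-K, and Top-P are typically applied to a model's next-token probabilities. It illustrates the general technique only; it is not LM Studio's internal sampling code.

```python
# Toy illustration of temperature, top-k, and top-p (nucleus) sampling
# applied to a made-up next-token distribution.
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.95):
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-K: keep only the K most likely tokens.
    order = np.argsort(probs)[::-1][:top_k]

    # Top-P: further trim to the smallest set whose cumulative
    # probability reaches top_p.
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]

    # Renormalize and sample from the surviving tokens.
    kept = probs[keep] / probs[keep].sum()
    return int(np.random.choice(keep, p=kept))

# Example: 5 candidate tokens with arbitrary logits.
print(sample_next_token([2.0, 1.5, 0.3, -1.0, -2.0]))
```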
Recommended System Requirements
| Model | RAM Needed | GPU (optional) |
|---|---|---|
| LLaMA 2 7B | 6–8 GB | 6+ GB VRAM |
| LLaMA 3 8B | 8–12 GB | 8+ GB VRAM |
| LLaMA 3 70B | 32–64 GB+ | 24+ GB VRAM |
You don’t need a GPU to use these models. CPU-only works fine, especially with quantized versions like Q4_K_M.
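For a rough sense of how much memory a given model and quantization level will need, a back-of-the-envelope estimate is parameter count × bits per weight ÷ 8, plus some overhead for the KV cache and runtime. The numbers below are approximations, not official figures:

```python
# Back-of-the-envelope memory estimate for a quantized GGUF model.
# bits_per_weight values are approximate: Q4_K_M is roughly 4.5-5 bits,
# Q8_0 roughly 8.5. Actual files and runtime usage vary.
def estimated_gb(params_billions, bits_per_weight, overhead_gb=1.5):
    # Weights take about params * bits / 8 bytes (here expressed in GB,
    # with 1 GB ~ 1e9 bytes), plus a rough allowance for KV cache/runtime.
    return params_billions * bits_per_weight / 8 + overhead_gb

for name, params in [("LLaMA 2 7B", 7), ("LLaMA 3 8B", 8), ("LLaMA 3 70B", 70)]:
    print(f"{name}: ~{estimated_gb(params, 4.5):.0f} GB at Q4_K_M, "
          f"~{estimated_gb(params, 8.5):.0f} GB at Q8_0")
```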
Troubleshooting
| Issue | Solution |
|---|---|
| “Out of memory” error | Use a smaller model or a lower-bit quantization (e.g. Q4_K_M). |
| Model doesn’t respond well | Make sure you’re using an Instruct version of the model. |
| Download fails | Check your internet connection, or download the model manually from Hugging Face (see the sketch below). |
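For the manual-download route, one option is to fetch the GGUF file with the `huggingface_hub` package and then point LM Studio at it. The repository and filename below are examples only; check the model's “Files and versions” page on Hugging Face for the exact GGUF filename you want.

```python
# Manually download a single GGUF file from Hugging Face.
# Requires: pip install huggingface_hub
# The repo_id and filename below are examples; confirm the exact GGUF
# filename on the model's "Files and versions" page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",
    filename="llama-2-7b.Q4_K_M.gguf",
    local_dir="./models",
)
print(f"Saved to {path}")
```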
Bonus: Run LM Studio Offline
Once the model is downloaded, everything works entirely offline — great for privacy and performance.