Run LLaMA Models in LM Studio

:llama: How to Run LLaMA Models in LM Studio

:white_check_mark: 1. Download and Install LM Studio

  • Go to the official LM Studio website:
    :backhand_index_pointing_right: https://lmstudio.ai
  • Download the version for your OS (Windows, macOS, or Linux).
  • Install it like any regular app.

On Linux, .AppImage and .deb versions are available. No CLI setup needed.


:white_check_mark: 2. Launch LM Studio

  • Open the app.
  • The interface will show an empty list of models.
  • Click “Explore models” to browse and download LLMs.

:white_check_mark: 3. Download a LLaMA Model

  • In the model search bar, type llama3 or llama2.
  • Choose a GGUF-format model. For example:
    • TheBloke/Llama-2-7B-GGUF
    • lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF
  • Select a quantized version (smaller files need less RAM and run faster, at some quality cost; see the rough size estimate after this section):
    • Good balance: Q4_K_M
    • High quality: Q8_0 (uses more RAM)

LM Studio will automatically download and manage the model.
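
If you're not sure which quantization to pick, a back-of-the-envelope size estimate is parameter count × bits per weight ÷ 8. The bits-per-weight figures in this sketch (~4.8 for Q4_K_M, ~8.5 for Q8_0) are rough assumptions, not exact specs:

```python
# Rough GGUF size estimate: params * bits_per_weight / 8 bytes.
# Bits-per-weight values below are approximate assumptions, not exact specs.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q8_0": 8.5}

def approx_gguf_gb(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in GB for a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"8B model at {quant}: ~{approx_gguf_gb(8, quant):.1f} GB on disk")
# Expect roughly ~4.8 GB for Q4_K_M and ~8.5 GB for Q8_0; leave some extra
# RAM on top of the file size for context (KV cache) and runtime overhead.
```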


:white_check_mark: 4. Load and Chat with the Model

  • Once downloaded, click “Run” or “Load”.
  • A chat window will open.
  • Start typing your prompts and the model will respond.
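
If you'd rather script your prompts than use the chat window, LM Studio can also expose the loaded model through a local OpenAI-compatible server (started from the app's server/developer tab). A minimal sketch, assuming the server is running on the default port 1234 and that the model identifier below (a placeholder) is replaced with the one LM Studio shows for your loaded model:

```python
import requests

# Assumes LM Studio's local server is running on the default port (1234)
# and a model is already loaded in the app.
url = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "llama-3-8b-instruct",  # placeholder; use the identifier LM Studio shows
    "messages": [
        {"role": "user", "content": "Explain quantization in one sentence."}
    ],
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```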

:gear: Optional Settings

You can adjust the following (the sketch after this list shows how most of them map onto request parameters if you use the local server):

  • Max Tokens
  • Temperature
  • Top-K / Top-P
  • System Prompt (instruction for the model)
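
A sketch using the `openai` Python client pointed at LM Studio's local endpoint; the base URL, placeholder API key, and model name are assumptions, and Top-K is usually set in the app's settings rather than per request:

```python
from openai import OpenAI

# Point the OpenAI client at LM Studio's local server (assumed defaults).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # use the identifier shown in LM Studio
    messages=[
        {"role": "system", "content": "You are a concise assistant."},  # System Prompt
        {"role": "user", "content": "Summarize what Top-P sampling does."},
    ],
    max_tokens=256,     # Max Tokens
    temperature=0.7,    # Temperature
    top_p=0.9,          # Top-P
)
print(response.choices[0].message.content)
```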

:brain: Recommended System Requirements

| Model | RAM Needed | GPU (optional) |
|---|---|---|
| LLaMA 2 7B | 6–8 GB | 6+ GB VRAM |
| LLaMA 3 8B | 8–12 GB | 8+ GB VRAM |
| LLaMA 3 70B | 32–64 GB+ | 24+ GB VRAM |

You don’t need a GPU to run these models. CPU-only inference works fine (just more slowly), especially with quantized versions like Q4_K_M.
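
If you're not sure which row applies to your machine, a quick way to check your memory (using the third-party `psutil` package, purely as a convenience) is:

```python
import psutil  # pip install psutil

# Report total and currently available system RAM in GB.
mem = psutil.virtual_memory()
print(f"Total RAM:     {mem.total / 1e9:.1f} GB")
print(f"Available RAM: {mem.available / 1e9:.1f} GB")
# Compare against the table above: a 7B Q4_K_M model is comfortable with
# ~6-8 GB free, while 70B models need workstation-class memory.
```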


:red_question_mark: Troubleshooting

| Issue | Solution |
|---|---|
| “Out of memory” error | Use a smaller model or a lower-bit quantization (e.g. Q4). |
| Model doesn’t respond well | Make sure you’re using an Instruct version. |
| Download fails | Check your internet connection, or download the model manually from Hugging Face (see the sketch below). |
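
For the download-fails case, one workaround is to fetch the GGUF file yourself with the `huggingface_hub` package and then import it into LM Studio. A sketch, where the exact filename is an assumption; check the repository’s file list for the quantization you want:

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Download one quantized GGUF file from Hugging Face.
# The filename below is an example; browse the repo to confirm the exact name.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",
    filename="llama-2-7b.Q4_K_M.gguf",
)
print("Saved to:", path)
```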

:yellow_circle: Bonus: Run LM Studio Offline

Once the model is downloaded, everything works entirely offline — great for privacy and performance.
