Local LLM Setup Guide¶

A guide for running large language models locally using NVIDIA GPU passthrough, Ollama, and Open WebUI.

NVIDIA Driver Installation (Ubuntu)¶

Add Driver Repository¶

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

Install Driver¶

# Install specific version
sudo apt install nvidia-driver-570

# Or use auto-detection
sudo ubuntu-drivers autoinstall

Reboot and Verify¶

sudo reboot

After reboot:

# Check GPU status
nvidia-smi

# Load NVIDIA module
sudo modprobe nvidia

# Detailed GPU info
lspci -v | grep -A 12 NVIDIA

GPU Passthrough Verification (Proxmox VMs)¶

Check IOMMU Status¶

dmesg | grep -i iommu

Check VFIO Configuration¶

cat /etc/default/grub | grep iommu

SecureBoot Status¶

mokutil --sb-state

Note: If SecureBoot is enabled, you may need to disable it for NVIDIA drivers to load:
sudo apt install mokutil
sudo mokutil --disable-validation

Driver Verification¶

modinfo nvidia | grep version
dkms status

Monitor GPU in Real-Time¶

watch -n 0.5 nvidia-smi

Ollama Installation¶

Install Ollama¶

curl -fsSL https://ollama.com/install.sh | sh

Basic Usage¶

# List installed models
ollama list

# Run a model
ollama run llama3.1:8b

# Stop a running model
ollama stop llama3.1:8b

# Pull a model without running
ollama pull deepseek-r1:14b

Open WebUI (Docker)¶

Open WebUI provides a ChatGPT-like interface for Ollama models.

Default Setup (Ollama on Same Machine)¶

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Remote Ollama Server¶

docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=https://ollama.example.com \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

With NVIDIA GPU Support¶

docker run -d -p 3000:8080 \
  --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda

OpenAI API Only¶

docker run -d -p 3000:8080 \
  -e OPENAI_API_KEY=your_api_token_here \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Bundled (Ollama + WebUI in One Container)¶

With GPU:

docker run -d -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

CPU Only:

docker run -d -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

Access¶

Open your browser and navigate to:

http://server.example.com:3000

Create an admin account on first login.