Ollama Deployment and Usage Guide
Prerequisites
System Requirements
- OS: macOS 11+, Windows 10/11, or Linux (Ubuntu 20.04+, Fedora 36+, Debian 11+)
- Architecture: x86_64, ARM64 (Apple Silicon), or ARMv7
- Memory: 8GB+ RAM (16GB+ recommended for larger models)
- Storage: 10GB+ free space for models
- Optional: NVIDIA GPU (CUDA 11.8+) or AMD GPU (ROCm 5.5+) for acceleration
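The memory figures above follow from a rough rule of thumb: a quantized model needs its weight bytes plus headroom for the KV cache and runtime. A small sketch (the 4-bit default and 20% overhead factor are assumptions for illustration, not Ollama internals):

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: int = 4,
                          overhead: float = 1.2) -> float:
    """Rough memory estimate for a quantized model: weight bytes
    plus ~20% headroom for KV cache and runtime (assumed factor)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit quantization needs roughly 4 GB of memory,
# which is why 8 GB RAM is a workable minimum.
print(f"{estimate_model_ram_gb(7):.1f} GB")  # 4.2 GB
```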
Build Requirements (Source)
- Go: 1.22 or later
- Git: 2.0+
- C++ Compiler: GCC 11+ or Clang 14+ (for CGO dependencies)
- CMake: 3.24+ (for llama.cpp backend compilation)
Optional Tools
- Docker: 20.10+ (for containerized deployment)
- Python: 3.8+ (for Python SDK usage)
- Node.js: 18+ (for JavaScript SDK usage)
Installation
Quick Install (Recommended)
macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows (PowerShell):
irm https://ollama.com/install.ps1 | iex
Manual Download:
- macOS: Ollama.dmg
- Windows: OllamaSetup.exe
Docker Installation
# Pull official image
docker pull ollama/ollama
# Run with persistent storage
docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
Build from Source
# Clone repository
git clone https://github.com/ollama/ollama.git
cd ollama
# Build (requires Go 1.22+ and C++ compiler)
go build -o ollama .
# Or use the build script
make
Client Libraries
Python:
pip install ollama
JavaScript/TypeScript:
npm install ollama
Configuration
Environment Variables
Create a systemd override or export in your shell:
# Server configuration
export OLLAMA_HOST=0.0.0.0:11434 # Bind address (default: 127.0.0.1:11434)
export OLLAMA_MODELS=/path/to/models # Model storage location
export OLLAMA_ORIGINS="*" # CORS origins (comma-separated; quote "*" to avoid shell globbing)
export OLLAMA_KEEP_ALIVE=5m # Keep models loaded duration
export OLLAMA_NUM_PARALLEL=4 # Parallel request handling
export OLLAMA_MAX_LOADED_MODELS=2 # Max models in memory simultaneously
# GPU Configuration
export CUDA_VISIBLE_DEVICES=0 # Specific NVIDIA GPU
export HIP_VISIBLE_DEVICES=0 # Specific AMD GPU
export OLLAMA_GPU_OVERHEAD=1GB # Reserve VRAM for system
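Before starting the server, it can help to see which of these variables are actually set in the current shell. A small debugging aid (not part of Ollama; only the OLLAMA_HOST default is taken from the documentation above):

```python
import os

# Ollama-related environment variables the server will inherit.
VARS = ["OLLAMA_HOST", "OLLAMA_MODELS", "OLLAMA_ORIGINS",
        "OLLAMA_KEEP_ALIVE", "OLLAMA_NUM_PARALLEL", "OLLAMA_MAX_LOADED_MODELS"]

def effective_env() -> dict:
    env = {v: os.environ.get(v, "(unset)") for v in VARS}
    if env["OLLAMA_HOST"] == "(unset)":
        env["OLLAMA_HOST"] = "127.0.0.1:11434"  # documented default bind address
    return env

for key, value in effective_env().items():
    print(f"{key}={value}")
```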
Systemd Service (Linux)
Create /etc/systemd/system/ollama.service.d/override.conf:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/var/lib/ollama/models"
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Integration Configuration
Configure AI coding assistants to use Ollama:
Claude Code:
ollama launch claude
# Or manually configure ~/.claude/settings.json
Codex:
ollama launch codex
OpenCode/Droid:
ollama launch opencode
ollama launch droid
Database (Desktop App)
The desktop app stores settings in SQLite (macOS/Windows):
- macOS: ~/Library/Application Support/Ollama/database.sqlite
- Windows: %LOCALAPPDATA%\Ollama\database.sqlite
- Schema version: 13 (auto-migrated on startup)
Build & Run
Development Mode
# Start server with debug logging
OLLAMA_DEBUG=1 ./ollama serve
# In another terminal, run a model
./ollama run gemma3
# Running with no arguments prints the CLI help
./ollama
Production Build
# Optimized build
go build -ldflags="-s -w" -o ollama .
# Run as background service
./ollama serve &
GPU Acceleration
NVIDIA:
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
AMD:
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 ollama/ollama
API Usage Examples
REST API:
curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
Python:
from ollama import chat

response = chat(model='gemma3', messages=[
    {'role': 'user', 'content': 'Why is the sky blue?'}
])
print(response.message.content)
JavaScript:
import ollama from 'ollama'
const response = await ollama.chat({
  model: 'gemma3',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }]
})
console.log(response.message.content)
Deployment
Docker Compose
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - OLLAMA_KEEP_ALIVE=24h
      - OLLAMA_NUM_PARALLEL=4
  webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
    volumes:
      - open-webui:/app/backend/data
volumes:
  ollama:
  open-webui:
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          resources:
            limits:
              nvidia.com/gpu: 1
          volumeMounts:
            - name: ollama-storage
              mountPath: /root/.ollama
      volumes:
        - name: ollama-storage
          persistentVolumeClaim:
            claimName: ollama-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - port: 11434
      targetPort: 11434
  type: ClusterIP
Cloud Deployment
AWS (EC2 with GPU):
# Use Deep Learning AMI
docker run -d --gpus all -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
# Configure security group to allow port 11434
Reverse Proxy (Nginx):
location /ollama/ {
    proxy_pass http://localhost:11434/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_buffering off;
}
Troubleshooting
Port Already in Use
Error: bind: address already in use
# Find process using port 11434
lsof -i :11434
# Kill existing Ollama process
pkill ollama
# Or change port
OLLAMA_HOST=127.0.0.1:11435 ollama serve
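Before killing anything, you can check programmatically whether a port is already taken. A minimal standard-library sketch (a diagnostic helper, not part of Ollama):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0

if port_in_use(11434):
    print("Port 11434 is taken; stop the other process or pick a new port")
```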
Model Download Failures
Issue: Interrupted downloads or checksum errors
# Remove corrupted blobs (caution: this wildcard deletes all downloaded model blobs)
rm -rf ~/.ollama/models/blobs/sha256-*
# Re-pull model
ollama pull gemma3
GPU Not Detected
Symptoms: slow inference, high CPU usage
# Verify NVIDIA drivers
nvidia-smi
# Check Docker GPU runtime
docker run --rm --gpus all ollama/ollama nvidia-smi
# Force CPU mode (if needed)
OLLAMA_NO_GPU=1 ollama serve
Out of Memory
Error: runtime error: out of memory
# Reduce parallel requests
export OLLAMA_NUM_PARALLEL=1
# Limit context window (in Modelfile)
PARAMETER num_ctx 2048
# Use a smaller model variant
ollama pull gemma3:1b
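The num_ctx override above goes in a Modelfile. A minimal sketch that writes one and prints the create step (the derived model name gemma3-small is an arbitrary example; run the `ollama create` command manually):

```python
# Write a Modelfile that derives a low-context variant of gemma3.
modelfile = (
    "FROM gemma3\n"
    "PARAMETER num_ctx 2048\n"
)
with open("Modelfile", "w") as f:
    f.write(modelfile)

print("Now run: ollama create gemma3-small -f Modelfile")
```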
Permission Denied (Linux)
Fix:
# Add user to ollama group (log out and back in for it to take effect)
sudo usermod -aG ollama $USER
# Or fix permissions
sudo chown -R $USER:$USER ~/.ollama
Integration Connection Issues
Claude/Codex not connecting:
- Verify the Ollama server is running: curl http://localhost:11434/api/tags
- Check integration config paths in ~/.claude/ or ~/.codex/
- Ensure OLLAMA_ORIGINS includes the integration's origin
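The same health check can be scripted; this standard-library sketch hits the /api/tags endpoint mentioned above and degrades gracefully when the server is down (a diagnostic helper, not part of any SDK):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def ollama_reachable(base_url: str = "http://localhost:11434") -> bool:
    """Return True if the server answers /api/tags with valid JSON."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            json.load(resp)  # expected shape: {"models": [...]}
            return True
    except (URLError, ValueError, OSError):
        return False

print("server up" if ollama_reachable() else "server unreachable")
```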
Database Lock (Desktop App)
Error: database is locked
# Desktop app uses WAL mode - avoid network drives for database
# Reset if corrupted:
mv ~/Library/Application\ Support/Ollama/database.sqlite ~/Library/Application\ Support/Ollama/database.sqlite.bak
Build Errors
CGO errors:
# macOS
xcode-select --install
# Ubuntu/Debian
sudo apt-get install build-essential cmake
# Fedora
sudo dnf install gcc gcc-c++ cmake