LocalAI Deployment and Usage Guide
1. Prerequisites
System Requirements
- Operating System: Linux, macOS, or Windows (WSL2 recommended)
- Memory: Minimum 4GB RAM (8GB+ recommended for larger models)
- Storage: 2GB+ free space for models and dependencies
- CPU: Modern x86-64 or ARM64 processor (no GPU required)
Required Software
- Go 1.21+ (for building from source)
- Docker (for containerized deployment)
- Python 3.8+ (optional, for some utilities and examples)
- Git (for cloning the repository)
Optional Dependencies
- CUDA (for GPU acceleration, if available)
- FFmpeg (for audio/video processing features)
- Make (for build automation)
2. Installation
Quick Start with Docker (Recommended)
# Pull the latest Docker image
docker pull quay.io/go-skynet/local-ai:latest
# Or from Docker Hub
docker pull localai/localai:latest
# Run with basic configuration
docker run -p 8080:8080 -v $PWD/models:/models localai/localai:latest
Manual Installation from Source
# Clone the repository
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
# Build with Make (recommended)
make build
# Or build manually
go build -o local-ai .
# For GPU support (CUDA)
make build-cuda
Package Manager Installation
# Using Homebrew (macOS)
brew install localai
# Using Nix
nix run github:go-skynet/LocalAI
# Using Snap (Linux)
snap install local-ai
3. Configuration
Environment Variables
Create a .env file or set these environment variables:
# Core Configuration
MODELS_PATH=/path/to/models # Directory for model files
THREADS=4 # CPU threads to use
CONTEXT_SIZE=512 # Default context size
DEBUG=true # Enable debug logging
# API Configuration
API_KEY=your-api-key-here # Optional API key for authentication
CORS_ALLOW_ORIGINS=* # CORS settings
LISTEN_ADDRESS=:8080 # Server address
# Model Configuration
MODEL=ggml-gpt4all-j.bin # Default model to load
BACKEND=llama # Inference backend
Model Configuration Files
Create model configurations in models/ directory:
# models/gpt4all.yaml
name: gpt4all
backend: llama
parameters:
  model: ggml-gpt4all-j.bin
context_size: 512
threads: 4
API Keys (Optional)
LocalAI can use API keys for authentication:
- Set via the API_KEY environment variable
- Or configure in application.yaml:
authentication:
  api_keys:
    - "your-api-key-here"
4. Build & Run
Development Build
# Clone and setup
git clone https://github.com/go-skynet/LocalAI
cd LocalAI
# Install dependencies
go mod download
# Build development version
go build -tags dev -o local-ai .
# Run with hot reload (air)
air
Production Build
# Build optimized binary
make build
# Or with specific features
make build RELEASE=1
# Run the server
./local-ai --models-path ./models --context-size 2048 --threads 8
Running with Docker Compose
Create docker-compose.yaml:
version: '3.8'
services:
  localai:
    image: localai/localai:latest
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
      - ./images:/images
    environment:
      - MODELS_PATH=/models
      - THREADS=4
      - CONTEXT_SIZE=512
    restart: unless-stopped
Run with:
docker-compose up -d
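A container healthcheck can also be layered onto the service definition above; a sketch, assuming the image ships curl and the /ready endpoint is enabled:

```yaml
# Merged under services.localai in docker-compose.yaml (sketch)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/ready"]
  interval: 30s
  timeout: 5s
  retries: 3
```

With this in place, docker ps reports the container as healthy only once the readiness endpoint responds, which keeps dependent services from connecting too early.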
Testing the Installation
# Check if server is running
curl http://localhost:8080/ready
# Test chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
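The same test can be scripted in Python with only the standard library. A minimal sketch, assuming the server from the steps above is listening on localhost:8080; the build_chat_request and post_chat helpers are illustrative names, not part of LocalAI:

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def post_chat(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST the payload to the chat completions endpoint, return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request("gpt-3.5-turbo", "Hello!")
print(json.dumps(payload, sort_keys=True))
# → {"messages": [{"content": "Hello!", "role": "user"}], "model": "gpt-3.5-turbo"}
```

Call post_chat(payload) once the server is up; if an API key is configured, add an "Authorization": "Bearer <key>" entry to the request headers.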
5. Deployment
Containerized Deployment
# Build custom Docker image
docker build -t my-localai .
# Push to a container registry (replace with your registry/repository)
docker tag my-localai myregistry/localai:latest
docker push myregistry/localai:latest
# Kubernetes deployment
kubectl apply -f kubernetes/
Platform Recommendations
Cloud Platforms
- AWS: ECS/EKS with Fargate for serverless
- Google Cloud: Cloud Run or GKE
- Azure: Container Instances or AKS
- DigitalOcean: App Platform or Droplets
Self-Hosted Options
- Home Server: Docker on Ubuntu Server
- NAS Devices: Docker on Synology/QNAP
- Raspberry Pi: ARM64 builds available
Production Considerations
- Reverse Proxy: Use nginx or Caddy for SSL termination
- Load Balancing: Deploy multiple instances behind a load balancer
- Persistent Storage: Mount volumes for models and data
- Monitoring: Enable the metrics endpoint at /metrics
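For the reverse-proxy point above, a minimal Caddyfile sketch (the domain is a placeholder); Caddy obtains and renews TLS certificates automatically:

```
ai.example.com {
    reverse_proxy localhost:8080
}
```

An nginx server block with proxy_pass to localhost:8080 works equally well; Caddy is shown only because its config is shortest.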
Kubernetes Deployment
# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: localai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: localai
  template:
    metadata:
      labels:
        app: localai
    spec:
      containers:
        - name: localai
          image: localai/localai:latest
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: models
              mountPath: /models
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: localai-models
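The Deployment above mounts a PersistentVolumeClaim named localai-models and needs a Service to be reachable; a minimal sketch of both, with the storage size as an assumption to adjust for your models:

```yaml
# kubernetes/service.yaml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: localai
spec:
  selector:
    app: localai
  ports:
    - port: 8080
      targetPort: 8080
---
# kubernetes/pvc.yaml (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: localai-models
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

Note that with replicas: 2, ReadWriteOnce only works if both pods schedule onto the same node; otherwise use a storage class that supports ReadWriteMany so both replicas can mount the models volume.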
6. Troubleshooting
Common Issues and Solutions
Server Won't Start
# Check port availability
netstat -tulpn | grep :8080
# Run with debug logging
./local-ai --debug
# Check for missing models
ls -la models/
Model Loading Failures
# Verify model files exist
file models/your-model.bin
# Check model configuration
cat models/your-model.yaml
# Test with a simple model first
curl -O https://huggingface.co/local-ai/example-models/resolve/main/ggml-gpt4all-j.bin
Performance Issues
# Increase available threads
export THREADS=$(nproc)
# Adjust context size for memory
export CONTEXT_SIZE=1024
# Enable GPU if available
export GPU_LAYERS=20
API Connection Problems
# Test API endpoint
curl -v http://localhost:8080/v1/models
# Check CORS settings
export CORS_ALLOW_ORIGINS="http://localhost:3000"
# Verify API key (if enabled)
curl -H "Authorization: Bearer $API_KEY" http://localhost:8080/v1/chat/completions
Audio/Video Features Not Working
# Install FFmpeg
sudo apt install ffmpeg # Ubuntu/Debian
brew install ffmpeg # macOS
# Check audio backend dependencies
ldd $(which local-ai) | grep -i audio
WebSocket/Realtime Issues
# Check WebSocket support
curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" http://localhost:8080
# Increase WebSocket message size limit
export MAX_WEBSOCKET_MESSAGE_SIZE=10485760
Getting Help
- Documentation: https://localai.io/
- FAQ: https://localai.io/faq/
- Discord: https://discord.gg/uJAeKSAGDy
- GitHub Discussions: https://github.com/go-skynet/LocalAI/discussions
- Issue Tracker: https://github.com/go-skynet/LocalAI/issues
Logging and Debugging
# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug
# View logs in real-time
docker logs -f localai-container
# Generate debug report
./local-ai --debug-report > debug_report.txt