
How to Deploy & Use LocalAI

1. Prerequisites

System Requirements

  • Operating System: Linux, macOS, or Windows (WSL2 recommended on Windows)
  • Memory: Minimum 4GB RAM (8GB+ recommended for larger models)
  • Storage: 2GB+ free space for models and dependencies
  • CPU: Modern x86-64 or ARM64 processor (no GPU required)

Required Software

  • Go 1.21+ (for building from source)
  • Docker (for containerized deployment)
  • Python 3.8+ (optional, for some utilities and examples)
  • Git (for cloning the repository)

Optional Dependencies

  • CUDA (for GPU acceleration, if available)
  • FFmpeg (for audio/video processing features)
  • Make (for build automation)
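
To confirm these tools are available before installing, a quick shell check can help (it assumes the standard binary names are on your PATH; adjust for your distro):

```shell
# Report which of the tools listed above are installed.
missing=0
for tool in go docker python3 git ffmpeg make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) not found"
```

Only Git plus either Go or Docker are strictly required; the rest depend on which install path and features you use.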

2. Installation

Quick Start with Docker (Recommended)

# Pull the latest Docker image
docker pull quay.io/go-skynet/local-ai:latest

# Or from Docker Hub
docker pull localai/localai:latest

# Run with basic configuration
docker run -p 8080:8080 -v $PWD/models:/models localai/localai:latest

Manual Installation from Source

# Clone the repository
git clone https://github.com/go-skynet/LocalAI
cd LocalAI

# Build with Make (also fetches backend dependencies; models are downloaded separately)
make build

# Or build manually
go build -o local-ai .

# For GPU support (CUDA)
make build-cuda

Package Manager Installation

# Using Homebrew (macOS)
brew install local-ai

# Using Nix
nix run github:go-skynet/LocalAI

# Using Snap (Linux)
snap install local-ai

3. Configuration

Environment Variables

Create a .env file or set these environment variables:

# Core Configuration
MODELS_PATH=/path/to/models          # Directory for model files
THREADS=4                            # CPU threads to use
CONTEXT_SIZE=512                     # Default context size
DEBUG=true                           # Enable debug logging

# API Configuration
API_KEY=your-api-key-here            # Optional API key for authentication
CORS_ALLOW_ORIGINS=*                 # CORS settings
LISTEN_ADDRESS=:8080                 # Server address

# Model Configuration
MODEL=ggml-gpt4all-j.bin            # Default model to load
BACKEND=llama                        # Inference backend

Model Configuration Files

Create model configurations in models/ directory:

# models/gpt4all.yaml
name: gpt4all
backend: llama
parameters:
  model: ggml-gpt4all-j.bin
context_size: 512
threads: 4
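
Building on the example above, further tuning fields can be set in the same file. The sampling keys below are illustrative assumptions — check them against your LocalAI version's model-configuration reference:

```yaml
# models/gpt4all-tuned.yaml (illustrative; sampling fields are assumptions)
name: gpt4all-tuned
backend: llama
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.7
  top_p: 0.9
context_size: 1024
threads: 4
```

Requests then select this configuration by name, e.g. "model": "gpt4all-tuned" in the chat completions payload.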

API Keys (Optional)

LocalAI can use API keys for authentication:

  • Set via API_KEY environment variable
  • Or configure in application.yaml:
authentication:
  api_keys:
    - "your-api-key-here"
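
With a key configured, clients pass it in the Authorization header. For example (the key value is a placeholder):

```shell
# Placeholder key for illustration; substitute your own.
API_KEY="your-api-key-here"

# Authenticated request to list the available models.
curl -s http://localhost:8080/v1/models \
  -H "Authorization: Bearer $API_KEY"
```

When authentication is enabled, a missing or wrong key should produce a 401 response.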

4. Build & Run

Development Build

# Clone and setup
git clone https://github.com/go-skynet/LocalAI
cd LocalAI

# Install dependencies
go mod download

# Build development version
go build -tags dev -o local-ai .

# Run with hot reload (air)
air

Production Build

# Build optimized binary
make build

# Or with specific features
make build RELEASE=1

# Run the server
./local-ai --models-path ./models --context-size 2048 --threads 8

Running with Docker Compose

Create docker-compose.yaml:

version: '3.8'
services:
  localai:
    image: localai/localai:latest
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
      - ./images:/images
    environment:
      - MODELS_PATH=/models
      - THREADS=4
      - CONTEXT_SIZE=512
    restart: unless-stopped

Run with:

docker compose up -d   # or: docker-compose up -d (Compose v1)

Testing the Installation

# Check if the server is ready
curl http://localhost:8080/readyz

# Test chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

5. Deployment

Containerized Deployment

# Build custom Docker image
docker build -t my-localai .

# Push to a container registry (registry host is a placeholder)
docker tag my-localai myregistry.example.com/localai:latest
docker push myregistry.example.com/localai:latest

# Kubernetes deployment
kubectl apply -f kubernetes/

Platform Recommendations

Cloud Platforms

  • AWS: ECS/EKS with Fargate for serverless
  • Google Cloud: Cloud Run or GKE
  • Azure: Container Instances or AKS
  • DigitalOcean: App Platform or Droplets
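
As one concrete example for the platforms above, a sketch of a Cloud Run deployment (project, region, and repository names are placeholders; Cloud Run pulls images from Artifact Registry, so mirror the image there first with docker tag and docker push):

```shell
# Placeholders -- substitute your own project and region.
PROJECT=my-gcp-project
REGION=us-central1
IMAGE="$REGION-docker.pkg.dev/$PROJECT/containers/localai:latest"

# Deploy the mirrored image; memory/CPU sizing follows the system
# requirements above and will vary with the models you load.
gcloud run deploy localai \
  --image="$IMAGE" \
  --region="$REGION" \
  --memory=8Gi --cpu=4 \
  --port=8080
```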

Self-Hosted Options

  • Home Server: Docker on Ubuntu Server
  • NAS Devices: Docker on Synology/QNAP
  • Raspberry Pi: ARM64 builds available

Production Considerations

  1. Reverse Proxy: Use nginx or Caddy for TLS termination
  2. Load Balancing: Deploy multiple instances behind a load balancer
  3. Persistent Storage: Mount volumes for models and data
  4. Monitoring: Enable metrics endpoint at /metrics
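
As a sketch of the reverse-proxy step, an nginx server block that terminates TLS and forwards WebSocket upgrades might look like this (domain and certificate paths are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name localai.example.com;

    ssl_certificate     /etc/ssl/certs/localai.pem;
    ssl_certificate_key /etc/ssl/private/localai.key;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        # Pass WebSocket upgrade headers for the realtime features.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Long model responses can exceed the default read timeout.
        proxy_read_timeout 300s;
    }
}
```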

Kubernetes Deployment

# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: localai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: localai
  template:
    metadata:
      labels:
        app: localai
    spec:
      containers:
      - name: localai
        image: localai/localai:latest
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: models
          mountPath: /models
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: localai-models
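
The Deployment above references a localai-models claim that must exist. A companion PVC, plus a Service to expose the pods, might look like this (storage size and access mode are assumptions to adjust for your cluster):

```yaml
# kubernetes/pvc-service.yaml -- companion objects for the Deployment above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: localai-models
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: localai
spec:
  selector:
    app: localai
  ports:
  - port: 8080
    targetPort: 8080
```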

6. Troubleshooting

Common Issues and Solutions

Server Won't Start

# Check port availability
netstat -tulpn | grep :8080

# Run with debug logging
./local-ai --debug

# Check for missing models
ls -la models/

Model Loading Failures

# Verify model files exist
file models/your-model.bin

# Check model configuration
cat models/your-model.yaml

# Test with a simple model first
curl -O https://huggingface.co/local-ai/example-models/resolve/main/ggml-gpt4all-j.bin

Performance Issues

# Increase available threads
export THREADS=$(nproc)

# Adjust context size for memory
export CONTEXT_SIZE=1024

# Enable GPU if available
export GPU_LAYERS=20

API Connection Problems

# Test API endpoint
curl -v http://localhost:8080/v1/models

# Check CORS settings
export CORS_ALLOW_ORIGINS="http://localhost:3000"

# Verify the API key is accepted (if enabled) on a GET endpoint
curl -H "Authorization: Bearer $API_KEY" http://localhost:8080/v1/models

Audio/Video Features Not Working

# Install FFmpeg
sudo apt install ffmpeg  # Ubuntu/Debian
brew install ffmpeg      # macOS

# Check audio backend dependencies
ldd $(which local-ai) | grep -i audio

WebSocket/Realtime Issues

# Check WebSocket support (sample key from RFC 6455)
curl -i -N \
  -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:8080

# Increase WebSocket message size limit
export MAX_WEBSOCKET_MESSAGE_SIZE=10485760

Getting Help

Logging and Debugging

# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug

# View logs in real-time
docker logs -f localai-container

# Generate debug report
./local-ai --debug-report > debug_report.txt