
How to Deploy & Use LocalAI

1. Prerequisites

System Requirements

  • Operating System: Linux, macOS, or Windows (WSL2 recommended on Windows)
  • Memory: Minimum 4GB RAM (8GB+ recommended for larger models)
  • Storage: 2GB+ free space for models and dependencies
  • CPU: Modern x86-64 or ARM64 processor (no GPU required)

Required Software

  • Go 1.21+ (for building from source)
  • Docker (for containerized deployment)
  • Python 3.8+ (optional, for some utilities and examples)
  • Git (for cloning the repository)

Optional Dependencies

  • CUDA (for GPU acceleration, if available)
  • FFmpeg (for audio/video processing features)
  • Make (for build automation)
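
To confirm these tools are available before installing, a quick shell check can help (it assumes the standard binary names are on your PATH; adjust for your distro):

```shell
# Report which of the tools listed above are installed.
missing=0
for tool in go docker python3 git ffmpeg make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
    missing=$((missing + 1))
  fi
done
echo "$missing tool(s) not found"
```

Only Git plus either Go or Docker are strictly required; the rest depend on which install path and features you use.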

2. Installation

Quick Start with Docker (Recommended)

# Pull the latest Docker image
docker pull quay.io/go-skynet/local-ai:latest

# Or from Docker Hub
docker pull localai/localai:latest

# Run with basic configuration
docker run -p 8080:8080 -v $PWD/models:/models localai/localai:latest

Manual Installation from Source

# Clone the repository
git clone https://github.com/go-skynet/LocalAI
cd LocalAI

# Build with Make (also fetches backend dependencies; models are downloaded separately)
make build

# Or build manually
go build -o local-ai .

# For GPU support (CUDA)
make build-cuda

Package Manager Installation

# Using Homebrew (macOS)
brew install local-ai

# Using Nix
nix run github:go-skynet/LocalAI

# Using Snap (Linux)
snap install local-ai

3. Configuration

Environment Variables

Create a .env file or set these environment variables:

# Core Configuration
MODELS_PATH=/path/to/models          # Directory for model files
THREADS=4                            # CPU threads to use
CONTEXT_SIZE=512                     # Default context size
DEBUG=true                           # Enable debug logging

# API Configuration
API_KEY=your-api-key-here            # Optional API key for authentication
CORS_ALLOW_ORIGINS=*                 # CORS settings
LISTEN_ADDRESS=:8080                 # Server address

# Model Configuration
MODEL=ggml-gpt4all-j.bin            # Default model to load
BACKEND=llama                        # Inference backend

Model Configuration Files

Create model configurations in models/ directory:

# models/gpt4all.yaml
name: gpt4all
backend: llama
parameters:
  model: ggml-gpt4all-j.bin
context_size: 512
threads: 4
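
Building on the example above, further tuning fields can be set in the same file. The sampling keys below are illustrative assumptions — check them against your LocalAI version's model-configuration reference:

```yaml
# models/gpt4all-tuned.yaml (illustrative; sampling fields are assumptions)
name: gpt4all-tuned
backend: llama
parameters:
  model: ggml-gpt4all-j.bin
  temperature: 0.7
  top_p: 0.9
context_size: 1024
threads: 4
```

Requests then select this configuration by name, e.g. "model": "gpt4all-tuned" in the chat completions payload.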

API Keys (Optional)

LocalAI can use API keys for authentication:

  • Set via API_KEY environment variable
  • Or configure in application.yaml:
authentication:
  api_keys:
    - "your-api-key-here"
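
With a key configured, clients pass it in the Authorization header. For example (the key value is a placeholder):

```shell
# Placeholder key for illustration; substitute your own.
API_KEY="your-api-key-here"

# Authenticated request to list the available models.
curl -s http://localhost:8080/v1/models \
  -H "Authorization: Bearer $API_KEY"
```

When authentication is enabled, a missing or wrong key should produce a 401 response.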

4. Build & Run

Development Build

# Clone and setup
git clone https://github.com/go-skynet/LocalAI
cd LocalAI

# Install dependencies
go mod download

# Build development version
go build -tags dev -o local-ai .

# Run with hot reload (air)
air

Production Build

# Build optimized binary
make build

# Or with specific features
make build RELEASE=1

# Run the server
./local-ai --models-path ./models --context-size 2048 --threads 8

Running with Docker Compose

Create docker-compose.yaml:

version: '3.8'
services:
  localai:
    image: localai/localai:latest
    ports:
      - "8080:8080"
    volumes:
      - ./models:/models
      - ./images:/images
    environment:
      - MODELS_PATH=/models
      - THREADS=4
      - CONTEXT_SIZE=512
    restart: unless-stopped

Run with:

docker compose up -d   # or: docker-compose up -d (Compose v1)

Testing the Installation

# Check if the server is ready
curl http://localhost:8080/readyz

# Test chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

5. Deployment

Containerized Deployment

# Build custom Docker image
docker build -t my-localai .

# Push to a container registry (registry host is a placeholder)
docker tag my-localai myregistry.example.com/localai:latest
docker push myregistry.example.com/localai:latest

# Kubernetes deployment
kubectl apply -f kubernetes/

Platform Recommendations

Cloud Platforms

  • AWS: ECS/EKS with Fargate for serverless
  • Google Cloud: Cloud Run or GKE
  • Azure: Container Instances or AKS
  • DigitalOcean: App Platform or Droplets
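
As one concrete example for the platforms above, a sketch of a Cloud Run deployment (project, region, and repository names are placeholders; Cloud Run pulls images from Artifact Registry, so mirror the image there first with docker tag and docker push):

```shell
# Placeholders -- substitute your own project and region.
PROJECT=my-gcp-project
REGION=us-central1
IMAGE="$REGION-docker.pkg.dev/$PROJECT/containers/localai:latest"

# Deploy the mirrored image; memory/CPU sizing follows the system
# requirements above and will vary with the models you load.
gcloud run deploy localai \
  --image="$IMAGE" \
  --region="$REGION" \
  --memory=8Gi --cpu=4 \
  --port=8080
```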

Self-Hosted Options

  • Home Server: Docker on Ubuntu Server
  • NAS Devices: Docker on Synology/QNAP
  • Raspberry Pi: ARM64 builds available

Production Considerations

  1. Reverse Proxy: Use nginx or Caddy for TLS termination
  2. Load Balancing: Deploy multiple instances behind a load balancer
  3. Persistent Storage: Mount volumes for models and data
  4. Monitoring: Enable metrics endpoint at /metrics
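
As a sketch of the reverse-proxy step, an nginx server block that terminates TLS and forwards WebSocket upgrades might look like this (domain and certificate paths are placeholders):

```nginx
server {
    listen 443 ssl;
    server_name localai.example.com;

    ssl_certificate     /etc/ssl/certs/localai.pem;
    ssl_certificate_key /etc/ssl/private/localai.key;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        # Pass WebSocket upgrade headers for the realtime features.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # Long model responses can exceed the default read timeout.
        proxy_read_timeout 300s;
    }
}
```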

Kubernetes Deployment

# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: localai
spec:
  replicas: 2
  selector:
    matchLabels:
      app: localai
  template:
    metadata:
      labels:
        app: localai
    spec:
      containers:
      - name: localai
        image: localai/localai:latest
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: models
          mountPath: /models
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: localai-models
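
The Deployment above references a localai-models claim that must exist. A companion PVC, plus a Service to expose the pods, might look like this (storage size and access mode are assumptions to adjust for your cluster):

```yaml
# kubernetes/pvc-service.yaml -- companion objects for the Deployment above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: localai-models
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: Service
metadata:
  name: localai
spec:
  selector:
    app: localai
  ports:
  - port: 8080
    targetPort: 8080
```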

6. Troubleshooting

Common Issues and Solutions

Server Won't Start

# Check port availability
netstat -tulpn | grep :8080

# Run with debug logging
./local-ai --debug

# Check for missing models
ls -la models/

Model Loading Failures

# Verify model files exist
file models/your-model.bin

# Check model configuration
cat models/your-model.yaml

# Test with a simple model first
curl -O https://huggingface.co/local-ai/example-models/resolve/main/ggml-gpt4all-j.bin

Performance Issues

# Increase available threads
export THREADS=$(nproc)

# Adjust context size for memory
export CONTEXT_SIZE=1024

# Enable GPU if available
export GPU_LAYERS=20

API Connection Problems

# Test API endpoint
curl -v http://localhost:8080/v1/models

# Check CORS settings
export CORS_ALLOW_ORIGINS="http://localhost:3000"

# Verify the API key is accepted (if enabled) on a GET endpoint
curl -H "Authorization: Bearer $API_KEY" http://localhost:8080/v1/models

Audio/Video Features Not Working

# Install FFmpeg
sudo apt install ffmpeg  # Ubuntu/Debian
brew install ffmpeg      # macOS

# Check audio backend dependencies
ldd $(which local-ai) | grep -i audio

WebSocket/Realtime Issues

# Check WebSocket support (sample key from RFC 6455)
curl -i -N \
  -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:8080

# Increase WebSocket message size limit
export MAX_WEBSOCKET_MESSAGE_SIZE=10485760

Getting Help

Logging and Debugging

# Enable verbose logging
export DEBUG=true
export LOG_LEVEL=debug

# View logs in real-time
docker logs -f localai-container

# Generate debug report
./local-ai --debug-report > debug_report.txt