SurfSense Deployment and Usage Guide

1. Prerequisites

Required Software

  • Docker (recommended) or Python 3.11+
  • Docker Compose (optional, for multi-container setup)
  • PostgreSQL 14+ (if not using Docker)
  • Redis (for Celery task queue and caching)
  • Git (for cloning the repository)

API Keys & Accounts

Depending on which connectors you plan to use, you may need:

  • LLM Provider API Keys: OpenAI, Anthropic, Google AI, Azure OpenAI, or local LLM (vLLM/Ollama)
  • Search Engine APIs: Tavily, SearxNG, or LinkUp
  • Cloud Service Credentials: Google Drive, Slack, Microsoft Teams, Notion, GitHub, etc.
  • Embedding & Reranker Services: OpenAI, Cohere, Voyage, Jina, or local models

System Requirements

  • Minimum 4GB RAM (8GB+ recommended for local LLMs)
  • 10GB+ free disk space for document storage and vector indices
  • Multi-core CPU recommended (document parsing, embedding, and indexing run concurrently)

2. Installation

Docker (Recommended)

# Pull and run the latest image
docker run -d \
  -p 3000:3000 \
  -p 8000:8000 \
  -p 5133:5133 \
  -v surfsense-data:/data \
  --name surfsense \
  --restart unless-stopped \
  ghcr.io/modsetter/surfsense:latest

Docker Compose

# docker-compose.yml
version: '3.8'
services:
  surfsense:
    image: ghcr.io/modsetter/surfsense:latest
    ports:
      - "3000:3000"
      - "8000:8000"
      - "5133:5133"
    volumes:
      - surfsense-data:/data
    environment:
      - DATABASE_URL=postgresql://user:password@postgres:5432/surfsense
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
  
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=surfsense
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
  
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  surfsense-data:
  postgres-data:
  redis-data:

Manual Installation

# Clone the repository
git clone https://github.com/MODSetter/SurfSense.git
cd SurfSense

# Install backend dependencies
cd surfsense_backend
pip install -r requirements.txt

# Install frontend dependencies (if building from source)
cd ../surfsense_frontend
npm install

3. Configuration

Environment Variables

Create a .env file in the backend directory:

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/surfsense

# Redis
REDIS_URL=redis://localhost:6379/0
REDIS_APP_URL=redis://localhost:6379/1

# Security
SECRET_KEY=your-secret-key-here
ENCRYPTION_KEY=your-encryption-key-for-oauth-tokens

# LLM Configuration (choose one or more)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=your-key
GOOGLE_AI_API_KEY=your-key
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=your-endpoint

# Local LLM (optional)
OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000  # clashes with the backend's default port; move one if running both locally

# Embedding Models
OPENAI_EMBEDDINGS_API_KEY=sk-...
VOYAGE_API_KEY=your-key
JINA_API_KEY=your-key

# Search APIs
TAVILY_API_KEY=your-key
LINKUP_API_KEY=your-key

# File Processing
UNSTRUCTURED_API_KEY=your-key  # For cloud processing
LLAMACLOUD_API_KEY=your-key    # Alternative cloud processor

# Connector-specific credentials
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
NOTION_CLIENT_ID=your-client-id
NOTION_CLIENT_SECRET=your-client-secret
# ... add other connector credentials as needed

# Application Settings
APP_ENV=production  # or development
FRONTEND_URL=http://localhost:3000
BACKEND_URL=http://localhost:8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173

# Task Processing
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
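
SECRET_KEY and ENCRYPTION_KEY should be random values, never hardcoded strings. One way to generate them (a sketch that assumes ENCRYPTION_KEY is a Fernet-style 32-byte url-safe base64 key, a common convention for encrypting stored OAuth tokens; verify against your deployment's actual requirement):

```python
# Generate random values for SECRET_KEY and ENCRYPTION_KEY.
# Assumption: ENCRYPTION_KEY is a Fernet-style key (32 url-safe base64 bytes);
# check your deployment before relying on this format.
import secrets
from base64 import urlsafe_b64encode

secret_key = secrets.token_urlsafe(32)                                # SECRET_KEY
encryption_key = urlsafe_b64encode(secrets.token_bytes(32)).decode()  # ENCRYPTION_KEY

print(secret_key)
print(encryption_key)
```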

Configuration Files

  • app/config.py: Main configuration module with settings classes
  • alembic.ini: Database migration configuration
  • docker-compose.yml: Multi-service deployment (if using Docker Compose)

Database Setup

# Initialize database (if not using Docker)
cd surfsense_backend

# Run migrations
alembic upgrade head

# Create initial admin user (if needed)
python -m app.scripts.create_admin_user

4. Build & Run

Development Mode

# Backend (with hot reload)
cd surfsense_backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend (in separate terminal)
cd surfsense_frontend
npm run dev

# Celery worker (for background tasks)
cd surfsense_backend
celery -A app.celery_app worker --loglevel=info

# Beat scheduler (for periodic tasks)
celery -A app.celery_app beat --loglevel=info

Production Build

# Build frontend
cd surfsense_frontend
npm run build

# Build Docker image (optional)
docker build -t surfsense:latest .

# Run with production settings
cd surfsense_backend
APP_ENV=production uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
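
The `--workers 4` above is one fixed choice; a common heuristic (not specific to SurfSense) derives the worker count from available cores:

```python
# Rule-of-thumb worker count for uvicorn/gunicorn-style servers: 2 x cores + 1.
# Illustrative only; tune against real load and memory headroom.
import os

workers = 2 * (os.cpu_count() or 1) + 1
print(f"--workers {workers}")
```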

Docker Updates

# Manual update
docker pull ghcr.io/modsetter/surfsense:latest
docker stop surfsense
docker rm surfsense
docker run ... # (same as initial run command)

# Automatic updates with Watchtower
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  nickfedor/watchtower \
  --run-once surfsense

5. Deployment

Cloud Platforms

  • AWS: Use ECS/EKS with RDS (PostgreSQL) and ElastiCache (Redis)
  • Google Cloud: Cloud Run or GKE with Cloud SQL and Memorystore
  • Azure: Container Instances or AKS with Azure Database and Redis Cache
  • DigitalOcean: App Platform or Droplets with Managed Databases
  • Railway/Replit: One-click deployment options

Kubernetes

# Example deployment.yaml (raw manifest; package as a Helm chart if you need templating)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: surfsense
spec:
  replicas: 3
  selector:
    matchLabels:
      app: surfsense
  template:
    metadata:
      labels:
        app: surfsense
    spec:
      containers:
      - name: surfsense
        image: ghcr.io/modsetter/surfsense:latest
        ports:
        - containerPort: 8000
        - containerPort: 3000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: surfsense-secrets
              key: database-url
        # ... other environment variables

Reverse Proxy Setup (Nginx)

# /etc/nginx/sites-available/surfsense
server {
    listen 80;
    server_name surfsense.yourdomain.com;
    
    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
    
    location /api/ {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
    
    location /ws/ {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }
}

Scaling Considerations

  • Database: Use connection pooling (PgBouncer) for PostgreSQL
  • Redis: Enable persistence and consider Redis Cluster for high availability
  • File Storage: Use S3/MinIO for document storage in multi-instance deployments
  • Vector Search: Consider dedicated vector databases (Qdrant, Pinecone) for large knowledge bases
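
For the PgBouncer bullet above, a minimal illustrative configuration (hostnames, credentials, paths, and pool sizes are placeholders to adapt):

```ini
; /etc/pgbouncer/pgbouncer.ini -- minimal illustrative sketch
[databases]
surfsense = host=localhost port=5432 dbname=surfsense

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; transaction pooling suits short web requests
max_client_conn = 200
default_pool_size = 20
```

Point DATABASE_URL at port 6432 instead of 5432 so connections flow through the pooler; note that transaction pooling is incompatible with session-level features such as prepared statements.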

6. Troubleshooting

Common Issues

1. Database Connection Errors

# Check if PostgreSQL is running
sudo systemctl status postgresql

# Test connection
psql -h localhost -U user -d surfsense

# Reset migrations if needed (WARNING: downgrading to base drops all tables and their data)
alembic downgrade base
alembic upgrade head

2. Redis Connection Issues

# Check Redis status
redis-cli ping  # Should return "PONG"

# Test from Python
python -c "import redis; r = redis.Redis(); print(r.ping())"

3. Document Processing Failures

  • Symptoms: Documents stuck in "pending" or "processing" state
  • Check Celery worker logs: celery -A app.celery_app worker --loglevel=debug
  • Verify file processor API keys: Unstructured, LlamaCloud, or local Docling
  • Check storage permissions: Ensure write access to document storage directory

4. Connector Sync Issues

-- Check connector status in the database (run via psql)
SELECT * FROM search_source_connectors WHERE last_indexed IS NULL;

# Manual sync trigger from a Python shell (example for Google Drive)
from app.tasks.connector_indexers.google_drive_indexer import index_google_drive_files
# Call with appropriate parameters

5. LLM Integration Problems

  • API Key Validation: Test keys directly with provider
  • Rate Limiting: Implement exponential backoff in app/services/llm_service.py
  • Model Availability: Check if specified model exists in your provider account
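
For the rate-limiting point, a generic exponential-backoff sketch (not SurfSense's actual llm_service code; the broad `except Exception` is a placeholder for your provider's rate-limit error, e.g. `openai.RateLimitError`):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus jitter.

    Generic sketch: narrow the caught exception to your LLM client's
    rate-limit error before using in production.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # placeholder: catch your provider's rate-limit error
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```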

6. Memory Issues with Local LLMs

# Monitor resource usage
docker stats surfsense  # or htop/glances

# Reduce vLLM memory usage
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_CPU_KVCACHE_SPACE=2  # GB

# Use smaller models for embeddings
EMBEDDING_MODEL_NAME="BAAI/bge-small-en-v1.5"

7. WebSocket/Realtime Chat Issues

  • Check CORS configuration: Ensure WebSocket origins are allowed
  • Verify Redis Pub/Sub: Used for realtime collaboration
  • Browser console errors: Check for WebSocket connection failures

Logging and Debugging

# View application logs
docker logs surfsense -f

# Check specific service logs
docker logs surfsense --tail 100 | grep -i "error\|exception"

# Enable debug logging
export LOG_LEVEL=DEBUG
export PYTHONASYNCIODEBUG=1

# Database query logging
export SQLALCHEMY_ECHO=1

Performance Optimization

  1. Database Indexing: Ensure proper indexes on frequently queried columns
  2. Redis Caching: Implement cache for expensive operations
  3. Connection Pooling: Configure SQLAlchemy and Redis connection pools
  4. CDN for Static Assets: Use Cloudflare or similar for frontend assets
  5. Batch Processing: Use Celery for heavy operations like document indexing
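
The Redis-caching item can be sketched as a decorator (a generic pattern, not SurfSense's implementation; `client` is any object exposing `get`/`setex`, such as `redis.Redis()`):

```python
import functools
import hashlib
import json

def redis_cache(client, ttl=300):
    """Cache a function's JSON-serializable result in Redis for `ttl` seconds."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Derive a stable cache key from the function name and arguments.
            key = "cache:" + fn.__name__ + ":" + hashlib.sha256(
                json.dumps([args, kwargs], sort_keys=True, default=str).encode()
            ).hexdigest()
            cached = client.get(key)
            if cached is not None:
                return json.loads(cached)
            result = fn(*args, **kwargs)
            client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
```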

Getting Help

Health Checks

# API health endpoint
curl http://localhost:8000/api/health

# Database health
curl http://localhost:8000/api/health/db

# Redis health
curl http://localhost:8000/api/health/redis

# Celery health
celery -A app.celery_app inspect ping
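
The curl checks above can be rolled into one small poller (stdlib-only sketch; the endpoint paths mirror the examples above and may differ in your deployment):

```python
# Poll the health endpoints listed above and summarize the results.
# BASE_URL and the paths are assumptions taken from the curl examples.
import urllib.request

BASE_URL = "http://localhost:8000"

def check(path):
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False

for path in ("/api/health", "/api/health/db", "/api/health/redis"):
    print(path, "OK" if check(path) else "FAIL")
```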