SurfSense Deployment and Usage Guide

1. Prerequisites

Required Software

  • Docker (recommended) or Python 3.11+
  • Docker Compose (optional, for multi-container setup)
  • PostgreSQL 14+ (if not using Docker)
  • Redis (for Celery task queue and caching)
  • Git (for cloning the repository)

API Keys & Accounts

Depending on which connectors you plan to use, you may need:

  • LLM Provider API Keys: OpenAI, Anthropic, Google AI, Azure OpenAI, or local LLM (vLLM/Ollama)
  • Search Engine APIs: Tavily, SearxNG, or LinkUp
  • Cloud Service Credentials: Google Drive, Slack, Microsoft Teams, Notion, GitHub, etc.
  • Embedding & Reranker Services: OpenAI, Cohere, Voyage, Jina, or local models

System Requirements

  • Minimum 4GB RAM (8GB+ recommended for local LLMs)
  • 10GB+ free disk space for document storage and vector indices
  • Multi-core CPU recommended (document parsing, embedding, and indexing run concurrently)

2. Installation

Docker (Recommended)

# Pull and run the latest image
docker run -d \
  -p 3000:3000 \
  -p 8000:8000 \
  -p 5133:5133 \
  -v surfsense-data:/data \
  --name surfsense \
  --restart unless-stopped \
  ghcr.io/modsetter/surfsense:latest

Docker Compose

# docker-compose.yml
version: '3.8'
services:
  surfsense:
    image: ghcr.io/modsetter/surfsense:latest
    ports:
      - "3000:3000"
      - "8000:8000"
      - "5133:5133"
    volumes:
      - surfsense-data:/data
    environment:
      - DATABASE_URL=postgresql://user:password@postgres:5432/surfsense
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
  
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=surfsense
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
  
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  surfsense-data:
  postgres-data:
  redis-data:

Manual Installation

# Clone the repository
git clone https://github.com/MODSetter/SurfSense.git
cd SurfSense

# Install backend dependencies
cd surfsense_backend
pip install -r requirements.txt

# Install frontend dependencies (if building from source)
cd ../surfsense_frontend
npm install

3. Configuration

Environment Variables

Create a .env file in the backend directory:

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/surfsense

# Redis
REDIS_URL=redis://localhost:6379/0
REDIS_APP_URL=redis://localhost:6379/1

# Security
SECRET_KEY=your-secret-key-here
ENCRYPTION_KEY=your-encryption-key-for-oauth-tokens

# LLM Configuration (choose one or more)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=your-key
GOOGLE_AI_API_KEY=your-key
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=your-endpoint

# Local LLM (optional)
OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000  # clashes with the backend's default port; move one if running both locally

# Embedding Models
OPENAI_EMBEDDINGS_API_KEY=sk-...
VOYAGE_API_KEY=your-key
JINA_API_KEY=your-key

# Search APIs
TAVILY_API_KEY=your-key
LINKUP_API_KEY=your-key

# File Processing
UNSTRUCTURED_API_KEY=your-key  # For cloud processing
LLAMACLOUD_API_KEY=your-key    # Alternative cloud processor

# Connector-specific credentials
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
NOTION_CLIENT_ID=your-client-id
NOTION_CLIENT_SECRET=your-client-secret
# ... add other connector credentials as needed

# Application Settings
APP_ENV=production  # or development
FRONTEND_URL=http://localhost:3000
BACKEND_URL=http://localhost:8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173

# Task Processing
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
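
SECRET_KEY and ENCRYPTION_KEY should be random values, never hardcoded strings. One way to generate them (a sketch that assumes ENCRYPTION_KEY is a Fernet-style 32-byte url-safe base64 key, a common convention for encrypting stored OAuth tokens; verify against your deployment's actual requirement):

```python
# Generate random values for SECRET_KEY and ENCRYPTION_KEY.
# Assumption: ENCRYPTION_KEY is a Fernet-style key (32 url-safe base64 bytes);
# check your deployment before relying on this format.
import secrets
from base64 import urlsafe_b64encode

secret_key = secrets.token_urlsafe(32)                                # SECRET_KEY
encryption_key = urlsafe_b64encode(secrets.token_bytes(32)).decode()  # ENCRYPTION_KEY

print(secret_key)
print(encryption_key)
```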

Configuration Files

  • app/config.py: Main configuration module with settings classes
  • alembic.ini: Database migration configuration
  • docker-compose.yml: Multi-service deployment (if using Docker Compose)

Database Setup

# Initialize database (if not using Docker)
cd surfsense_backend

# Run migrations
alembic upgrade head

# Create initial admin user (if needed)
python -m app.scripts.create_admin_user

4. Build & Run

Development Mode

# Backend (with hot reload)
cd surfsense_backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Frontend (in separate terminal)
cd surfsense_frontend
npm run dev

# Celery worker (for background tasks)
cd surfsense_backend
celery -A app.celery_app worker --loglevel=info

# Beat scheduler (for periodic tasks)
celery -A app.celery_app beat --loglevel=info

Production Build

# Build frontend
cd surfsense_frontend
npm run build

# Build Docker image (optional)
docker build -t surfsense:latest .

# Run with production settings
cd surfsense_backend
APP_ENV=production uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
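
The `--workers 4` above is one fixed choice; a common heuristic (not specific to SurfSense) derives the worker count from available cores:

```python
# Rule-of-thumb worker count for uvicorn/gunicorn-style servers: 2 x cores + 1.
# Illustrative only; tune against real load and memory headroom.
import os

workers = 2 * (os.cpu_count() or 1) + 1
print(f"--workers {workers}")
```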

Docker Updates

# Manual update
docker pull ghcr.io/modsetter/surfsense:latest
docker stop surfsense
docker rm surfsense
docker run ... # (same as initial run command)

# Automatic updates with Watchtower
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  nickfedor/watchtower \
  --run-once surfsense

5. Deployment

Cloud Platforms

  • AWS: Use ECS/EKS with RDS (PostgreSQL) and ElastiCache (Redis)
  • Google Cloud: Cloud Run or GKE with Cloud SQL and Memorystore
  • Azure: Container Instances or AKS with Azure Database and Redis Cache
  • DigitalOcean: App Platform or Droplets with Managed Databases
  • Railway/Replit: One-click deployment options

Kubernetes

# Example deployment.yaml (raw manifest; package as a Helm chart if you need templating)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: surfsense
spec:
  replicas: 3
  selector:
    matchLabels:
      app: surfsense
  template:
    metadata:
      labels:
        app: surfsense
    spec:
      containers:
      - name: surfsense
        image: ghcr.io/modsetter/surfsense:latest
        ports:
        - containerPort: 8000
        - containerPort: 3000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: surfsense-secrets
              key: database-url
        # ... other environment variables

Reverse Proxy Setup (Nginx)

# /etc/nginx/sites-available/surfsense
server {
    listen 80;
    server_name surfsense.yourdomain.com;
    
    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
    
    location /api/ {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
    
    location /ws/ {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }
}

Scaling Considerations

  • Database: Use connection pooling (PgBouncer) for PostgreSQL
  • Redis: Enable persistence and consider Redis Cluster for high availability
  • File Storage: Use S3/MinIO for document storage in multi-instance deployments
  • Vector Search: Consider dedicated vector databases (Qdrant, Pinecone) for large knowledge bases
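
For the PgBouncer bullet above, a minimal illustrative configuration (hostnames, credentials, paths, and pool sizes are placeholders to adapt):

```ini
; /etc/pgbouncer/pgbouncer.ini -- minimal illustrative sketch
[databases]
surfsense = host=localhost port=5432 dbname=surfsense

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; transaction pooling suits short web requests
max_client_conn = 200
default_pool_size = 20
```

Point DATABASE_URL at port 6432 instead of 5432 so connections flow through the pooler; note that transaction pooling is incompatible with session-level features such as prepared statements.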

6. Troubleshooting

Common Issues

1. Database Connection Errors

# Check if PostgreSQL is running
sudo systemctl status postgresql

# Test connection
psql -h localhost -U user -d surfsense

# Reset migrations if needed (WARNING: downgrading to base drops all tables and their data)
alembic downgrade base
alembic upgrade head

2. Redis Connection Issues

# Check Redis status
redis-cli ping  # Should return "PONG"

# Test from Python
python -c "import redis; r = redis.Redis(); print(r.ping())"

3. Document Processing Failures

  • Symptoms: Documents stuck in "pending" or "processing" state
  • Check Celery worker logs: celery -A app.celery_app worker --loglevel=debug
  • Verify file processor API keys: Unstructured, LlamaCloud, or local Docling
  • Check storage permissions: Ensure write access to document storage directory

4. Connector Sync Issues

-- Check connector status in the database (run via psql)
SELECT * FROM search_source_connectors WHERE last_indexed IS NULL;

# Manual sync trigger from a Python shell (example for Google Drive)
from app.tasks.connector_indexers.google_drive_indexer import index_google_drive_files
# Call with appropriate parameters

5. LLM Integration Problems

  • API Key Validation: Test keys directly with provider
  • Rate Limiting: Implement exponential backoff in app/services/llm_service.py
  • Model Availability: Check if specified model exists in your provider account
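
For the rate-limiting point, a generic exponential-backoff sketch (not SurfSense's actual llm_service code; the broad `except Exception` is a placeholder for your provider's rate-limit error, e.g. `openai.RateLimitError`):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff plus jitter.

    Generic sketch: narrow the caught exception to your LLM client's
    rate-limit error before using in production.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # placeholder: catch your provider's rate-limit error
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```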

6. Memory Issues with Local LLMs

# Monitor resource usage
docker stats surfsense  # or htop/glances

# Reduce vLLM memory usage
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_CPU_KVCACHE_SPACE=2  # GB

# Use smaller models for embeddings
EMBEDDING_MODEL_NAME="BAAI/bge-small-en-v1.5"

7. WebSocket/Realtime Chat Issues

  • Check CORS configuration: Ensure WebSocket origins are allowed
  • Verify Redis Pub/Sub: Used for realtime collaboration
  • Browser console errors: Check for WebSocket connection failures

Logging and Debugging

# View application logs
docker logs surfsense -f

# Check specific service logs
docker logs surfsense --tail 100 | grep -i "error\|exception"

# Enable debug logging
export LOG_LEVEL=DEBUG
export PYTHONASYNCIODEBUG=1

# Database query logging
export SQLALCHEMY_ECHO=1

Performance Optimization

  1. Database Indexing: Ensure proper indexes on frequently queried columns
  2. Redis Caching: Implement cache for expensive operations
  3. Connection Pooling: Configure SQLAlchemy and Redis connection pools
  4. CDN for Static Assets: Use Cloudflare or similar for frontend assets
  5. Batch Processing: Use Celery for heavy operations like document indexing
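
The Redis-caching item can be sketched as a decorator (a generic pattern, not SurfSense's implementation; `client` is any object exposing `get`/`setex`, such as `redis.Redis()`):

```python
import functools
import hashlib
import json

def redis_cache(client, ttl=300):
    """Cache a function's JSON-serializable result in Redis for `ttl` seconds."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Derive a stable cache key from the function name and arguments.
            key = "cache:" + fn.__name__ + ":" + hashlib.sha256(
                json.dumps([args, kwargs], sort_keys=True, default=str).encode()
            ).hexdigest()
            cached = client.get(key)
            if cached is not None:
                return json.loads(cached)
            result = fn(*args, **kwargs)
            client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
```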

Getting Help

Health Checks

# API health endpoint
curl http://localhost:8000/api/health

# Database health
curl http://localhost:8000/api/health/db

# Redis health
curl http://localhost:8000/api/health/redis

# Celery health
celery -A app.celery_app inspect ping
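
The curl checks above can be rolled into one small poller (stdlib-only sketch; the endpoint paths mirror the examples above and may differ in your deployment):

```python
# Poll the health endpoints listed above and summarize the results.
# BASE_URL and the paths are assumptions taken from the curl examples.
import urllib.request

BASE_URL = "http://localhost:8000"

def check(path):
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(BASE_URL + path, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False

for path in ("/api/health", "/api/health/db", "/api/health/redis"):
    print(path, "OK" if check(path) else "FAIL")
```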