SurfSense Deployment and Usage Guide
1. Prerequisites
Required Software
- Docker (recommended) or Python 3.11+
- Docker Compose (optional, for multi-container setup)
- PostgreSQL 14+ (if not using Docker)
- Redis (for Celery task queue and caching)
- Git (for cloning the repository)
API Keys & Accounts
Depending on which connectors you plan to use, you may need:
- LLM Provider API Keys: OpenAI, Anthropic, Google AI, Azure OpenAI, or local LLM (vLLM/Ollama)
- Search Engine APIs: Tavily, SearXNG, or LinkUp
- Cloud Service Credentials: Google Drive, Slack, Microsoft Teams, Notion, GitHub, etc.
- Embedding & Reranker Services: OpenAI, Cohere, Voyage, Jina, or local models
System Requirements
- Minimum 4GB RAM (8GB+ recommended for local LLMs)
- 10GB+ free disk space for document storage and vector indices
- Multi-core CPU for optimal performance
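Before installing, it can save time to verify the prerequisites above are actually available. This is a hypothetical helper (not part of SurfSense); the tool names simply mirror the lists above, so trim them to whichever install path you choose:

```python
import shutil
import sys

# Tools for each install path, per the prerequisites above (illustrative lists).
DOCKER_PATH = ["docker", "git"]
MANUAL_PATH = ["git", "psql", "redis-cli", "npm"]

def missing_tools(tools):
    """Return the subset of `tools` not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

def python_ok(minimum=(3, 11)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

if __name__ == "__main__":
    for name, tools in [("docker", DOCKER_PATH), ("manual", MANUAL_PATH)]:
        missing = missing_tools(tools)
        print(f"{name} path: {'ok' if not missing else 'missing ' + ', '.join(missing)}")
    if not python_ok():
        print("warning: Python 3.11+ required for the manual install path")
```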
2. Installation
Docker (Recommended)
# Pull and run the latest image
docker run -d \
  -p 3000:3000 \
  -p 8000:8000 \
  -p 5133:5133 \
  -v surfsense-data:/data \
  --name surfsense \
  --restart unless-stopped \
  ghcr.io/modsetter/surfsense:latest
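If `docker run` fails with a "port is already allocated" error, something else is bound to one of the published ports. A quick stdlib sketch to check the three ports above before starting the container (the port list comes from the command; the helper itself is illustrative):

```python
import socket

# Ports published by the docker run command above.
PORTS = [3000, 8000, 5133]

def port_in_use(port, host="127.0.0.1"):
    """True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for port in PORTS:
        if port_in_use(port):
            print(f"port {port} is busy; stop the conflicting service or remap it with -p")
```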
Docker Compose
# docker-compose.yml
version: '3.8'
services:
  surfsense:
    image: ghcr.io/modsetter/surfsense:latest
    ports:
      - "3000:3000"
      - "8000:8000"
      - "5133:5133"
    volumes:
      - surfsense-data:/data
    environment:
      - DATABASE_URL=postgresql://user:password@postgres:5432/surfsense
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=surfsense
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres-data:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data
volumes:
  surfsense-data:
  postgres-data:
  redis-data:
Manual Installation
# Clone the repository
git clone https://github.com/MODSetter/SurfSense.git
cd SurfSense
# Install backend dependencies
cd surfsense_backend
pip install -r requirements.txt
# Install frontend dependencies (if building from source)
cd ../surfsense_frontend
npm install
3. Configuration
Environment Variables
Create a .env file in the backend directory:
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/surfsense
# Redis
REDIS_URL=redis://localhost:6379/0
REDIS_APP_URL=redis://localhost:6379/1
# Security
SECRET_KEY=your-secret-key-here
ENCRYPTION_KEY=your-encryption-key-for-oauth-tokens
# LLM Configuration (choose one or more)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=your-key
GOOGLE_AI_API_KEY=your-key
AZURE_OPENAI_API_KEY=your-key
AZURE_OPENAI_ENDPOINT=your-endpoint
# Local LLM (optional)
OLLAMA_BASE_URL=http://localhost:11434
VLLM_BASE_URL=http://localhost:8000  # note: vLLM's default port is 8000, which clashes with the backend's; run one of them on a different port
# Embedding Models
OPENAI_EMBEDDINGS_API_KEY=sk-...
VOYAGE_API_KEY=your-key
JINA_API_KEY=your-key
# Search APIs
TAVILY_API_KEY=your-key
LINKUP_API_KEY=your-key
# File Processing
UNSTRUCTURED_API_KEY=your-key # For cloud processing
LLAMACLOUD_API_KEY=your-key # Alternative cloud processor
# Connector-specific credentials
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
NOTION_CLIENT_ID=your-client-id
NOTION_CLIENT_SECRET=your-client-secret
# ... add other connector credentials as needed
# Application Settings
APP_ENV=production # or development
FRONTEND_URL=http://localhost:3000
BACKEND_URL=http://localhost:8000
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
# Task Processing
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
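SurfSense loads these values through its own settings module, but a missing key usually only surfaces as a runtime error later. A standalone stdlib sketch that parses a .env file and flags missing required keys before startup (the required-key list below is an illustrative minimal subset, not an official one):

```python
# Illustrative .env sanity check; REQUIRED_KEYS is an assumed minimal subset.
REQUIRED_KEYS = ["DATABASE_URL", "REDIS_URL", "SECRET_KEY"]

def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_keys(env: dict) -> list:
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    with open(".env") as f:
        missing = missing_keys(parse_dotenv(f.read()))
    print("missing keys:", missing or "none")
```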
Configuration Files
- app/config.py: Main configuration module with settings classes
- alembic.ini: Database migration configuration
- docker-compose.yml: Multi-service deployment (if using Docker Compose)
Database Setup
# Initialize database (if not using Docker)
cd surfsense_backend
# Run migrations
alembic upgrade head
# Create initial admin user (if needed)
python -m app.scripts.create_admin_user
4. Build & Run
Development Mode
# Backend (with hot reload)
cd surfsense_backend
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Frontend (in separate terminal)
cd surfsense_frontend
npm run dev
# Celery worker (for background tasks)
cd surfsense_backend
celery -A app.celery_app worker --loglevel=info
# Beat scheduler (for periodic tasks)
celery -A app.celery_app beat --loglevel=info
Production Build
# Build frontend
cd surfsense_frontend
npm run build
# Build Docker image (optional)
docker build -t surfsense:latest .
# Run with production settings
cd surfsense_backend
APP_ENV=production uvicorn app.main:app --host 0.0.0.0 --port 8000 --workers 4
Docker Updates
# Manual update
docker pull ghcr.io/modsetter/surfsense:latest
docker stop surfsense
docker rm surfsense
docker run ... # (same as initial run command)
# Automatic updates with Watchtower
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  nickfedor/watchtower \
  --run-once surfsense
5. Deployment
Cloud Platforms
- AWS: Use ECS/EKS with RDS (PostgreSQL) and ElastiCache (Redis)
- Google Cloud: Cloud Run or GKE with Cloud SQL and Memorystore
- Azure: Container Instances or AKS with Azure Database and Redis Cache
- DigitalOcean: App Platform or Droplets with Managed Databases
- Railway/Replit: One-click deployment options
Kubernetes
# Example deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: surfsense
spec:
  replicas: 3
  selector:
    matchLabels:
      app: surfsense
  template:
    metadata:
      labels:
        app: surfsense
    spec:
      containers:
        - name: surfsense
          image: ghcr.io/modsetter/surfsense:latest
          ports:
            - containerPort: 8000
            - containerPort: 3000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: surfsense-secrets
                  key: database-url
          # ... other environment variables
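The Deployment references a `surfsense-secrets` Secret that must exist first. You can create it directly with `kubectl create secret generic surfsense-secrets --from-literal=database-url=...`, or render a manifest; here is a stdlib sketch of the latter (the secret name and key come from the example above, the helper is illustrative):

```python
import base64

def k8s_secret(name: str, data: dict) -> str:
    """Render an Opaque Kubernetes Secret manifest. Values under `data`
    must be base64-encoded, per the Secret API."""
    lines = [
        "apiVersion: v1",
        "kind: Secret",
        "metadata:",
        f"  name: {name}",
        "type: Opaque",
        "data:",
    ]
    for key, value in sorted(data.items()):
        lines.append(f"  {key}: {base64.b64encode(value.encode()).decode()}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Pipe the output to `kubectl apply -f -`.
    print(k8s_secret("surfsense-secrets",
                     {"database-url": "postgresql://user:password@postgres:5432/surfsense"}))
```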
Reverse Proxy Setup (Nginx)
# /etc/nginx/sites-available/surfsense
server {
    listen 80;
    server_name surfsense.yourdomain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }

    location /api/ {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }

    location /ws/ {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $host;
    }
}
Scaling Considerations
- Database: Use connection pooling (PgBouncer) for PostgreSQL
- Redis: Enable persistence and consider Redis Cluster for high availability
- File Storage: Use S3/MinIO for document storage in multi-instance deployments
- Vector Search: Consider dedicated vector databases (Qdrant, Pinecone) for large knowledge bases
6. Troubleshooting
Common Issues
1. Database Connection Errors
# Check if PostgreSQL is running
sudo systemctl status postgresql
# Test connection
psql -h localhost -U user -d surfsense
# Reset migrations if needed
alembic downgrade base
alembic upgrade head
2. Redis Connection Issues
# Check Redis status
redis-cli ping # Should return "PONG"
# Test from Python
python -c "import redis; r = redis.Redis(); print(r.ping())"
3. Document Processing Failures
- Symptoms: Documents stuck in "pending" or "processing" state
- Check Celery worker logs:
celery -A app.celery_app worker --loglevel=debug
- Verify file processor API keys: Unstructured, LlamaCloud, or local Docling
- Check storage permissions: Ensure write access to document storage directory
4. Connector Sync Issues
# Check connector status in database
SELECT * FROM search_source_connectors WHERE last_indexed IS NULL;
# Manual sync trigger (example for Google Drive)
from app.tasks.connector_indexers.google_drive_indexer import index_google_drive_files
# Call with appropriate parameters
5. LLM Integration Problems
- API Key Validation: Test keys directly with provider
- Rate Limiting: Implement exponential backoff in app/services/llm_service.py
- Model Availability: Check if the specified model exists in your provider account
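For the rate-limiting point, a generic exponential-backoff wrapper looks like the sketch below. It is illustrative, not SurfSense's actual implementation; in practice you would narrow `retry_on` to your provider's rate-limit exception class:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Invoke `call`, retrying on `retry_on` with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))
```

Usage: `with_backoff(lambda: client.chat(...))` retries transient failures and re-raises once the retry budget is exhausted.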
6. Memory Issues with Local LLMs
# Monitor resource usage
docker stats surfsense # or htop/glances
# Reduce vLLM memory usage
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_CPU_KVCACHE_SPACE=2 # GB
# Use smaller models for embeddings
EMBEDDING_MODEL_NAME="BAAI/bge-small-en-v1.5"
7. WebSocket/Realtime Chat Issues
- Check CORS configuration: Ensure WebSocket origins are allowed
- Verify Redis Pub/Sub: Used for realtime collaboration
- Browser console errors: Check for WebSocket connection failures
Logging and Debugging
# View application logs
docker logs surfsense -f
# Check specific service logs
docker logs surfsense --tail 100 | grep -i "error\|exception"
# Enable debug logging
export LOG_LEVEL=DEBUG
export PYTHONASYNCIODEBUG=1
# Database query logging
export SQLALCHEMY_ECHO=1
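How SurfSense itself wires up logging is not shown here, but a minimal sketch of honoring the LOG_LEVEL variable above from Python looks like this (illustrative, stdlib only):

```python
import logging
import os

def configure_logging():
    """Configure root logging from the LOG_LEVEL environment variable,
    defaulting to INFO when unset or unrecognized."""
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return level
```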
Performance Optimization
- Database Indexing: Ensure proper indexes on frequently queried columns
- Redis Caching: Implement cache for expensive operations
- Connection Pooling: Configure SQLAlchemy and Redis connection pools
- CDN for Static Assets: Use Cloudflare or similar for frontend assets
- Batch Processing: Use Celery for heavy operations like document indexing
Getting Help
- Discord Community: https://discord.gg/ejRNvftDp9
- GitHub Issues: Report bugs and feature requests
- Documentation: https://www.surfsense.com/docs/
- Reddit Community: r/SurfSense
Health Checks
# API health endpoint
curl http://localhost:8000/api/health
# Database health
curl http://localhost:8000/api/health/db
# Redis health
curl http://localhost:8000/api/health/redis
# Celery health
celery -A app.celery_app inspect ping
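The curl checks above can be folded into one script for cron or monitoring. A stdlib sketch, assuming the endpoint paths shown above and a local deployment on port 8000:

```python
import urllib.error
import urllib.request

# Endpoint paths taken from the curl examples above; adjust host/port
# to match your deployment.
HEALTH_ENDPOINTS = [
    "http://localhost:8000/api/health",
    "http://localhost:8000/api/health/db",
    "http://localhost:8000/api/health/redis",
]

def classify(status):
    """Map an HTTP status code to a coarse health verdict."""
    if 200 <= status < 300:
        return "healthy"
    if status in (502, 503, 504):
        return "unhealthy"
    return "unknown"

def check(url, timeout=5.0):
    """Fetch a health endpoint and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as exc:
        return classify(exc.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"

if __name__ == "__main__":
    for url in HEALTH_ENDPOINTS:
        print(f"{url}: {check(url)}")
```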