Apache Superset Deployment & Usage Guide
Prerequisites
Runtime Requirements:
- Python: 3.9, 3.10, or 3.11 (3.12 not yet supported)
- Node.js: 16.x or 18.x LTS (for frontend builds)
- npm: 8.x+ or Yarn 1.22.x
- Database: PostgreSQL 12+, MySQL 5.7+, or SQLite (dev only)
- Redis: 5.0+ (required for caching, Celery, and async queries)
System Dependencies:
- macOS/Linux: gcc, libffi-dev, libpq-dev (or postgresql-devel), libsasl2-dev
- Windows: WSL2 recommended; Visual C++ 14.0+ build tools if native
Accounts & Services:
- Database credentials with CREATE privileges
- (Optional) S3/GCS/Azure blob storage for chart exports
- (Optional) OAuth/OpenID Connect provider credentials for SSO
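The interpreter requirement above can be checked before installation. The following is a minimal sketch, not part of Superset; the function name python_supported and the two constants are hypothetical and only encode the 3.9 to 3.11 range stated in the runtime requirements.

```python
import sys

# Supported interpreter range, taken from the runtime requirements above.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) version is inside the supported range.

    `version` is any tuple starting with (major, minor); it defaults to the
    running interpreter's sys.version_info.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return MIN_SUPPORTED <= (major, minor) <= MAX_SUPPORTED
```

Calling python_supported() with no arguments checks the interpreter that is currently running, which is usually the one inside the activated virtual environment.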
Installation
1. Clone the Repository
git clone https://github.com/apache/superset.git
cd superset
git checkout 3.1.0 # Or latest stable tag
2. Backend Setup (Python)
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install Superset in editable mode with the listed extras
pip install -e ".[postgres,redis,thumbnails]" # Add other extras as needed
# Initialize database
superset db upgrade
# Create admin user
superset fab create-admin
# Load example data (optional)
superset load_examples
# Initialize roles/permissions
superset init
3. Frontend Setup (TypeScript/React)
cd superset-frontend
# Install dependencies (npm or yarn)
npm ci # Recommended for reproducible builds
# OR
yarn install --frozen-lockfile
# Return to root
cd ..
Configuration
1. Create Configuration File
Create superset_config.py in your PYTHONPATH (e.g., repository root):
import os
# Security
SECRET_KEY = os.environ.get('SUPERSET_SECRET_KEY', 'your-secure-random-key-here')
WTF_CSRF_ENABLED = True
# Database
SQLALCHEMY_DATABASE_URI = os.environ.get(
'DATABASE_URL',
'postgresql://user:password@localhost:5432/superset'
)
# Redis (for caching and Celery)
REDIS_HOST = os.environ.get('REDIS_HOST', 'localhost')
REDIS_PORT = int(os.environ.get('REDIS_PORT', 6379))
CACHE_CONFIG = {
'CACHE_TYPE': 'RedisCache',
'CACHE_DEFAULT_TIMEOUT': 300,
'CACHE_KEY_PREFIX': 'superset_',
'CACHE_REDIS_HOST': REDIS_HOST,
'CACHE_REDIS_PORT': REDIS_PORT,
}
# Feature Flags (referenced in source: FeatureFlag, isFeatureEnabled)
FEATURE_FLAGS = {
'ENABLE_TEMPLATE_PROCESSING': True,
'DASHBOARD_CROSS_FILTERS': True, # Enables cross-filtering (seen in dashboardState.ts)
'DASHBOARD_NATIVE_FILTERS': True,
'THUMBNAILS': False,
}
# Theme Configuration (optional, references ThemeController.ts)
APP_THEME = 'LIGHT' # or 'DARK' - controls default ThemeMode
# Superset specific settings
SUPERSET_WEBSERVER_PORT = 8088
2. Environment Variables
export FLASK_APP=superset
export PYTHONPATH=/path/to/superset_config/directory:$PYTHONPATH
export SUPERSET_SECRET_KEY=$(openssl rand -base64 42)
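Where openssl is not available, an equivalent key can be produced with the Python standard library. This is a sketch under the assumption that any sufficiently long random string is acceptable as SUPERSET_SECRET_KEY; the helper name generate_secret_key is hypothetical.

```python
import base64
import secrets


def generate_secret_key(num_bytes=42):
    """Return a base64-encoded random key, mirroring `openssl rand -base64 42`."""
    # secrets.token_bytes draws from the OS cryptographic random source.
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")
```

The returned string can be exported as SUPERSET_SECRET_KEY in the same way as the openssl output.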
3. Database Connection Strings
Configure data sources via UI or superset_config.py:
# Example: Athena connection
SQLALCHEMY_EXAMPLES_URI = 'awsathena+rest://...'
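Connection strings such as the ones above are URLs, so a username or password containing characters like @, : or / has to be percent-encoded before it is embedded. The sketch below uses only the standard library; build_postgres_uri is a hypothetical helper, and its defaults simply repeat the host, port, and database name from the configuration example.

```python
from urllib.parse import quote_plus


def build_postgres_uri(user, password, host="localhost", port=5432, database="superset"):
    """Build a PostgreSQL SQLAlchemy URI with percent-encoded credentials."""
    # quote_plus escapes reserved characters so they cannot be mistaken
    # for the separators between user, password, host, and path.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
```

For example, the password p@ss:w/rd is emitted as p%40ss%3Aw%2Frd, which keeps the host portion of the URI unambiguous.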
Build & Run
Development Mode (Full Stack)
Terminal 1 - Backend:
source venv/bin/activate
flask run -p 8088 --with-threads --reload --debugger
# OR
superset run -p 8088 --with-threads --reload --debugger
Terminal 2 - Frontend (Hot Reload):
cd superset-frontend
npm run dev-server
Access at http://localhost:9000 (webpack dev server proxies to Flask at 8088)
Production Build
- Build Frontend Assets:
cd superset-frontend
npm run build # Outputs to superset/static/assets
- Production Server:
# Using Gunicorn (recommended)
gunicorn -w 10 -k gevent --timeout 120 -b 0.0.0.0:8088 "superset.app:create_app()"
# Or using the Superset CLI (runs the Flask development server; suitable for testing only)
superset run -h 0.0.0.0 -p 8088
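Once the server is started, a small probe can confirm that it answers before traffic is routed to it. This sketch assumes the web server exposes a /health endpoint that returns HTTP 200 when the application is up; the function name superset_is_healthy and its opener parameter (present only so the call can be replaced in tests) are hypothetical.

```python
import urllib.error
import urllib.request


def superset_is_healthy(base_url="http://localhost:8088", timeout=5.0,
                        opener=urllib.request.urlopen):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with opener(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A probe like this can back a container health check or a deploy script that waits for the server to come up.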
Celery Workers (for async queries, thumbnails, alerts):
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
celery --app=superset.tasks.celery_app:app beat # For scheduled jobs
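The workers above need a broker and a result backend, which are set in superset_config.py. The following is a minimal sketch that reuses the REDIS_HOST and REDIS_PORT variables from the configuration section; the Redis database numbers 0 and 1 are assumptions chosen for illustration, and the remaining values should be adjusted to the workload.

```python
import os

REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", 6379))


class CeleryConfig:
    # Redis database numbers 0 and 1 are illustrative assumptions.
    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/1"
    # Task modules the worker should import at startup.
    imports = ("superset.sql_lab",)
    worker_prefetch_multiplier = 1
    task_acks_late = False


# Superset reads its Celery settings from this name.
CELERY_CONFIG = CeleryConfig
```

Keeping the broker and the result backend in separate Redis databases makes it easier to inspect or flush one without touching the other.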
Deployment
Docker (Recommended for Quick Deploy)
# Clone and run official compose
git clone https://github.com/apache/superset.git
cd superset
docker-compose -f docker-compose-non-dev.yml up -d
Kubernetes (Production Scale)
# Using official Helm chart
helm repo add superset https://apache.github.io/superset
helm install superset superset/superset \
--set configOverrides.secret=SECRET_KEY \
--set postgresql.enabled=true \
--set redis.enabled=true
Cloud Platforms:
- AWS: Deploy via ECS Fargate or EKS; use RDS PostgreSQL and ElastiCache Redis
- GCP: Cloud Run (stateless) + Cloud SQL; or GKE with persistent volumes
- Azure: Container Instances or AKS with Azure Database for PostgreSQL
- Heroku: Use apache-superset buildpack (note: ephemeral filesystem limits caching)
Security Checklist for Production:
- Change default SECRET_KEY (minimum 32 bytes)
- Enable HTTPS/TLS termination
- Configure authentication (OAuth, LDAP, or SAML) via Flask-AppBuilder
- Set SESSION_COOKIE_SECURE = True and SESSION_COOKIE_HTTPONLY = True
- Restrict database connections (read-only service accounts recommended)
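The first checklist item can be enforced at startup rather than by inspection. The sketch below is an illustrative check, not a Superset feature; secret_key_problems, PLACEHOLDER_KEYS, and MIN_KEY_BYTES are hypothetical names, with the placeholder string taken from the configuration example and the 32-byte minimum from the checklist.

```python
# Placeholder value used as the fallback in the configuration example.
PLACEHOLDER_KEYS = {"your-secure-random-key-here"}
# Minimum length from the security checklist.
MIN_KEY_BYTES = 32


def secret_key_problems(key):
    """Return a list of human-readable problems with a SECRET_KEY value."""
    if not key:
        return ["SECRET_KEY is empty"]
    problems = []
    if key in PLACEHOLDER_KEYS:
        problems.append("SECRET_KEY is still the placeholder value")
    if len(key.encode("utf-8")) < MIN_KEY_BYTES:
        problems.append(f"SECRET_KEY is shorter than {MIN_KEY_BYTES} bytes")
    return problems
```

One possible use is to call this near the end of superset_config.py and raise an error if the returned list is non-empty, so that a misconfigured deployment fails fast.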
Troubleshooting
Build Issues:
Frontend build fails with "JavaScript heap out of memory"
export NODE_OPTIONS="--max-old-space-size=8192"
npm run build
Module not found errors in superset-frontend
# Clear caches and reinstall (keep package-lock.json; npm ci requires it)
rm -rf node_modules
npm cache clean --force
npm ci
Runtime Issues:
Database migration errors
# Reset (WARNING: destroys data)
superset db downgrade base
superset db upgrade
# Or stamp to specific version
superset db stamp <revision>
"No module named superset"
# Ensure you're in the virtual environment and installed in editable mode
pip install -e .
export PYTHONPATH=$(pwd):$PYTHONPATH
Chart rendering errors (NVD3/Pivot Table plugins)
- Check browser console for missing CSS (source files reference nv.d3.css)
- Verify supersetThemeObject is loaded (ThemeController.ts dependency)
- Clear localStorage if theme-related crashes occur (localStorage.clear())
Cross-filtering not working
- Verify DASHBOARD_CROSS_FILTERS feature flag is enabled in superset_config.py
- Check browser console for getCrossFiltersConfiguration errors (referenced in dashboardState.ts)
Celery tasks stuck in "Pending"
- Verify Redis connectivity: redis-cli ping
- Check worker logs: celery -A superset.tasks.celery_app:app worker -l info
- Ensure RESULT_BACKEND is configured in celeryconfig
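When redis-cli is not installed on the host running the workers, the same connectivity check can be done from Python with a plain socket. This is a minimal sketch that assumes an unauthenticated Redis instance; redis_ping is a hypothetical helper, and a server that requires a password will reply with an error, which this function reports as False.

```python
import socket


def redis_ping(host="localhost", port=6379, timeout=2.0):
    """Send an inline PING and return True if the reply starts with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running it from the worker host with the same REDIS_HOST and REDIS_PORT values as the Superset configuration helps separate network problems from Celery configuration problems.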
Performance Tuning:
- Enable caching: Set CACHE_CONFIG with Redis
- Configure async queries for large datasets: set the GLOBAL_ASYNC_QUERIES feature flag to True
- Increase SUPERSET_WEBSERVER_TIMEOUT for long-running queries
- Use a production WSGI server (Gunicorn/gevent) instead of Flask dev server