TorchVision Deployment & Usage Guide
1. Prerequisites
System Requirements
- Python: 3.10–3.14 (for torchvision 0.25+ / torch 2.10+)
- PyTorch: Compatible version required (see compatibility matrix below)
- CUDA Toolkit: 11.8 or 12.1+ (optional, for GPU acceleration)
- System Libraries: libjpeg, libpng (for image I/O operations)
Version Compatibility Matrix
| PyTorch | TorchVision | Python |
|---|---|---|
| 2.10 | 0.25 | 3.10–3.14 |
| 2.9 | 0.24 | 3.10–3.14 |
| 2.8 | 0.23 | 3.9–3.13 |
| 2.7 | 0.22 | 3.9–3.13 |
| 2.6 | 0.21 | 3.9–3.12 |
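The matrix can also be checked programmatically. The following is a minimal sketch, assuming the pairs listed above are the only supported combinations; the helper names `minor_version` and `is_compatible` are hypothetical and not part of torch or torchvision.

```python
# Compatibility pairs copied from the matrix above: torch minor -> torchvision minor.
COMPATIBLE = {
    "2.10": "0.25",
    "2.9": "0.24",
    "2.8": "0.23",
    "2.7": "0.22",
    "2.6": "0.21",
}


def minor_version(version: str) -> str:
    """Reduce a version such as '2.9.1+cu128' to its 'major.minor' prefix."""
    public = version.split("+")[0]  # drop a local build tag such as '+cu128'
    major, minor = public.split(".")[:2]
    return f"{major}.{minor}"


def is_compatible(torch_version: str, torchvision_version: str) -> bool:
    """Return True when the two versions appear as a pair in the matrix."""
    expected = COMPATIBLE.get(minor_version(torch_version))
    return expected is not None and expected == minor_version(torchvision_version)
```

In an environment where both packages are installed, calling `is_compatible(torch.__version__, torchvision.__version__)` reports whether the installed pair matches a row of the matrix.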
2. Installation
Standard Installation (Recommended)
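Install via pip from the CUDA 12.1 wheel index (each index only carries the releases built for that CUDA version, so choose the tag that matches your target PyTorch release):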
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
For CPU-only installation:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
For CUDA 11.8:
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
Conda Installation
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
Development Installation (From Source)
# Clone repository
git clone https://github.com/pytorch/vision.git
cd vision
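# Install in editable mode (strict mode exposes only the files that belong to the package)
pip install -e . --config-settings editable_mode=strict
# Or use the legacy setuptools command (deprecated in recent setuptools releases)
python setup.py develop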
Image Backend Optimization
For faster PIL-based image operations such as resizing, install Pillow-SIMD, a drop-in replacement for Pillow:
pip uninstall pillow
pip install pillow-simd
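After the swap, it helps to confirm which variant Python actually imports. The sketch below is a heuristic only: it assumes Pillow-SIMD releases carry a `.postN` version suffix (for example `9.0.0.post1`) while stock Pillow releases do not. The function name `looks_like_pillow_simd` is hypothetical.

```python
import re


def looks_like_pillow_simd(version: str) -> bool:
    """Heuristic: True when the version string ends in a '.postN' suffix."""
    return re.search(r"\.post\d+$", version) is not None
```

With Pillow installed, `import PIL; looks_like_pillow_simd(PIL.__version__)` applies the check to the active installation.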
3. Configuration
Environment Variables
# Cache directory for pre-trained weights downloaded through torch.hub
# (datasets are stored under the root argument passed to each dataset class)
export TORCH_HOME=/path/to/cache
# CUDA device selection
export CUDA_VISIBLE_DEVICES=0,1
# Note: torchvision does not read environment variables for the image backend
# or for download progress bars; set those in code (see Runtime Configuration).
Runtime Configuration
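import torchvision
from torchvision.models import resnet50, ResNet50_Weights
torchvision.set_image_backend('PIL') # or 'accimage' if installed
# Download progress bars are controlled per call through the progress argument
model = resnet50(weights=ResNet50_Weights.DEFAULT, progress=False)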
4. Build & Run
Using Pre-trained Models
Image Classification (ResNet)
from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image
import torch
# Load model with pre-trained weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()
# Preprocess image (convert to RGB so grayscale or RGBA files also work)
preprocess = weights.transforms()
img = Image.open("image.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)
# Inference
with torch.no_grad():
    prediction = model(batch).squeeze(0)
    probabilities = torch.nn.functional.softmax(prediction, dim=0)
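The probability vector is usually reduced to a few human-readable labels. The sketch below shows one way to do that in plain Python; `top_k` is a hypothetical helper, and it assumes the class names come from `weights.meta["categories"]`.

```python
from typing import List, Sequence, Tuple


def top_k(probabilities: Sequence[float], categories: Sequence[str],
          k: int = 5) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    if len(probabilities) != len(categories):
        raise ValueError("probabilities and categories must have the same length")
    # Sort label/probability pairs from most to least likely
    ranked = sorted(zip(categories, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For the ResNet example, `top_k(probabilities.tolist(), weights.meta["categories"])` returns the five most likely ImageNet labels with their scores.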
EfficientNet
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights
weights = EfficientNet_B0_Weights.DEFAULT
model = efficientnet_b0(weights=weights)
preprocess = weights.transforms()
Swin Transformer
from torchvision.models import swin_t, Swin_T_Weights
weights = Swin_T_Weights.DEFAULT
model = swin_t(weights=weights)
Optical Flow (RAFT)
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.io import read_video
import torch
# Load RAFT model
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights)
model.eval()
# Load video frames as uint8 tensors of shape (T, H, W, C)
frames, _, _ = read_video("video.mp4", pts_unit="sec")
# Reorder to (1, C, H, W); H and W must be divisible by 8 for RAFT
img1 = frames[0].permute(2, 0, 1).unsqueeze(0)
img2 = frames[1].permute(2, 0, 1).unsqueeze(0)
# The weights' transforms convert to float and normalize to [-1, 1], as RAFT expects
img1, img2 = weights.transforms()(img1, img2)
# Predict optical flow
with torch.no_grad():
    flow = model(img1, img2)[-1] # Final iteration output
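RAFT expects frame height and width to be divisible by 8, so frames of other sizes need resizing or padding first. The sketch below only computes the target size; `padded_size` is a hypothetical helper, not a torchvision function.

```python
from typing import Tuple


def padded_size(height: int, width: int, multiple: int = 8) -> Tuple[int, int]:
    """Round each dimension up to the next multiple (8 for RAFT)."""
    pad_h = (-height) % multiple  # rows to add at the bottom
    pad_w = (-width) % multiple   # columns to add on the right
    return height + pad_h, width + pad_w
```

One option is to pad both frames with `torch.nn.functional.pad(img, (0, pad_w, 0, pad_h))` and crop the predicted flow back to the original size; resizing both frames to the padded size is an alternative.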
Working with Datasets
Standard Dataset Usage
import torch
from torchvision import datasets, transforms
# Define transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
# Load dataset
trainset = datasets.CIFAR10(root='./data', train=True,
                            download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
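The `Normalize` step applies `(value - mean) / std` to each channel. The sketch below works through that arithmetic for a single RGB pixel already scaled to [0, 1], and shows the inverse that is useful before visualizing a normalized tensor. The helper names are hypothetical.

```python
# Per-channel statistics used in the transform above
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


def denormalize_pixel(rgb):
    """Invert the normalization: value * std + mean per channel."""
    return [v * s + m for v, m, s in zip(rgb, MEAN, STD)]
```

A pixel equal to the channel means maps to zero in every channel, and applying `denormalize_pixel` to the result recovers the original values up to floating-point error.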
Stereo Matching Datasets
from torchvision.datasets import Kitti2015Stereo
# Expects the KITTI 2015 files to already be present under root
dataset = Kitti2015Stereo(root='./data', split='train')
left_img, right_img, disparity, valid_mask = dataset[0]
Custom Transforms Pipeline
import torch
from torchvision.transforms import v2
# Named train_transforms so it does not shadow the torchvision.transforms module
train_transforms = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
5. Deployment
Docker Deployment
Create a Dockerfile for production inference:
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install torchvision (0.16.0 is the release paired with torch 2.1.0 in the base image)
RUN pip install torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
# Copy application code
COPY inference.py .
COPY model_weights ./weights
# Set environment variables
ENV TORCH_HOME=/app/cache
ENV PYTHONUNBUFFERED=1
EXPOSE 8080
CMD ["python", "inference.py"]
Build and run:
docker build -t torchvision-app .
docker run --gpus all -p 8080:8080 torchvision-app
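The Dockerfile starts `inference.py` and exposes port 8080, but the guide does not show that file. The sketch below is one hypothetical starting point using only the standard library: it answers `GET /health` so the container can be probed. The endpoint path, the `HealthHandler` class, and `make_server` are assumptions for illustration; model loading and a prediction route would be added on top.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health so orchestrators can probe the container."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logs; remove to restore access logging


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Build the server; port 8080 matches EXPOSE in the Dockerfile."""
    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling `make_server().serve_forever()` at the end of `inference.py` keeps the container running; binding to `0.0.0.0` is what makes the port reachable through `-p 8080:8080`.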
Cloud Deployment Options
AWS SageMaker
Deploy as a SageMaker endpoint:
import sagemaker
from sagemaker.pytorch import PyTorchModel
# IAM role with SageMaker permissions; get_execution_role() works inside SageMaker notebooks
role = sagemaker.get_execution_role()
model = PyTorchModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,
    framework_version='2.0',
    py_version='py310',
    entry_point='inference.py'
)
predictor = model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)
Google Cloud Vertex AI
Use pre-built PyTorch containers or custom containers with torchvision pre-installed.
Azure Machine Learning
from azure.ai.ml.entities import Environment
env = Environment(
    name="torchvision-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest",
    conda_file="conda_dependencies.yml" # includes torchvision
)
Model Serving with TorchServe
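Create a handler.py handler:
import io
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights
from ts.torch_handler.base_handler import BaseHandler
class ModelHandler(BaseHandler):
    def initialize(self, context):
        weights = ResNet50_Weights.DEFAULT
        self.model = resnet50(weights=weights)
        self.model.eval()
        # Stored as `transform` so it does not shadow the preprocess() method
        self.transform = weights.transforms()
        self.initialized = True
    def preprocess(self, data):
        # Each request row carries the raw image bytes under "data" or "body"
        images = [Image.open(io.BytesIO(row.get("data") or row.get("body"))).convert("RGB")
                  for row in data]
        return torch.stack([self.transform(img) for img in images])
    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)
Package and serve:
mkdir -p model_store
torch-model-archiver --model-name resnet50 --version 1.0 --handler handler.py --export-path model_store
torchserve --start --model-store model_store --models resnet50=resnet50.mar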
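Once the server is running, a client sends image bytes to the inference API. The sketch below builds such a request with the standard library. It assumes TorchServe's default inference address `http://localhost:8080` and the `/predictions/<model_name>` route; `build_prediction_request` is a hypothetical helper.

```python
import urllib.request


def build_prediction_request(image_bytes: bytes, model_name: str = "resnet50",
                             base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for the /predictions/<model_name> endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/predictions/{model_name}",
        data=image_bytes,  # raw image file contents
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
```

To send it, read an image with `open("image.jpg", "rb").read()`, pass the bytes to `build_prediction_request`, and hand the result to `urllib.request.urlopen`; the response body contains whatever the handler's postprocessing returns.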
ONNX Export for Cross-Platform Deployment
import torch
from torchvision.models import resnet50, ResNet50_Weights
model = resnet50(weights=ResNet50_Weights.DEFAULT)
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)
# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}
)
6. Troubleshooting
Installation Issues
CUDA Version Mismatch
# Error: CUDA error: no kernel image is available
# Solution: Install torch and torchvision together from the same CUDA wheel index so the builds match
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118 # For CUDA 11.8
Pillow vs Pillow-SIMD Conflicts
# Error: ImportError: cannot import name '_imaging' from 'PIL'
# Solution: Clean reinstall
pip uninstall pillow pillow-simd -y
pip install pillow
# Or for SIMD optimization:
pip install pillow-simd
Python Version Compatibility
# Error: SyntaxError or TypeError raised by newer typing syntax (e.g., int | None)
# Solution: Use Python 3.10+ for torchvision 0.25+
# For Python 3.9, use torchvision 0.23 or earlier (see the compatibility matrix)
Runtime Issues
Dataset Download Failures
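# Issue: Permission denied or SSL errors during download
# Solution: Point the weights cache at a writable directory (datasets download to their root argument)
import os
os.environ['TORCH_HOME'] = '/path/with/write/permission'
# Or disable SSL verification (not recommended for production).
# Downloads go through urllib, which ignores the requests-specific CURL_CA_BUNDLE variable.
import ssl
ssl._create_default_https_context = ssl._create_unverified_context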
Memory Issues with Large Models
# Issue: CUDA out of memory when running EfficientNet-B7 or Swin Transformer
# Solution: Use torch.no_grad() and mixed precision
# (torch.autocast replaces the deprecated torch.cuda.amp.autocast)
with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        output = model(inputs)
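A rough estimate of how much memory the weights alone occupy helps decide whether a model fits on the device at all. The sketch below does that arithmetic; `weight_memory_mib` is a hypothetical helper, and it ignores activations, buffers, and allocator overhead, which often dominate at inference time.

```python
# Bytes used by one element of each dtype
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_mib(num_parameters: int, dtype: str = "float32") -> float:
    """Memory needed to hold the weights alone, in MiB."""
    return num_parameters * BYTES_PER_ELEMENT[dtype] / (1024 ** 2)
```

The parameter count can be obtained with `sum(p.numel() for p in model.parameters())`. Note that autocast changes the dtype used inside eligible operations and leaves the stored weights in their original dtype; converting the model with `model.half()` is what halves the weight footprint.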
Model Loading Errors
# Issue: Weights enum not found or deprecated
# Solution: Use new weights API
from torchvision.models import resnet50, ResNet50_Weights
# Instead of pretrained=True
model = resnet50(weights=ResNet50_Weights.DEFAULT)
Development Issues
Building from Source Failures
# Issue: error: command 'gcc' failed or missing headers
# Solution: Install build dependencies
# Ubuntu/Debian:
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
# macOS:
brew install jpeg libpng libtiff
# Issue: CUDA extensions fail to build
# Solution: Ensure nvcc is in PATH and matches PyTorch CUDA version
export PATH=/usr/local/cuda-12.1/bin:$PATH
export CUDA_HOME=/usr/local/cuda-12.1
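Before rebuilding, it can be useful to confirm what the build will actually see. The sketch below reports where `nvcc` resolves on `PATH` and the current `CUDA_HOME`; `describe_cuda_toolchain` is a hypothetical helper.

```python
import os
import shutil


def describe_cuda_toolchain() -> dict:
    """Report where nvcc resolves on PATH and what CUDA_HOME is set to."""
    return {
        "nvcc": shutil.which("nvcc"),              # None when nvcc is not on PATH
        "CUDA_HOME": os.environ.get("CUDA_HOME"),  # None when the variable is unset
    }
```

Comparing the reported paths with `torch.version.cuda` shows whether the toolkit on `PATH` matches the CUDA version PyTorch was built against.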
Import Errors in Editable Install
# Issue: ModuleNotFoundError after pip install -e .
# Solution: Use strict editable mode or install build deps first
pip install -e . --config-settings editable_mode=strict
# Or:
python setup.py develop
Performance Optimization
Slow Data Loading
# Solution: Increase num_workers and enable pin_memory
from torch.utils.data import DataLoader
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4, # Adjust based on CPU cores
    pin_memory=True, # For GPU training
    persistent_workers=True # Avoid worker spawn overhead
)
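The right `num_workers` value depends on the machine. The sketch below derives a starting point from the CPU count; the `reserve` and `cap` defaults are arbitrary assumptions to tune, and `suggest_num_workers` is a hypothetical helper.

```python
import os


def suggest_num_workers(reserve: int = 1, cap: int = 8) -> int:
    """Pick a worker count from the CPU count, leaving `reserve` cores free."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(0, min(cap, available - reserve))
```

If the result is 0, drop `persistent_workers=True`, since that option requires at least one worker process.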
Image Backend Optimization
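# Pillow-SIMD can speed up PIL-based transforms such as resizing; the gain depends on the operation and CPU
# Select the PIL backend at runtime; Pillow-SIMD installs under the same PIL package name
import torchvision
torchvision.set_image_backend('PIL')
# Ensure PIL is the Pillow-SIMD variant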