TorchVision Deployment & Usage Guide

1. Prerequisites

System Requirements

Python: 3.10–3.14 (for torchvision 0.25+ / torch 2.10+)
PyTorch: Compatible version required (see compatibility matrix below)
CUDA Toolkit: 11.8 or 12.1+ (optional, for GPU acceleration)
System Libraries: libjpeg, libpng (for image I/O operations)

Version Compatibility Matrix

PyTorch	TorchVision	Python
2.10	0.25	3.10–3.14
2.9	0.24	3.10–3.14
2.8	0.23	3.9–3.13
2.7	0.22	3.9–3.13
2.6	0.21	3.9–3.12
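
The matrix can also be checked programmatically. The following is a minimal sketch that encodes only the rows listed above; the helper names `minor` and `is_compatible` are illustrative and are not part of torch or torchvision.

```python
# Rows copied from the compatibility matrix above: torch minor -> torchvision minor
COMPAT = {
    "2.10": "0.25",
    "2.9": "0.24",
    "2.8": "0.23",
    "2.7": "0.22",
    "2.6": "0.21",
}


def minor(version: str) -> str:
    """Reduce a full version such as '2.9.1+cu121' to its 'major.minor' part."""
    public = version.split("+")[0]  # drop a local suffix such as '+cu121'
    major, minor_part = public.split(".")[:2]
    return f"{major}.{minor_part}"


def is_compatible(torch_version: str, torchvision_version: str) -> bool:
    """Return True when the pair matches a row of the matrix."""
    expected = COMPAT.get(minor(torch_version))
    return expected is not None and expected == minor(torchvision_version)
```

In an environment where both packages are installed, a call such as `is_compatible(torch.__version__, torchvision.__version__)` would flag a mismatched pair before it surfaces as an import or operator error.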

2. Installation

Standard Installation (Recommended)

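Install via pip with CUDA 12.1 support (each PyTorch release publishes wheels for a specific set of CUDA versions, so confirm that the cu121 index carries the release you need):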

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

For CPU-only installation:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

For CUDA 11.8:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Conda Installation

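Official conda packages on the pytorch channel stop at PyTorch 2.5, so this command applies to older releases only; use pip for the versions in the compatibility matrix.

conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia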

Development Installation (From Source)

# Clone repository
git clone https://github.com/pytorch/vision.git
cd vision

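# Install in editable mode (strict mode exposes only the package's own files on the import path)
pip install -e . --config-settings editable_mode=strict

# Or using the legacy setuptools command (deprecated in recent setuptools releases)
python setup.py develop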

Image Backend Optimization

For significantly faster image processing, install Pillow-SIMD (drop-in replacement):

pip uninstall pillow
pip install pillow-simd

3. Configuration

Environment Variables

# Cache directory for pre-trained model weights (stored under $TORCH_HOME/hub/checkpoints)
export TORCH_HOME=/path/to/cache

# CUDA device selection
export CUDA_VISIBLE_DEVICES=0,1

# Note: dataset locations are set per dataset through the root argument, and the
# image backend is selected at runtime with torchvision.set_image_backend().
# torchvision does not read TORCHVISION_HOME, TORCHVISION_BACKEND, or
# TORCHVISION_DISABLE_PROGRESS_BAR.
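
To see where pre-trained weights end up for a given TORCH_HOME, the lookup can be sketched with the standard library alone. This mirrors the documented fallback order (TORCH_HOME, then XDG_CACHE_HOME, then ~/.cache); the function name `hub_checkpoint_dir` is hypothetical, and `torch.hub.get_dir()` remains the authoritative source at runtime.

```python
import os


def hub_checkpoint_dir(env=None) -> str:
    """Resolve the directory where downloaded model weights are cached."""
    env = os.environ if env is None else env
    torch_home = env.get("TORCH_HOME")
    if not torch_home:
        # Fall back to $XDG_CACHE_HOME/torch, then ~/.cache/torch
        cache_root = env.get("XDG_CACHE_HOME") or os.path.join("~", ".cache")
        torch_home = os.path.join(cache_root, "torch")
    return os.path.join(os.path.expanduser(torch_home), "hub", "checkpoints")
```

Passing a plain dict as `env` makes it easy to preview the result for a planned configuration without exporting anything.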

Runtime Configuration

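import torchvision
from torchvision.models import resnet50, ResNet50_Weights

torchvision.set_image_backend('PIL')  # or 'accimage' if installed
# Download progress bars are controlled per call through the progress argument
model = resnet50(weights=ResNet50_Weights.DEFAULT, progress=False)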

4. Build & Run

Using Pre-trained Models

Image Classification (ResNet)

from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image
import torch

# Load model with pre-trained weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Preprocess image
preprocess = weights.transforms()
img = Image.open("image.jpg").convert("RGB")  # force 3 channels (handles grayscale and RGBA files)
batch = preprocess(img).unsqueeze(0)

# Inference
with torch.no_grad():
    prediction = model(batch).squeeze(0)
    probabilities = torch.nn.functional.softmax(prediction, dim=0)
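
The probabilities are easier to read once they are paired with class names. The sketch below is plain Python with no torch dependency; `top_k` is an illustrative helper, and it assumes the labels come from `weights.meta["categories"]` and the scores from `probabilities.tolist()`.

```python
def top_k(probabilities, categories, k=5):
    """Return the k most likely (label, probability) pairs, best first."""
    ranked = sorted(
        zip(categories, probabilities),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]
```

With the ResNet example above, `top_k(probabilities.tolist(), weights.meta["categories"])` would give the five best guesses for the image.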

EfficientNet

from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

weights = EfficientNet_B0_Weights.DEFAULT
model = efficientnet_b0(weights=weights)
preprocess = weights.transforms()

Swin Transformer

from torchvision.models import swin_t, Swin_T_Weights

weights = Swin_T_Weights.DEFAULT
model = swin_t(weights=weights)

Optical Flow (RAFT)

from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.io import read_video
import torch

# Load RAFT model
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights)
model.eval()

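# Load video frames (read_video returns uint8 frames shaped T x H x W x C)
frames, _, _ = read_video("video.mp4", pts_unit="sec")
img1 = frames[0].permute(2, 0, 1).unsqueeze(0)
img2 = frames[1].permute(2, 0, 1).unsqueeze(0)

# RAFT expects inputs scaled to [-1, 1] with height and width divisible by 8;
# the weights' transforms handle the scaling, resizing is left to the caller
img1, img2 = weights.transforms()(img1, img2)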

# Predict optical flow
with torch.no_grad():
    flow = model(img1, img2)[-1]  # Final iteration output
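
RAFT rejects inputs whose height or width is not divisible by 8, so frames often need resizing first. The following sketch computes a valid target size by rounding down; `raft_input_size` is a hypothetical helper, not a torchvision function.

```python
def raft_input_size(height: int, width: int, multiple: int = 8):
    """Round each dimension down to the nearest multiple, keeping at least one block."""
    new_h = max(multiple, (height // multiple) * multiple)
    new_w = max(multiple, (width // multiple) * multiple)
    return new_h, new_w
```

The result can be passed to `torchvision.transforms.functional.resize(img, size=list(raft_input_size(h, w)))` before the frames go through the model. Rounding down avoids upsampling; padding to the next multiple is an equally valid choice.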

Working with Datasets

Standard Dataset Usage

import torch
from torchvision import datasets, transforms

# Define transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                         std=[0.229, 0.224, 0.225])
])

# Load dataset
trainset = datasets.CIFAR10(root='./data', train=True, 
                            download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
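
To make the effect of ToTensor followed by Normalize concrete, the per-pixel arithmetic can be written out in plain Python. This is an illustration of the formula (value / 255 - mean) / std for a single RGB pixel, not how torchvision implements it; `normalize_pixel` is an illustrative name.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means used in the pipeline above
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations


def normalize_pixel(rgb_uint8):
    """Apply ToTensor scaling (divide by 255), then Normalize, to one RGB pixel."""
    scaled = [channel / 255.0 for channel in rgb_uint8]
    return [(value - m) / s for value, m, s in zip(scaled, MEAN, STD)]
```

A pure black pixel maps to negative values and a pure white pixel to positive ones, which is why normalized tensors look wrong if they are displayed without undoing the transform.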

Stereo Matching Datasets

from torchvision.datasets import Kitti2015Stereo

# The dataset is not downloaded automatically; it must already exist under root
dataset = Kitti2015Stereo(root='./data', split='train')
left_img, right_img, disparity, valid_mask = dataset[0]

Custom Transforms Pipeline

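import torch
from torchvision.transforms import v2

# Named train_transforms so it does not shadow the torchvision.transforms module
train_transforms = v2.Compose([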
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

5. Deployment

Docker Deployment

Create a Dockerfile for production inference:

FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Pin torchvision to the release that pairs with the base image's PyTorch (2.1.0 pairs with 0.16.0)
RUN pip install torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121

# Copy application code
COPY inference.py .
COPY model_weights ./weights

# Set environment variables
ENV TORCH_HOME=/app/cache
ENV PYTHONUNBUFFERED=1

EXPOSE 8080

CMD ["python", "inference.py"]

Build and run:

docker build -t torchvision-app .
docker run --gpus all -p 8080:8080 torchvision-app

Cloud Deployment Options

AWS SageMaker

Deploy as a SageMaker endpoint:

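import sagemaker
from sagemaker.pytorch import PyTorchModel

# IAM role with SageMaker permissions (resolved automatically inside SageMaker notebooks)
role = sagemaker.get_execution_role()
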
model = PyTorchModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,
    framework_version='2.0',
    py_version='py310',
    entry_point='inference.py'
)

predictor = model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)
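
The `model_data` archive has to be built before it is uploaded to S3. The sketch below packs a weights file and the entry-point script with the standard library; the layout (weights at the archive root, script under `code/`) follows the usual SageMaker PyTorch convention and should be treated as an assumption to verify against the container version in use. `package_model` and the file names are hypothetical.

```python
import tarfile
from pathlib import Path


def package_model(weights_path: str, entry_point: str, output: str = "model.tar.gz") -> str:
    """Bundle model weights and the inference script into a gzipped tar archive."""
    with tarfile.open(output, "w:gz") as archive:
        # Weights sit at the archive root
        archive.add(weights_path, arcname=Path(weights_path).name)
        # The entry-point script goes under code/
        archive.add(entry_point, arcname=f"code/{Path(entry_point).name}")
    return output
```

A call such as `package_model("model.pth", "inference.py")` produces a `model.tar.gz` that can then be uploaded to the bucket referenced by `model_data`.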

Google Cloud Vertex AI

Use pre-built PyTorch containers or custom containers with torchvision pre-installed.

Azure Machine Learning

from azure.ai.ml.entities import Environment

env = Environment(
    name="torchvision-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest",
    conda_file="conda_dependencies.yml"  # includes torchvision
)

Model Serving with TorchServe

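Create a handler.py file:

import io

import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    def initialize(self, context):
        weights = ResNet50_Weights.DEFAULT
        self.model = resnet50(weights=weights)
        self.model.eval()
        # Stored as "transform" so it does not shadow the preprocess() method
        self.transform = weights.transforms()
        self.initialized = True

    def preprocess(self, data):
        # Each request row carries raw image bytes under "data" or "body"
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            image = Image.open(io.BytesIO(payload)).convert("RGB")
            images.append(self.transform(image))
        return torch.stack(images)

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

Package and serve:

# The handler builds the model itself, so no --model-file is passed;
# some archiver versions may also require --serialized-file
mkdir -p model_store
torch-model-archiver --model-name resnet50 --version 1.0 --handler handler.py --export-path model_store
torchserve --start --model-store model_store --models resnet50=resnet50.mar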

ONNX Export for Cross-Platform Deployment

import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model, 
    dummy_input, 
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}
)

6. Troubleshooting

Installation Issues

CUDA Version Mismatch

# Error: CUDA error: no kernel image is available
# Solution: Reinstall torch and torchvision together from the index matching your CUDA version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8

Pillow vs Pillow-SIMD Conflicts

# Error: ImportError: cannot import name '_imaging' from 'PIL'
# Solution: Clean reinstall
pip uninstall pillow pillow-simd -y
pip install pillow
# Or for SIMD optimization:
pip install pillow-simd

Python Version Compatibility

# Error: SyntaxError with type hints (e.g., tuple[...])
# Solution: Use Python 3.10+ for torchvision 0.24+
# For Python 3.9, use torchvision 0.23 or earlier (see the compatibility matrix)

Runtime Issues

Dataset Download Failures

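# Issue: Permission denied or SSL errors during download
# Solution for permission errors: point the dataset at a writable root
# (TORCH_HOME only affects pre-trained weights, not datasets)
from torchvision import datasets
trainset = datasets.CIFAR10(root='/path/with/write/permission', train=True, download=True)
# Solution for SSL errors: use an up-to-date CA bundle rather than disabling verification
# (assumes the certifi package is installed; set this before the download starts)
import os, certifi
os.environ['SSL_CERT_FILE'] = certifi.where()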

Memory Issues with Large Models

# Issue: CUDA out of memory when running EfficientNet-B7 or Swin Transformer
# Solution: Disable gradient tracking and run in mixed precision
with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        output = model(batch)
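
A quick estimate of weight memory helps decide whether half precision is worth the effort. The sketch below covers parameters only (activations, buffers, and CUDA context come on top); `weights_memory_mb` is an illustrative helper, and the parameter count would normally come from `sum(p.numel() for p in model.parameters())`.

```python
def weights_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate memory taken by model parameters, in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6
```

For a model of roughly 25.6 million parameters, such as ResNet-50, switching from 4-byte float32 to 2-byte float16 halves the weight footprint. Autocast alone keeps the stored weights in float32, so this saving applies only when the model itself is converted (for example with `model.half()`).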

Model Loading Errors

# Issue: Weights enum not found or deprecated
# Solution: Use new weights API
from torchvision.models import resnet50, ResNet50_Weights
# Instead of pretrained=True
model = resnet50(weights=ResNet50_Weights.DEFAULT)

Development Issues

Building from Source Failures

# Issue: error: command 'gcc' failed or missing headers
# Solution: Install build dependencies
# Ubuntu/Debian:
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
# macOS:
brew install jpeg libpng libtiff

# Issue: CUDA extensions fail to build
# Solution: Ensure nvcc is in PATH and matches PyTorch CUDA version
export PATH=/usr/local/cuda-12.1/bin:$PATH
export CUDA_HOME=/usr/local/cuda-12.1

Import Errors in Editable Install

# Issue: ModuleNotFoundError after pip install -e .
# Solution: Use strict editable mode or install build deps first
pip install -e . --config-settings editable_mode=strict
# Or:
python setup.py develop

Performance Optimization

Slow Data Loading

# Solution: Increase num_workers and enable pin_memory
from torch.utils.data import DataLoader
loader = DataLoader(
    dataset, 
    batch_size=32, 
    num_workers=4,      # Adjust based on CPU cores
    pin_memory=True,    # For GPU training
    persistent_workers=True  # Avoid worker spawn overhead
)
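
A reasonable starting value for num_workers can be derived from the machine's core count. This is a rough heuristic under stated assumptions (leave one core for the main process, cap the total), not a torchvision recommendation; `suggest_num_workers` is a hypothetical helper.

```python
import os


def suggest_num_workers(cap: int = 8, reserve: int = 1) -> int:
    """Pick a worker count: available cores minus a reserve, limited to cap."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(0, min(cap, cores - reserve))
```

Note that `persistent_workers=True` is only valid when num_workers is greater than 0, so on a single-core machine where this returns 0 the flag should be dropped.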

Image Backend Optimization

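# Pillow-SIMD speeds up CPU-bound PIL operations such as resizing; the gain depends on the transform and CPU
# torchvision uses whichever PIL package is installed, so no environment variable is needed
import PIL
print(PIL.__version__)  # Pillow-SIMD builds typically carry a ".postN" suffix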
