
TorchVision Deployment & Usage Guide

1. Prerequisites

System Requirements

  • Python: 3.10–3.14 (for torchvision 0.25+ / torch 2.10+)
  • PyTorch: Compatible version required (see compatibility matrix below)
  • CUDA Toolkit: 11.8 or 12.1+ (optional, for GPU acceleration)
  • System Libraries: libjpeg, libpng (for image I/O operations)

Version Compatibility Matrix

PyTorch   TorchVision   Python
2.10      0.25          3.10–3.14
2.9       0.24          3.10–3.14
2.8       0.23          3.9–3.13
2.7       0.22          3.9–3.13
2.6       0.21          3.9–3.12
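
The matrix can also be checked programmatically. A minimal sketch below mirrors the table in a plain dict; `COMPAT` and `expected_torchvision` are illustrative helpers, not an official API:

```python
# Mirrors the compatibility matrix above (illustrative, not an official API)
COMPAT = {
    "2.10": ("0.25", "3.10-3.14"),
    "2.9":  ("0.24", "3.10-3.14"),
    "2.8":  ("0.23", "3.9-3.13"),
    "2.7":  ("0.22", "3.9-3.13"),
    "2.6":  ("0.21", "3.9-3.12"),
}

def expected_torchvision(torch_version: str) -> str:
    """Return the torchvision release matching a torch major.minor version."""
    major_minor = ".".join(torch_version.split(".")[:2])
    return COMPAT[major_minor][0]

print(expected_torchvision("2.9.1"))  # -> 0.24
```

Compare the result against `torchvision.__version__` before debugging CUDA or import errors; most "kernel image" failures trace back to a mismatched pair.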

2. Installation

Standard Installation (Recommended)

Install via pip with CUDA 12.1 support:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

For CPU-only installation:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

For CUDA 11.8:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

Conda Installation

conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia

Development Installation (From Source)

# Clone repository
git clone https://github.com/pytorch/vision.git
cd vision

# Install in editable mode with strict compliance
pip install -e . --config-settings editable_mode=strict

# Or using setuptools
python setup.py develop

Image Backend Optimization

For significantly faster image processing, install Pillow-SIMD (drop-in replacement):

pip uninstall pillow
pip install pillow-simd
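
Pillow-SIMD reuses upstream Pillow version numbers with a `.postN` suffix (e.g. `9.0.0.post1`), so a rough runtime check of `PIL.__version__` is possible. The helper below is a heuristic sketch, not an official detection API:

```python
def looks_like_pillow_simd(version: str) -> bool:
    """Heuristic: Pillow-SIMD releases carry a .postN suffix (e.g. 9.0.0.post1)."""
    return ".post" in version

# Typical usage: import PIL and pass PIL.__version__
print(looks_like_pillow_simd("9.0.0.post1"))  # -> True
print(looks_like_pillow_simd("10.3.0"))       # -> False
```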

3. Configuration

Environment Variables

# Cache directory for datasets and pre-trained models
export TORCH_HOME=/path/to/cache

# Specific torchvision cache (overrides TORCH_HOME)
export TORCHVISION_HOME=/path/to/torchvision_cache

# Image backend selection (PIL or accimage)
export TORCHVISION_BACKEND=PIL

# CUDA device selection
export CUDA_VISIBLE_DEVICES=0,1

# Disable progress bars for datasets
export TORCHVISION_DISABLE_PROGRESS_BAR=1
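
The precedence implied above can be sketched as follows. `resolve_cache_dir` is an illustrative helper, not a torchvision function (the real lookup lives in `torch.hub.get_dir` and torchvision internals), and the default location is assumed to follow the usual `~/.cache/torch` convention:

```python
import os

def resolve_cache_dir(env: dict) -> str:
    """Illustrative precedence: TORCHVISION_HOME > TORCH_HOME > ~/.cache/torch."""
    if "TORCHVISION_HOME" in env:
        return env["TORCHVISION_HOME"]
    if "TORCH_HOME" in env:
        return env["TORCH_HOME"]
    xdg = env.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    return os.path.join(xdg, "torch")

print(resolve_cache_dir({"TORCH_HOME": "/srv/cache"}))  # -> /srv/cache
```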

Runtime Configuration

import torchvision
torchvision.set_image_backend('PIL')  # or 'accimage' if installed
torchvision.disable_progress_bar()   # Disable download progress bars

4. Build & Run

Using Pre-trained Models

Image Classification (ResNet)

from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image
import torch

# Load model with pre-trained weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Preprocess image
preprocess = weights.transforms()
img = Image.open("image.jpg")
batch = preprocess(img).unsqueeze(0)

# Inference
with torch.no_grad():
    prediction = model(batch).squeeze(0)
    probabilities = torch.nn.functional.softmax(prediction, dim=0)
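
To turn the probabilities into human-readable labels, the weights object bundles class names in `weights.meta["categories"]`. A pure-Python top-k sketch with stand-in data (in practice, use `probabilities.tolist()` and the real categories list):

```python
import heapq

# Stand-ins for probabilities.tolist() and weights.meta["categories"]
probs = [0.05, 0.70, 0.20, 0.05]
categories = ["cat", "dog", "fox", "owl"]

# Top-2 (score, class) pairs, highest probability first
top2 = heapq.nlargest(2, zip(probs, categories))
for score, name in top2:
    print(f"{name}: {score:.2f}")
# dog: 0.70
# fox: 0.20
```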

EfficientNet

from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

weights = EfficientNet_B0_Weights.DEFAULT
model = efficientnet_b0(weights=weights)
preprocess = weights.transforms()

Swin Transformer

from torchvision.models import swin_t, Swin_T_Weights

weights = Swin_T_Weights.DEFAULT
model = swin_t(weights=weights)

Optical Flow (RAFT)

from torchvision.models.optical_flow import raft_large, Raft_Large_Weights
from torchvision.io import read_video
import torch

# Load RAFT model
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights)
model.eval()

# Load video frames (returned as T x H x W x C uint8)
frames, _, _ = read_video("video.mp4", pts_unit="sec")
img1 = frames[0].permute(2, 0, 1).unsqueeze(0)
img2 = frames[1].permute(2, 0, 1).unsqueeze(0)

# Apply the bundled preprocessing (converts to float and scales to [-1, 1]);
# note RAFT expects spatial dimensions divisible by 8
transforms = weights.transforms()
img1, img2 = transforms(img1, img2)

# Predict optical flow
with torch.no_grad():
    flow = model(img1, img2)[-1]  # Final iteration's flow estimate

Working with Datasets

Standard Dataset Usage

import torch
from torchvision import datasets, transforms

# Define transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], 
                         std=[0.229, 0.224, 0.225])
])

# Load dataset
trainset = datasets.CIFAR10(root='./data', train=True, 
                            download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

Stereo Matching Datasets

from torchvision.datasets import KITTI2015Stereo

dataset = KITTI2015Stereo(root='./data', split='train')
left_img, right_img, disparity, valid_mask = dataset[0]

Custom Transforms Pipeline

import torch
from torchvision.transforms import v2

transforms = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

5. Deployment

Docker Deployment

Create a Dockerfile for production inference:

FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Install torchvision
RUN pip install torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121

# Copy application code
COPY inference.py .
COPY model_weights ./weights

# Set environment variables
ENV TORCH_HOME=/app/cache
ENV PYTHONUNBUFFERED=1

EXPOSE 8080

CMD ["python", "inference.py"]

Build and run:

docker build -t torchvision-app .
docker run --gpus all -p 8080:8080 torchvision-app

Cloud Deployment Options

AWS SageMaker

Deploy as a SageMaker endpoint:

import sagemaker
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data='s3://bucket/model.tar.gz',
    role=role,
    framework_version='2.0',
    py_version='py310',
    entry_point='inference.py'
)

predictor = model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)

Google Cloud Vertex AI

Use pre-built PyTorch containers or custom containers with torchvision pre-installed.

Azure Machine Learning

from azure.ai.ml.entities import Environment

env = Environment(
    name="torchvision-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest",
    conda_file="conda_dependencies.yml"  # includes torchvision
)

Model Serving with TorchServe

Create a model.py handler:

import torch
from torchvision.models import resnet50, ResNet50_Weights
from ts.torch_handler.base_handler import BaseHandler

class ModelHandler(BaseHandler):
    def initialize(self, context):
        self.model = resnet50(weights=ResNet50_Weights.DEFAULT)
        self.model.eval()
        # Stored under a name that does not shadow the preprocess() method below
        self.transform = ResNet50_Weights.DEFAULT.transforms()

    def preprocess(self, data):
        # Implement image decoding, then apply self.transform
        pass

    def inference(self, inputs):
        with torch.no_grad():
            return self.model(inputs)

Package and serve:

torch-model-archiver --model-name resnet50 --version 1.0 --model-file model.py --handler handler.py
torchserve --start --model-store model_store --models resnet50=resnet50.mar

ONNX Export for Cross-Platform Deployment

import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model, 
    dummy_input, 
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}
)

6. Troubleshooting

Installation Issues

CUDA Version Mismatch

# Error: CUDA error: no kernel image is available
# Solution: Install torchvision matching your CUDA version
pip install torchvision --index-url https://download.pytorch.org/whl/cu118  # For CUDA 11.8

Pillow vs Pillow-SIMD Conflicts

# Error: ImportError: cannot import name '_imaging' from 'PIL'
# Solution: Clean reinstall
pip uninstall pillow pillow-simd -y
pip install pillow
# Or for SIMD optimization:
pip install pillow-simd

Python Version Compatibility

# Error: SyntaxError with type hints (e.g., tuple[...])
# Solution: Use Python 3.10+ for torchvision 0.25+
# For older Python, use torchvision 0.21 (Python 3.9+) or earlier

Runtime Issues

Dataset Download Failures

# Issue: Permission denied or SSL errors during download
# Solution: Set environment variables
import os
os.environ['TORCH_HOME'] = '/path/with/write/permission'
# Or disable SSL verification (not recommended for production)
os.environ['CURL_CA_BUNDLE'] = ''
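
One way to pre-flight the permission problem is to pick the first writable candidate before setting TORCH_HOME. An illustrative helper (not part of torchvision):

```python
import os
import tempfile

def first_writable(candidates):
    """Return the first directory that exists (or can be created) and is writable."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            probe = os.path.join(path, ".write_test")
            with open(probe, "w") as f:
                f.write("ok")
            os.remove(probe)
            return path
        except OSError:
            continue
    return None

cache = first_writable([os.path.expanduser("~/.cache/torch"), tempfile.gettempdir()])
os.environ["TORCH_HOME"] = cache
```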

Memory Issues with Large Models

# Issue: CUDA out of memory when loading EfficientNet-B7 or Swin Transformer
# Solution: run inference under torch.no_grad() with mixed precision
with torch.no_grad():
    with torch.amp.autocast('cuda'):
        output = model(input)

Model Loading Errors

# Issue: Weights enum not found or deprecated
# Solution: Use new weights API
from torchvision.models import resnet50, ResNet50_Weights
# Instead of pretrained=True
model = resnet50(weights=ResNet50_Weights.DEFAULT)

Development Issues

Building from Source Failures

# Issue: error: command 'gcc' failed or missing headers
# Solution: Install build dependencies
# Ubuntu/Debian:
sudo apt-get install libjpeg-dev libpng-dev libtiff-dev
# macOS:
brew install jpeg libpng libtiff

# Issue: CUDA extensions fail to build
# Solution: Ensure nvcc is in PATH and matches PyTorch CUDA version
export PATH=/usr/local/cuda-12.1/bin:$PATH
export CUDA_HOME=/usr/local/cuda-12.1

Import Errors in Editable Install

# Issue: ModuleNotFoundError after pip install -e .
# Solution: Use strict editable mode or install build deps first
pip install -e . --config-settings editable_mode=strict
# Or:
python setup.py develop

Performance Optimization

Slow Data Loading

# Solution: Increase num_workers and enable pin_memory
from torch.utils.data import DataLoader
loader = DataLoader(
    dataset, 
    batch_size=32, 
    num_workers=4,      # Adjust based on CPU cores
    pin_memory=True,    # For GPU training
    persistent_workers=True  # Avoid worker spawn overhead
)
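
A common starting point for num_workers is to derive it from the CPU count and tune from there. An illustrative heuristic (not a torch API; the cap of 8 is an assumption, profile your own pipeline):

```python
import os

def suggest_num_workers(cap: int = 8) -> int:
    """Rough heuristic: one worker per core, capped to avoid diminishing returns."""
    cpus = os.cpu_count() or 1
    return max(1, min(cpus, cap))

print(suggest_num_workers())
```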

Image Backend Optimization

# Pillow-SIMD is a drop-in Pillow replacement that can substantially speed up
# CPU-bound image transforms (the gain depends on workload and CPU)
import torchvision
torchvision.set_image_backend('PIL')  # PIL-backed ops use Pillow-SIMD if installed
# Install Pillow-SIMD in place of Pillow (see Installation section)