tensorrt — NVIDIA TensorRT

NVIDIA’s TensorRT optimizer and runtime for high-performance deep-learning inference. Convert PyTorch / ONNX models into optimized engines.

Image tag

docker.io/manvarharsh/tensorrt:cuda12

What’s in this image

  • Base: nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
  • Python 3.10 (conda)
  • TensorRT (CUDA 12 build)
  • PyTorch, ONNX, onnx-graphsurgeon, polygraphy
  • TensorRT-LLM (optional, when present)
  • JupyterHub with Podstack authenticator
  • OpenSSH server

Default ports

Port   Service
22     SSH
8000   JupyterHub

Use cases

  • Converting trained PyTorch / ONNX models to TRT engines
  • Benchmarking inference latency / throughput
  • Compiling TRT-LLM engines for large-model serving
  • Mixed-precision (FP16 / INT8 / FP8) optimization
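For the conversion and mixed-precision use cases above, one possible route is TensorRT's `trtexec` command-line tool. The sketch below only assembles the argument list and does not run anything. It assumes `trtexec` is on the PATH inside the container, which this page does not confirm. The helper name `build_trtexec_command` and the file paths are hypothetical.

```python
from typing import List

# Precision names mapped to trtexec flags; FP32 is the default and needs no flag.
PRECISION_FLAGS = {"fp32": None, "fp16": "--fp16", "int8": "--int8", "fp8": "--fp8"}


def build_trtexec_command(onnx_path: str, engine_path: str,
                          precision: str = "fp16") -> List[str]:
    """Return a trtexec argument list that builds an engine from an ONNX file."""
    key = precision.lower()
    if key not in PRECISION_FLAGS:
        raise ValueError(f"unsupported precision: {precision!r}")
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    flag = PRECISION_FLAGS[key]
    if flag is not None:
        cmd.append(flag)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths that follow the /data layout described under Persistence.
    args = build_trtexec_command("/data/models/resnet50.onnx",
                                 "/data/engines/resnet50.plan")
    print(" ".join(args))
```

The returned list can be passed to `subprocess.run(args, check=True)`. `trtexec` also prints a latency and throughput summary after building, which may be enough for a first benchmark.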

Environment variables

Variable           Description
ENABLE_SSH         Enable SSH server
ENABLE_JUPYTERHUB  Enable JupyterHub on port 8000
PODSTACK_API_URL   Backend URL for JupyterHub token validation
SSH_PUBLIC_KEY     Public key for SSH
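As a rough illustration of how the image tag, ports, mount point, and variables on this page fit together, the sketch below assembles a `docker run` argument list. It assumes a local Docker host with the NVIDIA Container Toolkit (needed for `--gpus all`). The helper `build_docker_run`, the host port 2222, the host directory, and the value `"true"` are all hypothetical; this page does not state which values the `ENABLE_*` variables accept.

```python
from typing import Dict, List

IMAGE = "docker.io/manvarharsh/tensorrt:cuda12"
KNOWN_VARS = {"ENABLE_SSH", "ENABLE_JUPYTERHUB", "PODSTACK_API_URL", "SSH_PUBLIC_KEY"}


def build_docker_run(env: Dict[str, str], data_dir: str,
                     ssh_port: int = 2222, hub_port: int = 8000) -> List[str]:
    """Return a `docker run` argument list for this image."""
    unknown = set(env) - KNOWN_VARS
    if unknown:
        raise ValueError(f"undocumented variables: {sorted(unknown)}")
    cmd = ["docker", "run", "--rm", "--gpus", "all",
           "-p", f"{ssh_port}:22",      # container SSH port
           "-p", f"{hub_port}:8000",    # container JupyterHub port
           "-v", f"{data_dir}:/data"]   # persistent volume
    for name in sorted(env):
        cmd += ["-e", f"{name}={env[name]}"]
    cmd.append(IMAGE)
    return cmd


if __name__ == "__main__":
    # "true" is an assumed value format, not documented on this page.
    example = build_docker_run(
        {"ENABLE_SSH": "true", "ENABLE_JUPYTERHUB": "true"},
        "/srv/trt-data",  # hypothetical host directory
    )
    print(" ".join(example))
```

Rejecting names outside the documented set is a design choice meant to catch typos such as `ENABLE_JUPYTER` before the container starts.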

Persistence

Mount persistent storage at /data. Compiled engines (.plan / .engine) go under /data/engines/, and source models go under /data/models/.
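One way to keep the two directories in step is to derive each engine path from its source model path. The sketch below is a minimal example of that convention; the helper `engine_path_for` and the choice to mirror subdirectories are assumptions, not something the image enforces.

```python
from pathlib import PurePosixPath

MODELS_DIR = PurePosixPath("/data/models")
ENGINES_DIR = PurePosixPath("/data/engines")


def engine_path_for(model_path: str, suffix: str = ".plan") -> PurePosixPath:
    """Map a model under /data/models to the matching path under /data/engines."""
    model = PurePosixPath(model_path)
    # relative_to raises ValueError when the model is outside /data/models.
    relative = model.relative_to(MODELS_DIR)
    return (ENGINES_DIR / relative).with_suffix(suffix)


if __name__ == "__main__":
    print(engine_path_for("/data/models/vision/resnet50.onnx"))
    # /data/engines/vision/resnet50.plan
```

Pass `suffix=".engine"` if that extension is preferred; both appear in the layout above.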
