
unsloth — fast LLM fine-tuning

Unsloth fine-tunes Llama, Mistral, Qwen, and Gemma models 2–5× faster, with up to 70% less VRAM, than a vanilla Hugging Face training setup, using hand-written Triton kernels.

Image tag

docker.io/manvarharsh/unsloth:cuda12

What’s in this image

  • Base: nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
  • Python 3.10 (conda)
  • PyTorch with CUDA 12
  • Unsloth, transformers, peft, trl, bitsandbytes
  • xformers, Flash Attention
  • JupyterHub with Podstack authenticator
  • OpenSSH server
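To confirm that the stack listed above is present inside a running container, a quick check like the following can help. It is a minimal sketch. `check_stack` is a hypothetical helper that only verifies each module can be located, not that it imports cleanly or matches a particular version. `flash_attn` is assumed to be the import name of the Flash Attention package.

```python
import importlib.util
import sys

# Import names for the packages listed above; "flash_attn" is assumed to be
# the module installed by the Flash Attention package.
PACKAGES = ["torch", "unsloth", "transformers", "peft", "trl",
            "bitsandbytes", "xformers", "flash_attn"]


def check_stack(packages=PACKAGES):
    """Map each module name to True if Python can locate it.

    Top-level names are located without being imported, so this stays fast.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    found = check_stack()
    for name, ok in found.items():
        print(f"{name:<14}{'ok' if ok else 'MISSING'}")
    if found["torch"]:
        import torch  # only imported when it is actually installed
        print("CUDA available:", torch.cuda.is_available())
```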

Default ports

Port    Service
22      SSH
8000    JupyterHub
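Once the container is running with these ports published, a small TCP probe can confirm that SSH and JupyterHub are listening. This sketch assumes the ports are published on `localhost` under the same numbers. `port_is_open` is a hypothetical helper built only on Python's standard `socket` module.

```python
import socket

# Default ports from the table above.
DEFAULT_PORTS = {"SSH": 22, "JupyterHub": 8000}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    host = "localhost"  # assumption: ports are published on this machine
    for service, port in DEFAULT_PORTS.items():
        state = "open" if port_is_open(host, port) else "closed"
        print(f"{service:<11}{port:>5}  {state}")
```

If you map the container ports to different host ports, probe the host-side numbers instead.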

Use cases

  • LoRA / QLoRA fine-tunes on consumer GPUs (T4, 3090, 4090)
  • Fast fine-tuning of Llama 3.x, Mistral, Qwen, Gemma
  • DPO / ORPO preference tuning with Unsloth’s kernels
  • Notebook-driven experimentation with the Unsloth examples
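LoRA and QLoRA fit on consumer GPUs because of simple arithmetic, sketched below. LoRA freezes a weight matrix W of shape d_out × d_in and trains two small matrices, A (r × d_in) and B (d_out × r), so each adapted layer adds r · (d_in + d_out) trainable parameters. QLoRA additionally stores the frozen weights in 4 bits. The 4096 × 4096 layer, rank 16, and 8-billion-parameter model are illustrative assumptions, and the memory figure covers the weights only.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    W stays frozen; the update is B @ A with A: (r, d_in) and B: (d_out, r),
    so the adapter costs r * (d_in + d_out) instead of d_in * d_out.
    """
    return r * (d_in + d_out)


def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in GiB.

    Ignores activations, optimizer state, and quantization metadata.
    """
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    full = 4096 * 4096                       # hypothetical 4096x4096 projection
    adapter = lora_params(4096, 4096, r=16)  # hypothetical rank
    print(f"full: {full:,}  LoRA r=16: {adapter:,}  ({adapter / full:.2%})")
    for bits in (16, 4):                     # fp16/bf16 vs 4-bit (QLoRA-style)
        print(f"8B params at {bits}-bit: {weight_gib(8e9, bits):.1f} GiB")
```

For the illustrative layer, the adapter has 131,072 parameters, compared with 16,777,216 in the frozen matrix (about 0.8%). Eight billion weights shrink from roughly 14.9 GiB at 16-bit to roughly 3.7 GiB at 4-bit. Real usage is higher once activations, adapter optimizer state, and quantization constants are included.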

Environment variables

Variable            Description
ENABLE_SSH          Enable SSH server
ENABLE_JUPYTERHUB   Enable JupyterHub on port 8000
PODSTACK_API_URL    Backend URL for JupyterHub token validation
SSH_PUBLIC_KEY      Public key for SSH
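The sketch below combines the ports, environment variables, and `/data` mount into a single `docker run` command for this image. Several parts are assumptions. This page does not state the accepted values for `ENABLE_SSH` and `ENABLE_JUPYTERHUB`, so `true` is a guess. Host port 2222 for SSH and the `data` host directory are arbitrary choices. `--gpus all` requires the NVIDIA Container Toolkit on the host. `docker_run_args` is a hypothetical helper.

```python
import os
import shlex

IMAGE = "docker.io/manvarharsh/unsloth:cuda12"


def docker_run_args(ssh_pubkey=None, host_data_dir="data",
                    enable_ssh=True, enable_jupyterhub=True,
                    podstack_api_url=None):
    """Build a `docker run` argument list for this image (a sketch)."""
    # Bind-mount an absolute host path onto the documented /data mount point.
    args = ["docker", "run", "-d", "--gpus", "all",
            "-v", f"{os.path.abspath(host_data_dir)}:/data"]
    env = {}
    if enable_ssh:
        args += ["-p", "2222:22"]           # host port 2222 is arbitrary
        env["ENABLE_SSH"] = "true"          # assumption: accepted value
        if ssh_pubkey:
            env["SSH_PUBLIC_KEY"] = ssh_pubkey
    if enable_jupyterhub:
        args += ["-p", "8000:8000"]
        env["ENABLE_JUPYTERHUB"] = "true"   # assumption: accepted value
        if podstack_api_url:
            env["PODSTACK_API_URL"] = podstack_api_url
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    return args + [IMAGE]


if __name__ == "__main__":
    key = "ssh-ed25519 AAAA... user@host"   # placeholder public key
    print(shlex.join(docker_run_args(ssh_pubkey=key)))
```

Printing the list with `shlex.join` produces a shell-safe command you can paste into a terminal. You can also pass the list directly to `subprocess.run`.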

Persistence

Mount a persistent volume at /data. Store datasets under /data/datasets/ and write fine-tune outputs to /data/output/.
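Inside the container, training scripts can derive their paths from this layout. In this sketch, `data_paths` and the per-run `run_name` subfolder are hypothetical conventions built on the documented `/data/datasets/` and `/data/output/` directories.

```python
from pathlib import Path

DATA_ROOT = Path("/data")  # the documented persistent mount point


def data_paths(root=DATA_ROOT, run_name="run-001"):
    """Return (datasets_dir, output_dir) under the mount, creating them if needed.

    `run_name` is a hypothetical per-run subfolder under /data/output/.
    """
    root = Path(root)
    datasets = root / "datasets"
    output = root / "output" / run_name
    for d in (datasets, output):
        d.mkdir(parents=True, exist_ok=True)
    return datasets, output
```

Passing the returned output directory to a trainer, for example as `output_dir` in Hugging Face `TrainingArguments`, keeps checkpoints on the persistent volume instead of in the container's writable layer.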

See also