ollama — local LLM runner

Ollama packages popular open LLMs into easy-to-pull bundles with sensible defaults, and offers one-line model installs and a simple HTTP API.

Image tag

docker.io/manvarharsh/ollama:cuda12

What’s in this image

  • Base: nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
  • Ollama server (CUDA build)
  • Hugging Face Hub CLI for sideloading GGUFs
  • OpenSSH server

Default ports

Port     Service
22       SSH
11434    Ollama HTTP API
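
To confirm that the API port is reachable before sending work to it, a small polling loop can help. The sketch below is not part of the image: wait_for_ollama is a hypothetical helper name, and the base URL and attempt count are placeholder arguments. It relies on Ollama's GET /api/version endpoint, which returns a small JSON object once the server is up.

```shell
# Poll the Ollama HTTP API until it answers or the attempts run out.
# wait_for_ollama is a hypothetical helper: $1 is the base URL, $2 the number of attempts.
wait_for_ollama() {
  base_url="${1:-http://localhost:11434}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors; output is discarded.
    if curl -fsS "$base_url/api/version" >/dev/null 2>&1; then
      echo "ollama is up at $base_url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "ollama did not answer at $base_url" >&2
  return 1
}
```

For example, wait_for_ollama http://<pod-url>:11434 60 waits up to roughly a minute and returns non-zero if the server never responds.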

Use cases

  • Quick local LLM experimentation (ollama run llama3.1)
  • Bring-up of a chat backend for prototypes
  • Running GGUF-quantized models on small GPUs
  • Lightweight inference where vLLM/SGLang is overkill

Environment variables

Variable          Description
ENABLE_SSH        Enable SSH server
ENABLE_OLLAMA     Start the Ollama server on port 11434
OLLAMA_MODELS     Override model store path (default /data/.ollama/models)
OLLAMA_HOST       Bind address (default 0.0.0.0:11434)
SSH_PUBLIC_KEY    Public key for SSH

Quick test

ollama pull llama3.1:8b
curl http://<pod-url>:11434/api/generate \
  -d '{"model":"llama3.1:8b","prompt":"Hello"}'
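
By default /api/generate streams its reply as a series of JSON objects, one per line. For scripting, a single JSON object is often easier to handle, and setting "stream": false in the request body asks for that. The sketch below wraps the request in two hypothetical helpers; generate_payload, ollama_generate, and the OLLAMA_URL variable are illustrative names, not something the image provides.

```shell
# Build the JSON body for /api/generate. generate_payload is a hypothetical helper;
# it assumes the model tag and prompt contain no double quotes or backslashes.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one non-streaming request. OLLAMA_URL is an assumed variable name;
# set it to the pod address, for example http://<pod-url>:11434.
ollama_generate() {
  curl -fsS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -d "$(generate_payload "$1" "$2")"
}
```

Calling ollama_generate llama3.1:8b "Hello" should return one JSON object whose "response" field holds the generated text.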

Persistence

Mount a persistent volume at /data. Ollama stores pulled models under /data/.ollama/models/, so they survive restarts.
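
If the image is run directly with Docker rather than on a pod platform, the mount could look like the sketch below. Several details are assumptions: that ENABLE_OLLAMA accepts the value "true", that the host has the NVIDIA Container Toolkit for --gpus, and the volume and container names, which are arbitrary. start_ollama is a hypothetical helper name.

```shell
# Start the image with a named volume at /data so pulled models persist.
# Assumptions: ENABLE_OLLAMA accepts "true"; "ollama-data" and "ollama" are arbitrary names.
# Set DOCKER=echo to print the command instead of running it.
start_ollama() {
  "${DOCKER:-docker}" run -d --name ollama --gpus all \
    -p 11434:11434 \
    -v ollama-data:/data \
    -e ENABLE_OLLAMA=true \
    docker.io/manvarharsh/ollama:cuda12
}
```

Because the named volume outlives the container, removing and recreating the container with the same -v flag should leave previously pulled models in place.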

See also