Model Catalog
Browse and deploy inference-ready models from the Podstack model catalog.
Browsing Models
Navigate to Inference > Catalog to explore available models.
Model Information
Each model listing shows:
- Name: Model identifier (e.g., meta-llama/Llama-3.1-8B-Instruct)
- Description: Model capabilities and use cases
- Parameters: Model size, as a parameter count (e.g., 8B)
- Pricing: Cost per million tokens (input/output)
- Status: Available, deploying, or disabled
- GPU Requirements: Required GPU type and memory
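Because pricing is quoted per million tokens, separately for input and output, the cost of a workload is a weighted sum of the two token counts. The sketch below is a minimal Python illustration. `ModelListing`, `estimate_cost`, and their fields are hypothetical names, not part of a Podstack SDK, and the prices in the usage note are placeholders rather than real catalog prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelListing:
    """Hypothetical client-side view of one catalog entry's pricing."""
    name: str
    input_price_per_m: float   # price per 1M input tokens
    output_price_per_m: float  # price per 1M output tokens


def estimate_cost(model: ModelListing, input_tokens: int, output_tokens: int) -> float:
    """Estimate workload cost from the per-million-token input and output prices."""
    return (
        input_tokens / 1_000_000 * model.input_price_per_m
        + output_tokens / 1_000_000 * model.output_price_per_m
    )
```

For example, a model listed at 0.10 per million input tokens and 0.20 per million output tokens would cost about 0.30 for 2,000,000 input tokens and 500,000 output tokens (2 × 0.10 + 0.5 × 0.20), in whatever currency the catalog uses.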
Filtering Models
Filter the catalog by:
- Task Type: Chat, embeddings, audio, code generation
- Model Size: Small, medium, large
- Status: Available models only, or all models
- Search: Filter by model name or keyword
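If you keep a local copy of catalog metadata (for example, in a script that picks a model), the same filters can be applied client-side. This is a minimal sketch assuming hypothetical field names and lowercase status values; it is not how the Podstack UI implements filtering. Treating Keep Warm models as available is also an assumption, based on the Model Status table.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Statuses treated as "ready for inference" (assumption: Keep Warm counts as available).
READY_STATUSES = {"available", "keep_warm"}


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical local record of one catalog listing."""
    name: str
    description: str
    task: str    # e.g. "chat", "embeddings", "audio", "code"
    size: str    # "small", "medium", or "large"
    status: str  # "available", "deploying", "disabled", or "keep_warm"


def filter_catalog(
    entries: Iterable[CatalogEntry],
    task: Optional[str] = None,
    size: Optional[str] = None,
    available_only: bool = False,
    query: str = "",
) -> List[CatalogEntry]:
    """Return entries matching every filter that is set; unset filters match everything."""
    q = query.lower()
    matches = []
    for entry in entries:
        if task is not None and entry.task != task:
            continue
        if size is not None and entry.size != size:
            continue
        if available_only and entry.status not in READY_STATUSES:
            continue
        # Keyword search is case-insensitive over the name and description.
        if q and q not in entry.name.lower() and q not in entry.description.lower():
            continue
        matches.append(entry)
    return matches
```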
Model Details
Click on a model to view:
- Full description and capabilities
- Pricing breakdown (input/output tokens)
- API endpoint information
- Status and health check
- Example API calls
Requesting a Model
If a model you need isn’t in the catalog:
- Click Request Model
- Enter the Hugging Face model ID or name
- Submit the request
- The Podstack team reviews the request and may add the model to the catalog
Model Status
| Status | Description |
|---|---|
| Available | Ready for inference |
| Deploying | Being provisioned |
| Disabled | Temporarily unavailable |
| Keep Warm | Always ready with minimal latency |
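Client code that calls a model can check its status first. The sketch below mirrors the table above as a Python enum. The enum and helper names are hypothetical, and it assumes that only Available and Keep Warm models accept inference requests and that only Available models (those not kept warm) risk a cold start.

```python
from enum import Enum


class ModelStatus(Enum):
    """Hypothetical client-side mirror of the catalog status labels."""
    AVAILABLE = "Available"
    DEPLOYING = "Deploying"
    DISABLED = "Disabled"
    KEEP_WARM = "Keep Warm"


def can_send_requests(status: ModelStatus) -> bool:
    """Available and Keep Warm models are ready for inference."""
    return status in (ModelStatus.AVAILABLE, ModelStatus.KEEP_WARM)


def may_cold_start(status: ModelStatus) -> bool:
    """Only models that are not kept warm risk a slow first request."""
    return status is ModelStatus.AVAILABLE
```

`ModelStatus("Keep Warm")` looks up a member by the label as written in the table; how statuses are spelled in any API response is not specified here.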
Cold Starts
Models that aren’t kept warm may have cold starts:
- The first request may take longer: 30 seconds to 2 minutes, depending on model size
- Subsequent requests are served at normal latency
- Frequently used models are kept warm automatically
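If your HTTP client's timeout is shorter than a cold start, the first request can fail with a timeout even though the model is still loading. One common pattern is to retry with exponential backoff under an overall time budget. This is a minimal sketch: `call_with_cold_start_retry` and its parameters are hypothetical, `send` stands in for whatever function performs your actual request, and the 150-second default budget is an assumption chosen to cover the 30 seconds to 2 minutes range above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_cold_start_retry(
    send: Callable[[], T],
    max_wait_s: float = 150.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send` until it succeeds, backing off exponentially between failures.

    `send` is any zero-argument function that performs one request and raises
    one of `retry_on` while the model is still starting up.
    """
    deadline = time.monotonic() + max_wait_s
    delay = initial_delay_s
    while True:
        try:
            return send()
        except retry_on:
            # Give up (re-raise) if the next wait would overrun the overall budget.
            if time.monotonic() + delay > deadline:
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay_s.
            delay = min(delay * 2, max_delay_s)
```

Keeping `sleep` injectable lets you test the backoff without actually waiting. Retrying is only useful if a cold request fails rather than blocks; if it blocks, a client timeout longer than the cold start is the simpler fix.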
Next Steps
- Generate API Keys for authentication
- Test in the Playground