Inference API Keys

Manage API keys for authenticating with the Podstack Inference API.

Overview

Inference API keys are separate from your account API tokens. They are specifically for authenticating inference requests (chat completions, embeddings, audio transcription).

Creating an API Key

Navigate to Inference > API Keys
Click Create API Key
Enter a descriptive name (e.g., “Production App”, “Development”)
Click Create
Copy the key immediately - it won’t be shown again

Using API Keys

Authentication Header

Include the key in the Authorization header:

curl -X POST https://cloud.podstack.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer YOUR_INFERENCE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

With OpenAI SDK

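import openai

# Point the OpenAI client at the Podstack Inference API.
client = openai.OpenAI(
    api_key="YOUR_INFERENCE_KEY",
    base_url="https://cloud.podstack.ai/inference/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Print the assistant's reply.
print(response.choices[0].message.content)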

With Python Requests

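import requests

headers = {
    "Authorization": "Bearer YOUR_INFERENCE_KEY",
    "Content-Type": "application/json",
}

response = requests.post(
    "https://cloud.podstack.ai/inference/v1/chat/completions",
    headers=headers,
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)

# Raise an exception on 4xx/5xx responses (e.g., an invalid key).
response.raise_for_status()
print(response.json())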

Managing Keys

Viewing Keys

The API Keys page shows:

Key name
Key prefix (for identification)
Creation date
Usage statistics
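
Because keys are identified by their prefix, application logs can follow the same convention and never print the full key. A minimal sketch, assuming an 8-character visible prefix (the prefix length Podstack actually displays is not stated here) and a hypothetical helper named mask_key:

```python
def mask_key(key: str, visible: int = 8) -> str:
    """Return a log-safe form of the key showing only its first characters.

    The default of 8 visible characters is an assumption, not a
    documented Podstack value.
    """
    if len(key) <= visible:
        # Too short to show a prefix safely; hide the whole value.
        return "*" * len(key)
    return key[:visible] + "****"
```

Logging mask_key(api_key) instead of api_key lets you match a log line to an entry on the API Keys page without exposing the secret.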

Rate Limits

Each API key has rate limits:

Requests per minute
Tokens per minute

Update limits in the key settings, or contact support for higher limits.
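
Clients that may hit these limits usually retry with a growing delay rather than failing outright. The sketch below assumes the API signals a rate-limited request with HTTP status 429, which is a common convention but is not confirmed on this page. The call_with_backoff helper and its send callback are hypothetical names; send should perform one request and return a (status, body) pair.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out.

    Assumption: HTTP 429 means the key's rate limit was exceeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus a little jitter: about 1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
            sleep(delay)

    # Every attempt was rate limited; return the last response to the caller.
    return status, body
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real waiting.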

Deleting Keys

Find the key in the list
Click Delete
Confirm deletion
The key is invalidated immediately

Usage Tracking

Monitor your inference usage in aggregate and per key.

Usage Summary

Navigate to Inference > API Keys to view:

Total requests made
Total tokens consumed (input + output)
Breakdown by model
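
The same totals can be approximated on the client side as a cross-check against the dashboard. This sketch assumes responses follow the OpenAI chat completion format, with a model field and a usage object containing prompt_tokens and completion_tokens; that is plausible given the OpenAI SDK compatibility shown earlier, but it is not confirmed here. The summarize_usage helper is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def summarize_usage(responses: Iterable[Mapping]) -> Dict[str, Dict[str, int]]:
    """Aggregate request and token counts per model from response dicts.

    Assumption: each response has "model" and an OpenAI-style "usage" object.
    """
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for resp in responses:
        usage = resp.get("usage") or {}  # tolerate responses without usage data
        entry = totals[resp.get("model", "unknown")]
        entry["requests"] += 1
        entry["input_tokens"] += usage.get("prompt_tokens", 0)
        entry["output_tokens"] += usage.get("completion_tokens", 0)
    return dict(totals)
```

With the requests example, you would collect response.json() for each call and pass the list to summarize_usage.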

Per-Key Usage

Each key tracks:

Request count
Token usage
Last used timestamp

Security Best Practices

Never expose keys in client-side code - Use a backend proxy
Use environment variables - Don’t hardcode keys
Rotate keys periodically - Delete old keys and create new ones
Use separate keys - Different keys for development and production
Monitor usage - Watch for unexpected spikes

# Store as environment variable
export PODSTACK_INFERENCE_KEY="your_key_here"

# Read it in Python (returns None if the variable is not set)
import os
api_key = os.environ.get("PODSTACK_INFERENCE_KEY")
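
Since os.environ.get returns None when the variable is missing, a misconfigured deployment can send unauthenticated requests without any obvious error. A minimal sketch of a fail-fast loader, using the PODSTACK_INFERENCE_KEY variable from above; the load_inference_key helper name is hypothetical:

```python
import os
from typing import Mapping


def load_inference_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the inference key from the environment, or raise if it is unset."""
    key = env.get("PODSTACK_INFERENCE_KEY", "").strip()
    if not key:
        # Fail at startup instead of sending requests without a valid key.
        raise RuntimeError(
            "PODSTACK_INFERENCE_KEY is not set; export it before starting the app"
        )
    return key
```

Setting a different value for this variable in each environment keeps development and production keys separate without any code changes.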