Inference API Keys
Manage API keys for authenticating with the Podstack Inference API.
Overview
Inference API keys are separate from your account API tokens. They are specifically for authenticating inference requests (chat completions, embeddings, audio transcription).
Creating an API Key
- Navigate to Inference > API Keys
- Click Create API Key
- Enter a descriptive name (e.g., “Production App”, “Development”)
- Click Create
- Copy the key immediately - it won’t be shown again
Using API Keys
Authentication Header
Include the key in the Authorization header:
curl -X POST https://cloud.podstack.ai/inference/v1/chat/completions \
-H "Authorization: Bearer YOUR_INFERENCE_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'
With OpenAI SDK
import openai
client = openai.OpenAI(
api_key="YOUR_INFERENCE_KEY",
base_url="https://cloud.podstack.ai/inference/v1"
)
response = client.chat.completions.create(
model="meta-llama/Llama-3.1-8B-Instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
With Python Requests
import requests
headers = {
"Authorization": "Bearer YOUR_INFERENCE_KEY",
"Content-Type": "application/json"
}
response = requests.post(
"https://cloud.podstack.ai/inference/v1/chat/completions",
headers=headers,
json={
"model": "meta-llama/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}
)
Managing Keys
Viewing Keys
The API Keys page shows:
- Key name
- Key prefix (for identification)
- Creation date
- Usage statistics
Rate Limits
Each API key has rate limits:
- Requests per minute
- Tokens per minute
- Update limits via the key settings (or contact support for higher limits)
Deleting Keys
- Find the key in the list
- Click Delete
- Confirm deletion
- Key is immediately invalidated
Usage Tracking
Monitor your inference usage:
Usage Summary
Navigate to Inference > API Keys to view:
- Total requests made
- Total tokens consumed (input + output)
- Breakdown by model
Per-Key Usage
Each key tracks:
- Request count
- Token usage
- Last used timestamp
Security Best Practices
- Never expose keys in client-side code - Use a backend proxy
- Use environment variables - Don’t hardcode keys
- Rotate keys periodically - Delete old keys and create new ones
- Use separate keys - Different keys for development and production
- Monitor usage - Watch for unexpected spikes
# Store as environment variable
export PODSTACK_INFERENCE_KEY="your_key_here"
import os
api_key = os.environ.get("PODSTACK_INFERENCE_KEY")
Troubleshooting
401 Unauthorized
- Verify the key is correct
- Check the key hasn’t been deleted
- Ensure you’re using
Bearerprefix
429 Too Many Requests
- You’ve hit the rate limit
- Implement exponential backoff
- Contact support for higher limits
Key Not Working
- Verify you’re using an inference key, not an account API token
- Check the key was copied correctly (no extra whitespace)