Model Monitoring
Model Monitoring tracks the runtime performance of deployed model versions: request volume, latency percentiles, and error rates. Use it to know whether a model in production is healthy, slow, or breaking.
Find monitoring under MLOps > Monitoring. The feature is gated by REACT_APP_ENABLE_MODEL_MONITORING.
Concepts
- Monitor: a binding between a model version and an inference endpoint, plus a rolling window over which metrics are aggregated.
- Performance Snapshot: aggregated metrics for the active window, such as total_requests, error_rate, p50_latency_ms, and p99_latency_ms.
- Health: derived from the latest snapshot:
  - Healthy: error rate < 5% and p99 < 500 ms
  - Warning: error rate ≥ 5% or p99 ≥ 500 ms, with neither Critical threshold reached
  - Critical: error rate ≥ 10% or p99 ≥ 1000 ms
- Alert: a rule that fires when a metric stays past a threshold for a sustained period.
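The health rules above can be sketched as a small classification function. This is only an illustration, not the product's implementation: the Snapshot class and classify_health function are hypothetical names, and error_rate is assumed to be a fraction between 0 and 1 rather than a percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Field names mirror the snapshot metrics listed above; the class is illustrative.
    total_requests: int
    error_rate: float        # assumed to be a fraction, e.g. 0.05 for 5%
    p50_latency_ms: float
    p99_latency_ms: float


def classify_health(snapshot: Snapshot) -> str:
    """Map a snapshot to Healthy, Warning, or Critical using the documented thresholds."""
    # Check Critical first: any snapshot that reaches a Critical threshold
    # also reaches the matching Warning threshold.
    if snapshot.error_rate >= 0.10 or snapshot.p99_latency_ms >= 1000:
        return "Critical"
    if snapshot.error_rate >= 0.05 or snapshot.p99_latency_ms >= 500:
        return "Warning"
    return "Healthy"
```

Evaluating the Critical condition before the Warning condition matters because the two ranges overlap: a p99 of 1200 ms satisfies both, and the more severe label should win.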
Creating a Monitor
- Go to MLOps > Monitoring and select Create Monitor.
- Pick:
  - Model and Model Version (from the Model Registry)
  - Name
  - Endpoint URL the model is served from
  - Window size: 5m, 15m, 1h, 6h, or 1d. This controls how much history each snapshot covers.
- Save. The monitor starts collecting metrics on the next request.
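For readers who think in terms of data structures, the form fields above can be modeled roughly as follows. This is a sketch under assumptions: MonitorConfig, its field names, and the example URL are hypothetical and do not describe the product's actual schema or API. Only the five window sizes come from the form.

```python
from dataclasses import dataclass
from datetime import timedelta

# The window sizes offered in the Create Monitor form.
WINDOW_SIZES = {
    "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "1d": timedelta(days=1),
}


@dataclass(frozen=True)
class MonitorConfig:
    # Hypothetical field names, chosen to match the form labels.
    name: str
    model: str
    model_version: str
    endpoint_url: str
    window_size: str

    def __post_init__(self) -> None:
        # Reject any window size the form does not offer.
        if self.window_size not in WINDOW_SIZES:
            allowed = ", ".join(WINDOW_SIZES)
            raise ValueError(f"window_size must be one of: {allowed}")

    @property
    def window(self) -> timedelta:
        """Length of history covered by each snapshot."""
        return WINDOW_SIZES[self.window_size]


config = MonitorConfig(
    name="fraud-latency",
    model="fraud-model",
    model_version="3",
    endpoint_url="https://example.com/predict",  # placeholder URL
    window_size="15m",
)
```

A shorter window reacts faster to spikes but aggregates fewer requests per snapshot, so percentiles on low-traffic endpoints may be noisier.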
Dashboard
The monitoring dashboard lists all monitors with a status badge (Active / Paused / Archived) and a health dot derived from the latest snapshot. Click a monitor to see:
- Time series of request volume and latency
- Distribution of p50 / p95 / p99 latency
- Error rate over time
- Recent request samples (where available)
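To make the charted quantities concrete, here is one way the latency percentiles and error rate for a single window could be computed from raw request samples. It is a sketch, not the dashboard's actual aggregation: the summarize and percentile helpers are hypothetical, the input format (latency in ms plus a success flag) is assumed, and the nearest-rank percentile method is one of several common definitions.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples):
    """Aggregate one window of (latency_ms, ok) tuples into snapshot-style metrics."""
    if not samples:
        raise ValueError("no samples in window")
    latencies = sorted(latency for latency, _ in samples)
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "total_requests": len(samples),
        "error_rate": errors / len(samples),   # fraction of failed requests
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "p99_latency_ms": percentile(latencies, 99),
    }


# 100 requests with latencies 1..100 ms; every tenth request fails.
samples = [(float(i), i % 10 != 0) for i in range(1, 101)]
summary = summarize(samples)
```

Because p99 is driven by the slowest 1% of requests, it can move sharply while p50 stays flat, which is why the dashboard shows the percentiles side by side.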
Alerts
Each monitor has an Alerts tab where you can configure threshold-based alerts:
- Metric: latency percentile, error rate, or request rate
- Threshold and comparison (above / below)
- Sustained duration: fire only if the metric stays past the threshold for at least N minutes
Alerts appear in the notifications panel and can also be sent by email (configure this in Settings > Notifications).
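The sustained-duration behavior can be illustrated with a short sketch. The AlertRule class and its observe method are hypothetical, and two details are assumptions rather than documented behavior: "above" is treated as strictly greater than the threshold, and a single reading back inside the threshold resets the timer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertRule:
    metric: str              # e.g. "error_rate" or "p99_latency_ms"
    threshold: float
    comparison: str          # "above" or "below"
    sustained: timedelta     # how long the breach must last before firing
    breach_started: Optional[datetime] = field(default=None, init=False)

    def __post_init__(self) -> None:
        if self.comparison not in ("above", "below"):
            raise ValueError("comparison must be 'above' or 'below'")

    def observe(self, value: float, at: datetime) -> bool:
        """Record one metric reading and return True if the alert should fire."""
        if self.comparison == "above":
            breached = value > self.threshold
        else:
            breached = value < self.threshold
        if not breached:
            # Assumption: any reading back inside the threshold resets the timer.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = at
        return at - self.breach_started >= self.sustained


rule = AlertRule("error_rate", 0.05, "above", timedelta(minutes=5))
start = datetime(2024, 1, 1, 12, 0)
results = [
    rule.observe(0.08, start),                         # breach begins
    rule.observe(0.09, start + timedelta(minutes=3)),  # still under 5 minutes
    rule.observe(0.07, start + timedelta(minutes=5)),  # sustained for 5 minutes
    rule.observe(0.01, start + timedelta(minutes=6)),  # recovered, timer resets
    rule.observe(0.08, start + timedelta(minutes=7)),  # new breach starts over
]
```

Requiring a sustained breach trades a few minutes of detection delay for fewer false alarms from short-lived spikes.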
States
- Active — collecting metrics
- Paused — frozen, no new snapshots
- Archived — historical only, not displayed by default
Use Cases
- Detect a regression after promoting a new model version
- Catch sudden latency spikes during traffic bursts
- Page on-call when error rate exceeds budget
- Sanity-check a canary deployment before a full rollout
Next Steps
- Drift Detection — catch input distribution shift before it shows up as errors
- Pipelines — wire monitoring into automatic retraining
- Model Registry — version models that monitors point at