Cookbook
Respond to an incident
Walk a drift alert from the moment it fires through to a signed post-mortem.
This recipe takes a real signal — a drift alert from the model registry — and walks it through the full incident lifecycle. It's a template you can adapt for complaints, regulator queries, or any other inbound signal.
Scenario
Your nightly drift job pulls GET /v1/models/drift and finds a new alert-severity signal: gpt-4-2024-11-20's rejection rate has jumped from 6% (baseline, last 90 days) to 17% (current, last 30 days), a delta of +11pp. Sample size is 142.
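As a sanity check on the figures in this scenario, the arithmetic can be sketched in Python. The function name and the normal-approximation interval are illustrative assumptions, not how the drift endpoint itself classifies a signal; the inputs are the numbers quoted above.

```python
import math

def drift_summary(baseline_rate, current_rate, sample_size, z=1.96):
    """Delta in percentage points plus an approximate 95% interval for the
    current rate (normal approximation; an assumption for illustration)."""
    delta_pp = (current_rate - baseline_rate) * 100
    std_err = math.sqrt(current_rate * (1 - current_rate) / sample_size)
    lower = current_rate - z * std_err
    upper = current_rate + z * std_err
    return {
        "delta_pp": round(delta_pp, 1),
        "ci_low": lower,
        "ci_high": upper,
        # True when the baseline sits outside the interval around the current rate.
        "baseline_outside_interval": not (lower <= baseline_rate <= upper),
    }

summary = drift_summary(baseline_rate=0.06, current_rate=0.17, sample_size=142)
```

With 142 samples the interval around 17% runs from roughly 11% to 23%, so the 6% baseline falls outside it. Under this rough check, the jump is unlikely to be sampling noise alone.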
Step 1 — Open an incident
Every incident has a category (MODEL_FAILURE, BAD_ADVICE, DATA_ISSUE, SLA_BREACH, SECURITY, OTHER) and a severity (LOW, MEDIUM, HIGH, CRITICAL). Logging an incident anchors a snapshot on the immutable ledger as an INCIDENT_LOGGED event.
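Because category and severity are both closed sets, a client can check them before sending anything. The following is a minimal Python sketch; the helper name `build_incident_payload` is hypothetical, and the field names are the ones used in the request for this step.

```python
import json

# Allowed values as listed in this step.
CATEGORIES = {"MODEL_FAILURE", "BAD_ADVICE", "DATA_ISSUE", "SLA_BREACH", "SECURITY", "OTHER"}
SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}

def build_incident_payload(title, description, severity, category,
                           affected_review_job_ids=()):
    """Validate the enum fields and return the body for POST /v1/incidents."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return {
        "title": title,
        "description": description,
        "severity": severity,
        "category": category,
        "affectedReviewJobIds": list(affected_review_job_ids),
    }

payload = build_incident_payload(
    title="gpt-4-2024-11-20 rejection rate +11pp",
    description="Drift alert: rejection rate jumped from 6% (90d baseline) "
                "to 17% (30d) on 142 samples.",
    severity="HIGH",
    category="MODEL_FAILURE",
)
body = json.dumps(payload)  # JSON string to send as the request body
```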
curl -X POST https://api.bedrockcompliance.co.uk/v1/incidents \
  -H "X-Bedrock-Key: bk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "title": "gpt-4-2024-11-20 rejection rate +11pp",
    "description": "Drift alert: rejection rate jumped from 6% (90d baseline) to 17% (30d) on 142 samples.",
    "severity": "HIGH",
    "category": "MODEL_FAILURE",
    "affectedReviewJobIds": []
  }'
Step 2 — Triage
- Identify the affected model: (provider, version)
- Pull the timeline: GET /v1/models/openai/gpt-4-2024-11-20/timeline
- Identify the affected jobs: every job in the current window with that model
- Identify the customers: every clientReference on those jobs
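The triage steps above can be sketched as a filter over job records. This is a minimal Python sketch: the job record shape (`id`, `provider`, `modelVersion`, `createdAt`) is a hypothetical assumption for illustration, and only `clientReference` and the timeline path come from this recipe.

```python
from datetime import datetime, timedelta, timezone

def timeline_path(provider, version):
    """Path of the timeline endpoint for one (provider, version) pair."""
    return f"/v1/models/{provider}/{version}/timeline"

def triage(jobs, provider, version, window_start):
    """Return (affected job ids, affected client references) for one model.

    `jobs` is assumed to be a list of dicts with the hypothetical keys
    id, provider, modelVersion, createdAt (aware datetime), clientReference.
    """
    affected = [
        job for job in jobs
        if job["provider"] == provider
        and job["modelVersion"] == version
        and job["createdAt"] >= window_start
    ]
    job_ids = [job["id"] for job in affected]
    clients = sorted({job["clientReference"] for job in affected})
    return job_ids, clients

now = datetime(2026, 4, 1, tzinfo=timezone.utc)  # fixed date for the example
window_start = now - timedelta(days=30)          # the 30-day current window
jobs = [
    {"id": "job_1", "provider": "openai", "modelVersion": "gpt-4-2024-11-20",
     "createdAt": now - timedelta(days=5), "clientReference": "C-001"},
    {"id": "job_2", "provider": "openai", "modelVersion": "gpt-4-2024-11-20",
     "createdAt": now - timedelta(days=45), "clientReference": "C-002"},
    {"id": "job_3", "provider": "openai", "modelVersion": "gpt-4-2024-09-15",
     "createdAt": now - timedelta(days=3), "clientReference": "C-003"},
    {"id": "job_4", "provider": "openai", "modelVersion": "gpt-4-2024-11-20",
     "createdAt": now - timedelta(days=1), "clientReference": "C-001"},
]
job_ids, clients = triage(jobs, "openai", "gpt-4-2024-11-20", window_start)
```

Here `job_2` is excluded because it predates the window and `job_3` because it ran on a different version. The resulting job ids are the natural candidates for `affectedReviewJobIds` on the incident.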
Step 3 — Investigate
Look for a cause:
- Did the provider push a new minor version?
- Did your prompt change?
- Did the input distribution change (new product, new customer segment)?
- Are the rejections concentrated in one product or one adviser?
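The last question, whether rejections cluster in one product or adviser, comes down to a simple grouping. The sketch below assumes each rejection is a dict with hypothetical `product` and `adviser` keys; adapt the keys to whatever your job records actually contain.

```python
from collections import Counter

def rejection_share(rejections, key):
    """Share of rejections per value of `key` (e.g. "product" or "adviser"),
    ordered from most to least frequent."""
    counts = Counter(r[key] for r in rejections)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.most_common()}

# Illustrative data only.
rejections = [
    {"product": "pension", "adviser": "a1"},
    {"product": "pension", "adviser": "a2"},
    {"product": "pension", "adviser": "a3"},
    {"product": "isa", "adviser": "a1"},
]
by_product = rejection_share(rejections, "product")
by_adviser = rejection_share(rejections, "adviser")
```

A share that is far higher for one product than its share of overall volume points towards an input-distribution change rather than a model change.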
Step 4 — Remediate
If the model is the cause:
- Pin advisers to the previous version in your back-office.
- File an impact assessment for the new version.
- Run a back-test on the previous month's rejected cases against the old version.
- Reinstate the new version only after the impact assessment is signed off.
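The back-test in the list above can be sketched as re-running each rejected case through the old version and counting how many outcomes flip. Everything here is an assumption for illustration: `review_with_old_version` is a hypothetical callable you would supply, and the `riskScore` field and 0.5 threshold exist only to make the stub runnable.

```python
def back_test(rejected_cases, review_with_old_version):
    """Re-run rejected cases through the old version and report the flips.

    `review_with_old_version` is a hypothetical callable returning "ACCEPTED"
    or "REJECTED" for a case; how you invoke the pinned model is up to you.
    """
    flipped = [
        case["id"] for case in rejected_cases
        if review_with_old_version(case) == "ACCEPTED"
    ]
    total = len(rejected_cases)
    return {
        "total": total,
        "flipped": flipped,
        "flip_rate": len(flipped) / total if total else 0.0,
    }

def stub_old_version(case):
    # Stand-in for the previous model version; the threshold is made up.
    return "ACCEPTED" if case["riskScore"] < 0.5 else "REJECTED"

cases = [
    {"id": "a", "riskScore": 0.3},
    {"id": "b", "riskScore": 0.7},
    {"id": "c", "riskScore": 0.4},
    {"id": "d", "riskScore": 0.9},
]
result = back_test(cases, stub_old_version)
```

A high flip rate suggests the new version is rejecting cases the old one accepted, which supports keeping advisers pinned until the impact assessment is signed off.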
Track each step on the incident as a remediation action. Remediation actions appear in the audit trail alongside the original incident snapshot:
curl -X POST https://api.bedrockcompliance.co.uk/v1/incidents/$INCIDENT_ID/remediations \
  -H "X-Bedrock-Key: bk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Pin advisers to gpt-4-2024-09-15 in back-office",
    "ownerName": "Jane Smith",
    "dueAt": "2026-04-12T17:00:00.000Z"
  }'
Step 5 — Resolve
Updating the incident's status to RESOLVED (or CLOSED) anchors a second snapshot on the ledger as an INCIDENT_RESOLVED event. The root cause is recorded verbatim and the full incident history (creation, every status change, every remediation) becomes immutable. The post-mortem is signed and stored as a certificate, addressable forever via verify.bedrockcompliance.co.uk/c/....
curl -X POST https://api.bedrockcompliance.co.uk/v1/incidents/$INCIDENT_ID \
  -H "X-Bedrock-Key: bk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "status": "RESOLVED",
    "rootCause": "Provider pushed an undocumented minor with stricter risk-scoring. Pinned advisers to the previous build pending impact-assessment sign-off."
  }'
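The resolve call can also be built in Python with the standard library. This is a sketch under assumptions: `build_resolve_request` and the incident id `inc_123` are hypothetical, and the empty-root-cause guard is a client-side precaution added here (the history becomes immutable on resolve), not a documented API rule. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.bedrockcompliance.co.uk"

def build_resolve_request(incident_id, root_cause, api_key, status="RESOLVED"):
    """Build (but do not send) the request that resolves or closes an incident."""
    if status not in {"RESOLVED", "CLOSED"}:
        raise ValueError(f"status must be RESOLVED or CLOSED, got {status!r}")
    if not root_cause.strip():
        # Client-side guard (an assumption): the root cause is recorded verbatim.
        raise ValueError("rootCause must not be empty")
    body = json.dumps({"status": status, "rootCause": root_cause}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/incidents/{incident_id}",
        data=body,
        method="POST",
        headers={"X-Bedrock-Key": api_key, "Content-Type": "application/json"},
    )

request = build_resolve_request(
    incident_id="inc_123",  # hypothetical id
    root_cause="Provider pushed an undocumented minor with stricter risk-scoring.",
    api_key="bk_live_...",  # placeholder, as in the curl examples
)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```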