
Respond to an incident

Walk a drift alert from the moment it fires through to a signed post-mortem.

This recipe takes a real signal — a drift alert from the model registry — and walks it through the full incident lifecycle. It's a template you can adapt for complaints, regulator queries, or any other inbound signal.

Scenario

Your nightly drift job pulls GET /v1/models/drift and finds a new alert-severity signal: gpt-4-2024-11-20's rejection rate has jumped from 6% (baseline, last 90 days) to 17% (current, last 30 days), a delta of +11pp. Sample size is 142.
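The delta in the scenario is the current rate minus the baseline rate, in percentage points: 17% - 6% = +11pp. Below is a minimal sketch of the nightly pull and that arithmetic. It assumes Bash, curl, and awk. The names `fetch_drift` and `delta_pp`, and the `BEDROCK_KEY` environment variable, are all hypothetical. This recipe does not describe the drift response format, so `fetch_drift` prints the response unparsed.

```bash
# Hypothetical helpers for the nightly drift job (names are illustrative).

# Pull the drift report. The endpoint and header come from this recipe; the
# response format is not documented here, so it is printed as-is.
fetch_drift() {
  curl -sS https://api.bedrockcompliance.co.uk/v1/models/drift \
    -H "X-Bedrock-Key: ${BEDROCK_KEY:?set BEDROCK_KEY}"
}

# Difference between two rates given in percent, printed as signed percentage points.
delta_pp() {
  local baseline="$1" current="$2"
  awk -v b="$baseline" -v c="$current" 'BEGIN { printf "%+.1fpp\n", c - b }'
}

delta_pp 6 17   # prints +11.0pp
```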

Step 1 — Open an incident

Incidents are categorical (MODEL_FAILURE, BAD_ADVICE, DATA_ISSUE, SLA_BREACH, SECURITY, OTHER) and severity-graded (LOW, MEDIUM, HIGH, CRITICAL). Logging an incident anchors a snapshot on the immutable ledger as an INCIDENT_LOGGED event.

```bash
curl -X POST https://api.bedrockcompliance.co.uk/v1/incidents \
  -H "X-Bedrock-Key: bk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "title": "gpt-4-2024-11-20 rejection rate +11pp",
    "description": "Drift alert: rejection rate jumped from 6% (90d baseline) to 17% (30d) on 142 samples.",
    "severity": "HIGH",
    "category": "MODEL_FAILURE",
    "affectedReviewJobIds": []
  }'
```
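Category and severity are closed sets, so a script can catch a typo before the request reaches the API. Below is a minimal client-side sketch, assuming Bash. `in_list` and `is_valid_incident` are hypothetical helpers. This recipe does not describe how the API itself validates these fields.

```bash
# Allowed values, copied from the lists in Step 1.
INCIDENT_CATEGORIES=(MODEL_FAILURE BAD_ADVICE DATA_ISSUE SLA_BREACH SECURITY OTHER)
INCIDENT_SEVERITIES=(LOW MEDIUM HIGH CRITICAL)

# Return 0 if the first argument appears among the remaining arguments.
in_list() {
  local needle="$1"; shift
  local item
  for item in "$@"; do
    [[ "$item" == "$needle" ]] && return 0
  done
  return 1
}

# Hypothetical pre-flight check: is_valid_incident CATEGORY SEVERITY
is_valid_incident() {
  in_list "$1" "${INCIDENT_CATEGORIES[@]}" || { echo "unknown category: $1" >&2; return 1; }
  in_list "$2" "${INCIDENT_SEVERITIES[@]}" || { echo "unknown severity: $2" >&2; return 1; }
}

is_valid_incident MODEL_FAILURE HIGH && echo "ok to POST"
```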

Step 2 — Triage

  • Identify the affected model as a (provider, version) pair; here, (openai, gpt-4-2024-11-20)
  • Pull the timeline: GET /v1/models/openai/gpt-4-2024-11-20/timeline
  • Identify the affected jobs: every job in the current window with that model
  • Identify the customers: every clientReference on those jobs
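Two of these triage lookups can be scripted. In the minimal sketch below, `timeline_url` builds the Step 2 timeline path for a (provider, version) pair, and `affected_clients` reduces a job list to the distinct `clientReference` values for one model version. This recipe does not cover how you export jobs, so the tab-separated input format (jobId, model version, clientReference) is an assumption. Both function names and `BEDROCK_KEY` are hypothetical as well.

```bash
BEDROCK_API=https://api.bedrockcompliance.co.uk/v1

# Build the timeline URL for a (provider, version) pair.
timeline_url() {
  printf '%s/models/%s/%s/timeline\n' "$BEDROCK_API" "$1" "$2"
}

# Hypothetical input on stdin: one job per line, tab-separated as
#   jobId <TAB> model version <TAB> clientReference
# Prints each distinct clientReference for jobs on the given model version,
# in order of first appearance.
affected_clients() {
  local model="$1"
  awk -F'\t' -v m="$model" '$2 == m && !seen[$3]++ { print $3 }'
}

# Example (not run here):
#   curl -sS "$(timeline_url openai gpt-4-2024-11-20)" -H "X-Bedrock-Key: $BEDROCK_KEY"
#   affected_clients gpt-4-2024-11-20 < jobs.tsv
```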

Step 3 — Investigate

Look for a cause:

  • Did the provider push a new minor version?
  • Did your prompt change?
  • Did the input distribution change (new product, new customer segment)?
  • Are the rejections concentrated in one product or one adviser?
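The last question, whether rejections cluster, is a group-by. The sketch below assumes a tab-separated export with a group column (product or adviser) and an outcome column in which `REJECTED` marks a rejection. The column layout, the `REJECTED` label, and the name `rejection_by_group` are all hypothetical. Rows come out with the highest rejection rate first, so you can compare each group against the 6% baseline. Keep the total count in view, since a high rate on a handful of reviews may be noise.

```bash
# Hypothetical input on stdin: one review per line, tab-separated as
#   group <TAB> outcome
# Prints: group, rejected count, total count, rejection rate in percent,
# sorted by rejection rate, highest first.
rejection_by_group() {
  awk -F'\t' '
    { total[$1]++; if ($2 == "REJECTED") rejected[$1]++ }
    END {
      for (g in total)
        printf "%s\t%d\t%d\t%.1f%%\n", g, rejected[g], total[g], 100 * rejected[g] / total[g]
    }' | sort -t$'\t' -k4,4 -rn
}
```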

Step 4 — Remediate

If the model is the cause:

  1. Pin advisers to the previous version in your back-office.
  2. File an impact assessment for the new version.
  3. Run a back-test on the previous month's rejected cases against the old version.
  4. Reinstate the new version only after the impact assessment is signed off.

Record each step on the incident as a remediation action. Each action appears in the audit trail alongside the original incident snapshot:

```bash
curl -X POST https://api.bedrockcompliance.co.uk/v1/incidents/$INCIDENT_ID/remediations \
  -H "X-Bedrock-Key: bk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "description": "Pin advisers to gpt-4-2024-09-15 in back-office",
    "ownerName": "Jane Smith",
    "dueAt": "2026-04-12T17:00:00.000Z"
  }'
```
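If you record all four Step 4 actions from a script, a small wrapper keeps the request bodies consistent. Below is a minimal sketch that assumes only Bash and curl. `remediation_payload` and `post_remediation` are hypothetical names. `BEDROCK_KEY` is an assumed environment variable that holds your key, and `INCIDENT_ID` is assumed to hold the id of the incident you opened. To avoid a JSON dependency, the sketch refuses values that contain double quotes, backslashes, or control characters rather than escaping them. A JSON tool such as jq handles escaping properly.

```bash
# Build the JSON body for one remediation action. Values containing double
# quotes, backslashes, or control characters are rejected, not escaped.
remediation_payload() {
  local description="$1" owner="$2" due="$3" v
  for v in "$description" "$owner" "$due"; do
    if [[ "$v" == *'"'* || "$v" == *'\'* || "$v" == *[[:cntrl:]]* ]]; then
      echo "unsupported character in: $v" >&2
      return 1
    fi
  done
  printf '{"description":"%s","ownerName":"%s","dueAt":"%s"}\n' "$description" "$owner" "$due"
}

# Hypothetical wrapper: post_remediation INCIDENT_ID DESCRIPTION OWNER DUE_AT
post_remediation() {
  local body
  body="$(remediation_payload "$2" "$3" "$4")" || return 1
  curl -sS -X POST "https://api.bedrockcompliance.co.uk/v1/incidents/$1/remediations" \
    -H "X-Bedrock-Key: ${BEDROCK_KEY:?set BEDROCK_KEY}" \
    -H "Content-Type: application/json" \
    -d "$body"
}

# Example (not run here):
#   post_remediation "$INCIDENT_ID" "Pin advisers to gpt-4-2024-09-15 in back-office" \
#     "Jane Smith" "2026-04-12T17:00:00.000Z"
```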

Step 5 — Resolve

Updating the incident's status to RESOLVED (or CLOSED) anchors a second snapshot on the ledger as an INCIDENT_RESOLVED event. The root cause is recorded verbatim, and the full incident history (creation, every status change, every remediation) becomes immutable. The post-mortem is signed and stored as a certificate, permanently addressable at verify.bedrockcompliance.co.uk/c/....

```bash
curl -X POST https://api.bedrockcompliance.co.uk/v1/incidents/$INCIDENT_ID \
  -H "X-Bedrock-Key: bk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "status": "RESOLVED",
    "rootCause": "Provider pushed an undocumented minor version with stricter risk-scoring. Pinned advisers to the previous build pending impact-assessment sign-off."
  }'
```
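The root cause is recorded verbatim, and the incident history becomes immutable once the incident is resolved, so a client-side check before sending the update can be worthwhile. The sketch below uses the hypothetical name `resolution_payload`. It accepts only the two statuses Step 5 names and refuses a blank root cause. To stay dependency-free, it also refuses values that contain double quotes, backslashes, or control characters rather than escaping them. This recipe does not say whether the API enforces either rule itself.

```bash
# Hypothetical pre-flight check: resolution_payload STATUS ROOT_CAUSE
# Prints the JSON body for the Step 5 update, or fails with a message on stderr.
resolution_payload() {
  local status="$1" root_cause="$2"
  case "$status" in
    RESOLVED|CLOSED) ;;
    *) echo "status must be RESOLVED or CLOSED, got: $status" >&2; return 1 ;;
  esac
  # Reject a root cause that is empty or whitespace-only.
  if [[ -z "${root_cause//[[:space:]]/}" ]]; then
    echo "rootCause is empty" >&2
    return 1
  fi
  # Reject characters this sketch does not escape.
  if [[ "$root_cause" == *'"'* || "$root_cause" == *'\'* || "$root_cause" == *[[:cntrl:]]* ]]; then
    echo "rootCause contains unsupported characters" >&2
    return 1
  fi
  printf '{"status":"%s","rootCause":"%s"}\n' "$status" "$root_cause"
}

# Example (not run here): pass the result to the Step 5 request with
#   -d "$(resolution_payload RESOLVED "Pinned advisers to the previous build.")"
```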

