Largestack AI — Incident Response Plan

Severity Levels

Level	Definition	Response Time	Examples
P0 Critical	Data breach, system compromise	15 min	API key leak, unauthorized data access
P1 High	Service outage, security vulnerability	1 hour	All providers down, guardrail bypass
P2 Medium	Degraded performance, partial outage	4 hours	Single provider failure, high error rate
P3 Low	Minor issue, no user impact	24 hours	Dashboard bug, non-critical log error
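
The response-time targets in the table can be encoded so that alerting or paging tools compute deadlines consistently. The following is a minimal sketch, not part of the Largestack codebase: the names Severity, RESPONSE_TIME, response_deadline, and is_overdue are hypothetical, and only the durations come from the table above.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Severity(Enum):
    P0 = "Critical"
    P1 = "High"
    P2 = "Medium"
    P3 = "Low"


# Response-time targets copied from the severity table.
RESPONSE_TIME = {
    Severity.P0: timedelta(minutes=15),
    Severity.P1: timedelta(hours=1),
    Severity.P2: timedelta(hours=4),
    Severity.P3: timedelta(hours=24),
}


def response_deadline(severity: Severity, detected_at: datetime) -> datetime:
    """Return the latest time by which a responder should have engaged."""
    return detected_at + RESPONSE_TIME[severity]


def is_overdue(severity: Severity, detected_at: datetime, now: datetime) -> bool:
    """True if the response-time target has already passed."""
    return now > response_deadline(severity, detected_at)


if __name__ == "__main__":
    detected = datetime.now(timezone.utc)
    print("P0 response due by:", response_deadline(Severity.P0, detected).isoformat())
```

Using timezone-aware timestamps avoids ambiguity when responders work across regions.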

Response Procedures

P0 — Critical

Activate kill switch — largestack._guard.kill_switch.activate("incident")
Assess scope — check audit trail: audit.query(event_type="agent.error")
Contain — revoke compromised credentials, rotate keys
Notify — affected customers within 72 hours of discovering the breach (DPDP Act requirement)
Remediate — deploy fix via canary deployment
Post-mortem — document root cause, timeline, corrective actions
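
The P0 steps above feed directly into the post-mortem timeline, so it can help to record each step as it completes. The sketch below is illustrative only: P0Incident, P0_STEPS, and NOTIFY_WINDOW are hypothetical names, it does not call any Largestack API, and it assumes the 72-hour notification window is measured from the time of discovery.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# The six P0 steps, in the order the plan lists them.
P0_STEPS = ("kill_switch", "assess", "contain", "notify", "remediate", "post_mortem")

# Assumption: the 72-hour notification window starts at discovery.
NOTIFY_WINDOW = timedelta(hours=72)


@dataclass
class P0Incident:
    discovered_at: datetime
    timeline: list = field(default_factory=list)  # entries are (step, timestamp, note)

    def complete(self, step: str, at: datetime, note: str = "") -> None:
        """Record a step, refusing to skip ahead of the documented order."""
        if len(self.timeline) >= len(P0_STEPS):
            raise ValueError("all steps already recorded")
        expected = P0_STEPS[len(self.timeline)]
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.timeline.append((step, at, note))

    @property
    def notify_deadline(self) -> datetime:
        return self.discovered_at + NOTIFY_WINDOW

    def notified_in_time(self) -> bool:
        """True only if the notify step was recorded on or before the deadline."""
        for step, at, _ in self.timeline:
            if step == "notify":
                return at <= self.notify_deadline
        return False


if __name__ == "__main__":
    incident = P0Incident(discovered_at=datetime.now(timezone.utc))
    print("Customer notification due by:", incident.notify_deadline.isoformat())
```

The recorded (step, timestamp, note) tuples can be pasted into the post-mortem as the incident timeline.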

P1 — High

Check circuit breakers — largestack dashboard
Failover — circuit breaker auto-routes to healthy providers
Investigate — review traces and anomaly detection alerts
Fix — deploy via canary with monitoring
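
To make the failover step concrete, here is a generic circuit-breaker sketch. It is not Largestack's implementation: CircuitBreaker, call_with_failover, and the threshold and cooldown values are assumptions chosen for illustration. It shows the general pattern of skipping a provider after repeated failures and routing to the next healthy one.

```python
import time


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allows_request(self):
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


def call_with_failover(providers, breakers, request):
    """Try providers in order, skipping any whose breaker is open."""
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.allows_request():
            continue
        try:
            result = call(request)
        except Exception:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("no healthy provider available")
```

If every provider is failing or open, the function raises, which corresponds to the "All providers down" P1 example in the severity table.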

P2/P3 — Medium/Low

Log — create issue tracker entry
Fix — standard development cycle
Deploy — via CI/CD with quality gates

Communication Templates

Data Breach Notification (DPDP Act §8)

Subject: Security Incident Notification — [Date]

Dear [Customer],

We are writing to inform you of a data security incident affecting 
your account on the Largestack AI platform.

What happened: [Description]
When: [Date/time discovered]  
What data was affected: [Specific data types]
What we're doing: [Remediation steps]
What you should do: [Customer actions]

Contact: [Security contact email]
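
Because the template relies on bracketed placeholders, a small helper can guard against sending a notification with a field left unfilled. This is a sketch under assumptions: render and PLACEHOLDER are hypothetical names, and the example uses only a fragment of the template above.

```python
import re

# Matches bracketed placeholders such as [Description] or [Date/time discovered].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def render(template: str, values: dict) -> str:
    """Replace every [Placeholder]; raise if any placeholder has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    fragment = "What happened: [Description]\nWhen: [Date/time discovered]"
    print(render(fragment, {
        "Description": "API key exposed in logs",
        "Date/time discovered": "2025-03-01 09:00 UTC",
    }))
```

Failing loudly on a missing value is a deliberate choice here: a notification that still contains a raw placeholder should never reach a customer.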

Recovery Procedures

Event replay: EventStore.reconstruct_state() — rebuild from event log
Saga rollback: Automatic compensation via SagaOrchestrator
Checkpoint resume: largestack resume — restart from last checkpoint
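
The first two recovery mechanisms follow well-known patterns, sketched generically below. This is not the code behind EventStore.reconstruct_state() or SagaOrchestrator, whose actual signatures are not documented here: the event shape (dicts with "type", "key", "value") and the function names are assumptions made for illustration.

```python
def reconstruct_state(events, initial=None):
    """Rebuild state by applying the event log in order (event replay)."""
    state = dict(initial or {})
    for event in events:
        kind, key = event["type"], event["key"]
        if kind == "set":
            state[key] = event["value"]
        elif kind == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown event type: {kind!r}")
    return state


def run_saga(steps):
    """Run (action, compensation) pairs; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        # Compensate only the steps that finished, newest first, then re-raise.
        for compensate in reversed(completed):
            compensate()
        raise


if __name__ == "__main__":
    log = [
        {"type": "set", "key": "plan", "value": "pro"},
        {"type": "set", "key": "seats", "value": 5},
        {"type": "delete", "key": "seats"},
    ]
    print(reconstruct_state(log))
```

Replay is deterministic as long as events are applied in their original order, which is why the event log should be treated as append-only during recovery.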