Need dedicated runbook management?
While some monitoring tools offer separate runbook features, Better Stack integrates runbooks directly into Escalation policies, keeping your incident response workflow simple and unified.
Create structured incident response procedures using Better Stack's Escalation policies. Runbooks help your team follow consistent steps during incidents, reducing response time and ensuring nothing is missed.
A runbook is a structured set of predefined steps for handling a specific incident or scenario. Runbooks typically include:
Create a dedicated Escalation policy for your new runbook:
Runbook: High CPU Usage
Use `- [ ]` to add an interactive task.
## When to Use
Triggered when CPU > 90% for 5+ minutes on a web server.
## Steps
- [ ] **Acknowledge the Alert**
- [ ] **Find the Affected Server**
  - Use logs or metrics dashboard to identify the instance/container
  - Example: `aws ecs list-tasks --cluster web-prod`
- [ ] **SSH or Access Container**
  - `ssh ec2-user@<instance-ip>`
- [ ] **Diagnose the Issue**
  - Run `top` or `htop` to find CPU-heavy process
  - Check application logs
- [ ] **Fix or Mitigate**
  - Restart service if needed
  - Scale up if traffic is legitimate
- [ ] **Verify**
  - CPU drops below 70%
  - No 5xx errors
  - App is responsive
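The trigger and verification conditions in this runbook can be made precise with a small sketch. The Python below is only an illustration of the logic, not part of Better Stack: the function names, the sample format, and the idea of checking samples yourself are all assumptions. It treats "CPU > 90% for 5+ minutes" as an unbroken run of samples above the threshold that spans at least five minutes, and "Verify" as CPU below 70% with no 5xx errors.

```python
from datetime import datetime, timedelta

# Thresholds taken from the runbook text above.
TRIGGER_THRESHOLD = 90.0              # percent, "When to Use"
TRIGGER_WINDOW = timedelta(minutes=5)
RECOVERY_THRESHOLD = 70.0             # percent, "Verify"


def should_trigger(samples):
    """Return True if CPU stayed above the threshold for at least the window.

    `samples` is a hypothetical list of (timestamp, cpu_percent) tuples,
    sorted by timestamp in ascending order.
    """
    run_start = None  # start of the current unbroken run above the threshold
    for timestamp, cpu_percent in samples:
        if cpu_percent > TRIGGER_THRESHOLD:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start >= TRIGGER_WINDOW:
                return True
        else:
            # Any sample at or below the threshold resets the run.
            run_start = None
    return False


def is_recovered(cpu_percent, error_5xx_count):
    """Return True if the 'Verify' step's measurable checks pass."""
    return cpu_percent < RECOVERY_THRESHOLD and error_5xx_count == 0
```

Note that a brief dip below 90% resets the window in this sketch; whether your monitoring tool behaves the same way depends on how its alert rule is configured.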
In your actual escalation policies (the ones that notify your team):
Use a metadata-based rule instead of a time-based rule, and redirect to your runbooks based on the incident metadata values.
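Conceptually, a metadata-based rule behaves like a lookup from incident metadata to a runbook policy. The sketch below is plain Python for illustration only and does not call any Better Stack API; the metadata keys and values (`alert_type`, `high_cpu`, and so on) are hypothetical, and the policy names follow the naming examples used in this guide.

```python
# Hypothetical routing table: (metadata key, expected value, runbook policy name).
# Rules are checked in order, so put the most specific rules first.
RUNBOOK_ROUTES = [
    ("alert_type", "high_cpu", "Runbook: High CPU Usage"),
    ("alert_type", "db_down", "Runbook: Database Outage"),
    ("alert_type", "ssl_expiry", "Runbook: SSL Certificate Expiry"),
]


def select_runbook(metadata, default=None):
    """Return the first runbook policy whose rule matches the incident metadata."""
    for key, expected_value, policy_name in RUNBOOK_ROUTES:
        if metadata.get(key) == expected_value:
            return policy_name
    return default  # no rule matched; fall through to the default behavior


# An incident tagged with alert_type=high_cpu is redirected to the CPU runbook.
incident_metadata = {"alert_type": "high_cpu", "service": "web-prod"}
print(select_runbook(incident_metadata))
```

In the escalation policy itself you configure the equivalent rules in the UI; the table above just makes the matching order explicit.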
Click Report a new incident in your escalation policy to create a new incident.
You should see your instructions in the incident timeline:
Use a consistent prefix for easy identification:
Runbook: Database Outage
Runbook: API Rate Limiting
Runbook: SSL Certificate Expiry
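A consistent prefix also makes runbooks easy to pick out programmatically, for example when reviewing an exported list of policy names. This is a minimal sketch that assumes you already have the names as plain strings; the helper and the non-runbook names in the sample list are hypothetical.

```python
RUNBOOK_PREFIX = "Runbook: "


def list_runbooks(policy_names):
    """Return the runbook titles (prefix removed) from a list of policy names."""
    return sorted(
        name[len(RUNBOOK_PREFIX):]
        for name in policy_names
        if name.startswith(RUNBOOK_PREFIX)
    )


policies = [
    "Runbook: Database Outage",
    "Web team on-call",            # hypothetical regular escalation policy
    "Runbook: API Rate Limiting",
    "Runbook: SSL Certificate Expiry",
]
print(list_runbooks(policies))
```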
The same runbook can be referenced across multiple escalation policies. For example, your High CPU Usage runbook might be used in policies for:
This approach keeps runbooks centralized while allowing flexible incident response workflows 🚀