Incident Root Cause Analysis

Quick answers

What is an incident RCA? A root cause analysis (RCA) is a post-incident meeting to understand what happened, why it happened, and how to prevent it.
What do you cover in an RCA? Timeline, impact, root cause, and action items to prevent recurrence.
How do I describe the root cause? Be specific and factual. State what failed and why, without blame.

What it is

After an outage or incident, teams run an RCA (or postmortem). The format is often: What happened? What was the impact? What was the root cause? What are we doing to prevent recurrence? The focus is on learning, not blame.

Why it matters

Clear communication in RCAs helps the whole team learn. Describing timelines, causes, and remediation in plain language makes the findings usable by everyone, not only the people who handled the incident. Blame-free language and owning your part, without over-apologizing, build trust.

Instead of → Say

Instead of	Say
It was my fault	I missed the rate limit in the config. I've updated the runbook.
The system broke	The database connection pool was exhausted under load. We've increased the pool size.
We fixed it	We rolled back the deploy, restored from backup, and validated the fix in staging.
It won't happen again	We're adding a pre-deploy check for config changes and alerting on pool usage.
I don't know why	The root cause appears to be the combination of [X] and [Y]. I'll dig deeper and update the doc.
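
Action items such as "a pre-deploy check for config changes" and "alerting on pool usage" are easier to discuss in an RCA when they are concrete. The following is a minimal sketch in Python of what those two checks might look like. The key names (`rate_limit`, `pool_size`), the function names, and the 80% threshold are all assumptions made for illustration; they do not come from any particular tool.

```python
def missing_required_keys(config, required_keys):
    """Return the required keys that are absent from the config, in order."""
    return [key for key in required_keys if key not in config]


def pool_usage_alert(in_use, pool_size, threshold=0.8):
    """Return True when the share of connections in use reaches the threshold.

    The 0.8 default is an assumed value for this example, not a recommendation.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return in_use / pool_size >= threshold


if __name__ == "__main__":
    # Hypothetical config that is missing its rate limit.
    config = {"pool_size": 20}
    print(missing_required_keys(config, ["rate_limit", "pool_size"]))
    print(pool_usage_alert(in_use=18, pool_size=20))
```

With a sketch like this in hand, the action item can be stated precisely: "The pre-deploy check fails if `rate_limit` is missing, and we alert when pool usage reaches 80%."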

Example dialogue

Facilitator: Can you walk us through the timeline?

You: The deploy went out at 2 p.m. Metrics looked normal until 2:15, when we saw a spike in 5xx errors. We rolled back at 2:25 and restored service by 2:40.

Facilitator: What was the root cause?

You: The new feature introduced a query that wasn't indexed. Under load, it caused cascading failures. I've added the index, and we're adding a performance test to catch this.

Facilitator: Any other action items?

You: I'll update the runbook with the rollback steps. The process worked, but we can document it better.
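
The timeline in the dialogue above can be turned into the durations an RCA might report. The sketch below is a minimal illustration in Python, not a standard tool. Only the clock times come from the dialogue; the calendar date, the event names, and the metric labels are assumptions made for the example.

```python
from datetime import datetime

# Clock times are from the example dialogue; the date is a placeholder.
TIMELINE = [
    ("deploy", "2024-01-01 14:00"),
    ("first_alert", "2024-01-01 14:15"),
    ("rollback_start", "2024-01-01 14:25"),
    ("service_restored", "2024-01-01 14:40"),
]


def minutes_between(timeline, start_event, end_event):
    """Return the whole minutes elapsed between two named events."""
    times = {name: datetime.strptime(ts, "%Y-%m-%d %H:%M") for name, ts in timeline}
    delta = times[end_event] - times[start_event]
    return int(delta.total_seconds() // 60)


def summarize(timeline):
    """Compute the durations that are commonly quoted alongside a timeline."""
    return {
        "time_to_detect_min": minutes_between(timeline, "deploy", "first_alert"),
        "time_to_mitigate_min": minutes_between(timeline, "first_alert", "rollback_start"),
        "impact_duration_min": minutes_between(timeline, "first_alert", "service_restored"),
    }


if __name__ == "__main__":
    for label, minutes in summarize(TIMELINE).items():
        print(f"{label}: {minutes}")
```

For the dialogue's timeline this gives 15 minutes from deploy to first alert, 10 minutes from alert to rollback, and 25 minutes of user-facing impact, which are figures you can state directly when the facilitator asks for the timeline.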

Common mistakes

Focusing on blame instead of process
Being vague about the root cause
Skipping action items or follow-ups
Over-apologizing instead of owning the issue and moving on to solutions
Using too much jargon without context

Frequently asked questions

How do I describe my role in an incident without blaming myself? Stick to facts: "I made a change to [X]. That led to [Y]. I've [remediation]." Focus on what you learned and what you'll do differently.
What if I don't know the root cause yet? Say so: "We're still investigating. Our current hypothesis is [X]. I'll update the doc when we have more data."
How do I suggest process improvements? "To prevent this, I suggest we [action]. That would have caught [problem]."
Should I mention others' mistakes? Focus on the system and process. If someone's action is relevant, describe it factually without naming: "The deploy included a config change that wasn't in the runbook."
How detailed should the timeline be? Include key moments: deploy, first alert, mitigation start, restore. Minutes matter for incidents.
