Template: Service Degradation (P2)
Use this template for partial degradations that aren't full outages: a single feature is slow or unreliable, a subset of users sees errors, or a non-critical endpoint is failing. This is the P2 comms template per the README.md decision tree.
Approval gate: P2 templates send without sign-off. The customer team's first responder posts directly. Escalate to P1 (and use incident-acknowledgement.md) if the scope grows during the incident.
When to use this template (vs. P0/P1 templates)
| Situation | Template |
|---|---|
| Feature is slow but functional for everyone | This template (single notice; update only if scope grows) |
| Feature fails for a subset (fewer than 10%) of users | This template |
| Feature is unavailable for everyone | incident-acknowledgement.md (P1) |
| Single user reports a bug | No public comm; ticket reply only (P3) |
| Cosmetic issue, no user-visible impact | No public comm |
The line between P2 and P1 is scope, not symptom. If 5% of users can't sign in, that's P2. If 100% of users can't sign in, that's P1 — promote and switch templates immediately.
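The table above can be read as a small decision function. The sketch below is illustrative only: the `Scope` fields, the function name, and the returned strings are hypothetical, and the table says nothing about a feature that fails for between 10% and 100% of users, so the code refuses to guess in that range rather than inventing a rule.

```python
from dataclasses import dataclass

# From the table: a failing subset counts as P2 while it stays under 10% of users.
P2_MAX_FRACTION = 0.10


@dataclass(frozen=True)
class Scope:
    affected_fraction: float      # share of users affected, 0.0 to 1.0
    feature_usable: bool          # True if the feature is slow but still functional
    user_visible: bool = True     # False for cosmetic issues with no user impact
    single_report: bool = False   # True if only one user has reported a bug


def choose_comms(scope: Scope) -> str:
    """Map an incident's scope to the comms path named in the table."""
    if not scope.user_visible:
        return "no public comm"
    if scope.single_report:
        return "ticket reply only (P3)"
    if scope.feature_usable:
        # Slow but functional stays P2, even when everyone is affected.
        return "this template (P2)"
    if scope.affected_fraction < P2_MAX_FRACTION:
        return "this template (P2)"
    if scope.affected_fraction >= 1.0:
        return "incident-acknowledgement.md (P1)"
    # Assumption: the table is silent here, so hand the call to a human.
    return "not covered by the table: ask for a severity decision"
```

With these assumptions, `choose_comms(Scope(0.05, feature_usable=False))` selects this template, and `choose_comms(Scope(1.0, feature_usable=False))` selects the P1 acknowledgement, matching the sign-in example above.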
Public version — status page
Title: Investigating — degraded performance on {IMPACTED_FEATURE}
We are investigating reports of degraded performance affecting {IMPACTED_FEATURE} on {CUSTOMER_NAME}.
Current status: investigating (degraded service)
Some users may experience: {USER_VISIBLE_SYMPTOM}
Workaround (if available): {WORKAROUND_OR_NONE}
Estimated time to resolution: {ETA_RANGE}
Next update: when status changes, or by {NEXT_UPDATE_BY} if no change
Most users are not affected. We will update this page when we have more information.
{USER_VISIBLE_SYMPTOM} is one customer-language sentence describing what an affected user sees:
- ✅ "longer-than-usual loading times when viewing bet history"
- ✅ "intermittent failures when transferring funds to the vault"
- ✅ "delayed leaderboard updates after placing bets"
- ❌ "5xx error rate elevated on /bets" (internal)
{WORKAROUND_OR_NONE} — be honest. If there isn't one, "no workaround is currently available; the issue does not affect new bets" is fine.
{ETA_RANGE} — always a range, with an upper bound. For P2 the range can be wider than P0/P1 ("1–4 hours" is acceptable; "by end of business day" is acceptable).
Public version — customer email (SLA-bound only, optional)
For SLA-bound customers, P2 generally doesn't require email — the status page is sufficient. Send email only if:
- The customer has a contractual right to email notification at P2 ({{TBD: contractual specifics, customer-team to confirm}}), or
- The degradation directly affects the customer's day-to-day operation in a way that the status page wouldn't surface.
Subject: [P2 service degradation] {IMPACTED_FEATURE}
Hello,
We are investigating a partial degradation of {IMPACTED_FEATURE} on {CUSTOMER_NAME}.
Some users may experience: {USER_VISIBLE_SYMPTOM}
Workaround: {WORKAROUND_OR_NONE}
Estimated time to resolution: {ETA_RANGE}
Most users are not affected. Live updates: {STATUS_PAGE_URL}.
Regards,
{CUSTOMER_NAME} Operations
Internal version — Slack #oncall
P2 degradation — {IMPACTED_FEATURE}
INC: {INCIDENT_ID}
Detected: {TIME_DETECTED}
Owner (Tier 1): {RESPONDER}
Status page: posted ({STATUS_PAGE_URL})
Symptom: {SYMPTOM_INTERNAL}
Hypothesis: {HYPOTHESIS_OR_NONE}
Working ticket: {TICKET_LINK}
Watch for scope growth — promote to P1 if {PROMOTION_TRIGGER}.
Investigation thread :point_down:
{PROMOTION_TRIGGER} is the explicit upgrade condition the responder commits to watching. Examples: "if error rate crosses 10%", "if affected user count exceeds 1k", "if duration exceeds 4 hours". Don't leave it unstated — without a trigger, P2 incidents drift.
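One way to keep a trigger from drifting is to write it down as data and check it against current numbers. The following is a minimal sketch, not an existing tool: `PromotionTrigger` and `promotion_reasons` are hypothetical names, the default thresholds simply mirror the example wordings above, and where the metrics come from is left to the responder.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class PromotionTrigger:
    # Defaults mirror the examples in the text; real values are set per incident.
    error_rate: float = 0.10
    affected_users: int = 1_000
    duration: timedelta = timedelta(hours=4)


def promotion_reasons(
    trigger: PromotionTrigger,
    error_rate: float,
    affected_users: int,
    duration: timedelta,
) -> list[str]:
    """Return every condition that has tripped; an empty list means stay at P2."""
    reasons = []
    if error_rate >= trigger.error_rate:
        reasons.append(f"error rate {error_rate:.0%} crossed {trigger.error_rate:.0%}")
    if affected_users > trigger.affected_users:
        reasons.append(f"{affected_users} affected users exceeds {trigger.affected_users}")
    if duration > trigger.duration:
        reasons.append(f"duration {duration} exceeds {trigger.duration}")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, gives the responder ready-made wording for the Slack thread when promoting to P1.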
Workaround communication
If a workaround exists, communicate it carefully — workarounds that are wrong cause more harm than no workaround.
| Workaround pattern | Wording template |
|---|---|
| Retry-after-N-seconds | "If you encounter the issue, please retry after {N} seconds." |
| Use-different-path | "Affected users can use {ALTERNATIVE} as an alternative until the issue is resolved." |
| Wait-it-out | "No action is required; the issue will resolve automatically." |
| Clear-cache / re-login | "If you experience the issue, signing out and signing back in resolves it for most users." |
Don't suggest a workaround you haven't validated. Confirm with the on-call engineer that the workaround actually works before publishing.
ETA framing — what's allowed
P2 incidents often last hours. The ETA framing rules:
- ✅ "Estimated time to resolution: 1–4 hours."
- ✅ "We expect resolution by end of business day."
- ✅ "We are evaluating multiple fix paths; we will provide an updated ETA when we have one."
- ❌ "We expect resolution within 30 minutes." (concrete short-promise — overshoots look bad)
- ❌ "The issue will be fixed soon." (meaningless)
If the ETA is at risk of passing without resolution, post an updated estimate before the original deadline, never after.
When to update the status page
The status page entry stays in investigating (degraded service) state until:
- Resolved: confirmed fix landed; transition to a single resolution post (use the incident-resolved.md format, but lighter: no formal RCA timeline for P2 unless the customer requests it).
- Promoted to P1: scope grew. Close this status entry with "scope expanded — see incident {NEW_INCIDENT_ID}" and use incident-acknowledgement.md for the new entry.
- Re-classified as expected behavior: rare, but possible. What looked like a degradation is actually a known limitation. Close with one sentence linking to the relevant doc.
Don't let a P2 entry sit on the status page for > 24 hours without an update. Either it resolved, or it warrants a fresh statement of what's still being investigated.
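The two time limits in this template (the +4 hour default for {NEXT_UPDATE_BY} listed under "Variables to fill", and the 24-hour ceiling above) are easy to compute mechanically. The sketch below assumes timezone-aware timestamps and uses hypothetical helper names; it is not tied to any particular status page product.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_NEXT_UPDATE = timedelta(hours=4)   # template default for {NEXT_UPDATE_BY}
MAX_SILENCE = timedelta(hours=24)          # longest a P2 entry may sit without an update


def next_update_by(last_update: datetime) -> datetime:
    """Deadline to publish in the 'Next update' line of the status post."""
    return last_update + DEFAULT_NEXT_UPDATE


def is_stale(last_update: datetime, now: datetime) -> bool:
    """True once the entry has gone more than 24 hours without an update."""
    return now - last_update > MAX_SILENCE


posted = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
deadline = next_update_by(posted)  # 13:00 UTC on the same day
```

A responder could run `is_stale` on every open P2 entry at shift handover to catch entries that need either a resolution post or a fresh statement.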
Variables to fill
| Variable | Notes |
|---|---|
| {INCIDENT_ID} | Internal incident ID |
| {IMPACTED_FEATURE} | Customer-language feature name. "Bet history", "Wallet transfers", "Leaderboards". Not the internal service or endpoint. |
| {USER_VISIBLE_SYMPTOM} | One customer-language sentence: the user-side observation, not the engineering signal |
| {WORKAROUND_OR_NONE} | Specific workaround or "no workaround is currently available" |
| {ETA_RANGE} | Always a range; wider than P0/P1 is OK |
| {NEXT_UPDATE_BY} | Default +4 hours for P2 |
| {PROMOTION_TRIGGER} | Internal only: the explicit condition that escalates this to P1 |
| {TICKET_LINK} | Working ticket in the team tracker |
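Because every placeholder uses the same `{UPPER_SNAKE_CASE}` shape, a post can be checked for unfilled variables before it goes out. This is a minimal sketch under that assumption; `render` is a hypothetical helper, not part of any existing tooling referenced by this template.

```python
import re

# Matches placeholders written like {IMPACTED_FEATURE}.
PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def render(template: str, values: dict[str, str]) -> str:
    """Fill every {VARIABLE}; refuse to render if any is missing or blank."""
    names = set(PLACEHOLDER.findall(template))
    missing = sorted(n for n in names if not values.get(n, "").strip())
    if missing:
        raise ValueError("unfilled variables: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


line = render(
    "Workaround (if available): {WORKAROUND_OR_NONE}",
    {"WORKAROUND_OR_NONE": "no workaround is currently available"},
)
```

Failing loudly on a blank value is a deliberate choice here: a status post that still contains a raw `{ETA_RANGE}` is worse than a post delayed by a minute.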
Cross-references
- README.md: decision tree, channel matrix
- incident-acknowledgement.md: what to use if scope grows to P1
- incident-resolved.md: resolution wording (use the "lighter" P2 form, no public RCA unless requested)
- ../oncall-runbook.md §1: the P2 row of the severity table that gates this template
- ../support-model.md: Tier 1's authority to post a P2 directly without sign-off