Template: Service Degradation (P2)

Use this template for partial degradations that fall short of a full outage: a single feature is slow or unreliable, a subset of users sees errors, or a non-critical endpoint is failing. This is the P2 comms template per the README.md decision tree.

Approval gate: P2 notices are sent without sign-off. The customer team's first responder posts directly. If the scope grows during the incident, escalate to P1 and switch to incident-acknowledgement.md.

When to use this template (vs. P0/P1 templates)

Situation	Template
Feature is slow but functional for everyone	This template (single notice; update only if scope grows)
Feature fails for a subset of users (fewer than 10%)	This template
Feature is unavailable for everyone	`incident-acknowledgement.md` (P1)
Single user reports a bug	No public comm; ticket reply only (P3)
Cosmetic issue, no user-visible impact	No public comm

The line between P2 and P1 is scope, not symptom. If 5% of users can't sign in, that's P2. If 100% of users can't sign in, that's P1 — promote and switch templates immediately.
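The table and the scope rule can be read as a small decision function. The sketch below is illustrative only: `Comms`, `pick_comms`, and their parameters are hypothetical names, not existing tooling. The table does not say how to classify a failure that affects between 10% and 100% of users, so the sketch defers that band to the README.md decision tree rather than guessing.

```python
from enum import Enum


class Comms(Enum):
    """Possible outcomes of the table (names are hypothetical)."""
    NONE = "no public comm"
    TICKET_REPLY = "ticket reply only (P3)"
    P2_DEGRADATION = "this template (P2)"
    P1_ACKNOWLEDGEMENT = "incident-acknowledgement.md (P1)"
    SEE_README = "not covered by the table; consult the README.md decision tree"


P2_MAX_FRACTION = 0.10  # "fewer than 10%" row of the table


def pick_comms(affected_fraction: float, *, user_visible: bool = True,
               single_user_report: bool = False,
               still_functional: bool = False) -> Comms:
    """Map scope, not symptom, to the comms channel and template."""
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0 and 1")
    if not user_visible:
        return Comms.NONE                # cosmetic issue
    if single_user_report:
        return Comms.TICKET_REPLY        # P3
    if still_functional:
        return Comms.P2_DEGRADATION      # slow but functional, even for everyone
    if affected_fraction >= 1.0:
        return Comms.P1_ACKNOWLEDGEMENT  # unavailable for everyone
    if affected_fraction < P2_MAX_FRACTION:
        return Comms.P2_DEGRADATION      # fails for a small subset
    return Comms.SEE_README              # band the table leaves undefined
```

With these assumptions, `pick_comms(0.05)` selects this template and `pick_comms(1.0)` selects the P1 acknowledgement, matching the sign-in example.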

Public version — status page

Title: Investigating — degraded performance on {IMPACTED_FEATURE}

We are investigating reports of degraded performance affecting {IMPACTED_FEATURE} on {CUSTOMER_NAME}.

Current status: investigating (degraded service)
Some users may experience: {USER_VISIBLE_SYMPTOM}
Workaround (if available): {WORKAROUND_OR_NONE}
Estimated time to resolution: {ETA_RANGE}
Next update: when status changes, or by {NEXT_UPDATE_BY} if no change

Most users are not affected. We will update this page when we have more information.

{USER_VISIBLE_SYMPTOM} is one customer-language sentence describing what an affected user sees:

✅ "longer-than-usual loading times when viewing bet history"
✅ "intermittent failures when transferring funds to the vault"
✅ "delayed leaderboard updates after placing bets"
❌ "5xx error rate elevated on /bets" (internal)

{WORKAROUND_OR_NONE} — be honest. If there isn't one, "no workaround is currently available; the issue does not affect new bets" is fine.

{ETA_RANGE} — always a range, with an upper bound. For P2 the range can be wider than P0/P1 ("1–4 hours" is acceptable; "by end of business day" is acceptable).
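One way to catch an unfilled variable before posting is to render the notice programmatically and refuse to publish if any `{PLACEHOLDER}` is missing or blank. The following is a minimal sketch using Python's standard `string.Formatter`. The helper names `required_fields` and `render` and the sample values (including the customer name and update time) are hypothetical, and only the body of the status page notice is reproduced.

```python
from string import Formatter

TEMPLATE = (
    "We are investigating reports of degraded performance affecting "
    "{IMPACTED_FEATURE} on {CUSTOMER_NAME}.\n"
    "\n"
    "Current status: investigating (degraded service)\n"
    "Some users may experience: {USER_VISIBLE_SYMPTOM}\n"
    "Workaround (if available): {WORKAROUND_OR_NONE}\n"
    "Estimated time to resolution: {ETA_RANGE}\n"
    "Next update: when status changes, or by {NEXT_UPDATE_BY} if no change\n"
)

# Sample values for illustration only; the customer name and time are made up.
EXAMPLE_VALUES = {
    "IMPACTED_FEATURE": "Bet history",
    "CUSTOMER_NAME": "ExampleCo",
    "USER_VISIBLE_SYMPTOM": "longer-than-usual loading times when viewing bet history",
    "WORKAROUND_OR_NONE": "no workaround is currently available",
    "ETA_RANGE": "1–4 hours",
    "NEXT_UPDATE_BY": "14:00 UTC",
}


def required_fields(template: str) -> set[str]:
    """Collect every {PLACEHOLDER} name that appears in the template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, values: dict[str, str]) -> str:
    """Fill the template; raise if any placeholder is missing or blank."""
    missing = sorted(
        name for name in required_fields(template)
        if not values.get(name, "").strip()
    )
    if missing:
        raise ValueError(f"unfilled variables: {', '.join(missing)}")
    return template.format(**values)


if __name__ == "__main__":
    print(render(TEMPLATE, EXAMPLE_VALUES))
```

Failing loudly on a blank value is a deliberate choice here: a notice that goes out with an empty workaround or ETA line is worse than a short delay while the responder fills it in.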

Public version — customer email (SLA-bound only, optional)

For SLA-bound customers, P2 generally doesn't require email — the status page is sufficient. Send email only if:

The customer has a contractual right to email notification at P2 ({{TBD: contractual specifics, customer-team to confirm}}), or
The degradation directly affects the customer's day-to-day operation in a way that the status page wouldn't surface.

Subject: [P2 service degradation] {IMPACTED_FEATURE}

Hello,

We are investigating a partial degradation of {IMPACTED_FEATURE} on {CUSTOMER_NAME}.

Some users may experience: {USER_VISIBLE_SYMPTOM}
Workaround: {WORKAROUND_OR_NONE}
Estimated time to resolution: {ETA_RANGE}

Most users are not affected. Live updates: {STATUS_PAGE_URL}.

Regards,
{CUSTOMER_NAME} Operations

Internal version — Slack `#oncall`

P2 degradation — {IMPACTED_FEATURE}

INC: {INCIDENT_ID}
Detected: {TIME_DETECTED}
Owner (Tier 1): {RESPONDER}
Status page: posted ({STATUS_PAGE_URL})
Symptom: {SYMPTOM_INTERNAL}
Hypothesis: {HYPOTHESIS_OR_NONE}
Working ticket: {TICKET_LINK}

Watch for scope growth — promote to P1 if {PROMOTION_TRIGGER}.
Investigation thread :point_down:

{PROMOTION_TRIGGER} is the explicit upgrade condition the responder commits to watching. Examples: "if error rate crosses 10%", "if affected user count exceeds 1k", "if duration exceeds 4 hours". Don't leave it unstated — without a trigger, P2 incidents drift.
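A stated trigger is easiest to honour when it is written as explicit thresholds. The sketch below encodes the three example conditions as a small data structure. The class name, field names, and the choice of a strict "greater than" comparison at the boundary are assumptions for illustration, not part of any existing tooling.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass(frozen=True)
class PromotionTrigger:
    """Explicit P2 -> P1 upgrade conditions; None means 'not watched'."""

    max_error_rate: Optional[float] = None    # 0.10 for "crosses 10%"
    max_affected_users: Optional[int] = None  # 1000 for "exceeds 1k"
    max_duration: Optional[timedelta] = None  # timedelta(hours=4)

    def __post_init__(self) -> None:
        # Mirror the rule "don't leave it unstated": require one condition.
        if (self.max_error_rate is None
                and self.max_affected_users is None
                and self.max_duration is None):
            raise ValueError("state at least one promotion condition")

    def reasons(self, error_rate: float, affected_users: int,
                duration: timedelta) -> List[str]:
        """Return the conditions that currently fire (empty: stay at P2)."""
        fired = []
        if self.max_error_rate is not None and error_rate > self.max_error_rate:
            fired.append(
                f"error rate {error_rate:.0%} > {self.max_error_rate:.0%}")
        if (self.max_affected_users is not None
                and affected_users > self.max_affected_users):
            fired.append(
                f"affected users {affected_users} > {self.max_affected_users}")
        if self.max_duration is not None and duration > self.max_duration:
            fired.append(f"duration {duration} > {self.max_duration}")
        return fired
```

An empty list from `reasons(...)` means the incident stays at P2; a non-empty list tells the responder which committed condition fired and can be quoted in the promotion note.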

Workaround communication

If a workaround exists, communicate it carefully: a wrong workaround does more harm than no workaround.

Workaround pattern	Wording template
Retry-after-N-seconds	"If you encounter the issue, please retry after {N} seconds."
Use-different-path	"Affected users can use {ALTERNATIVE} as an alternative until the issue is resolved."
Wait-it-out	"No action is required; the issue will resolve automatically."
Clear-cache / re-login	"If you experience the issue, signing out and signing back in resolves it for most users."

Don't suggest a workaround you haven't validated. Confirm with the on-call engineer that the workaround actually works before publishing.

ETA framing — what's allowed

P2 incidents often last hours. The ETA framing rules:

✅ "Estimated time to resolution: 1–4 hours."
✅ "We expect resolution by end of business day."
✅ "We are evaluating multiple fix paths; we will provide an updated ETA when we have one."
❌ "We expect resolution within 30 minutes." (a specific short-window promise; overshooting it looks bad)
❌ "The issue will be fixed soon." (meaningless)

If the ETA looks likely to pass without resolution, post an updated estimate before the original deadline, never after.

When to update the status page

The status page entry stays in the investigating (degraded service) state until one of the following happens:

Resolved: a confirmed fix has landed. Transition to a single resolution post, using a lighter form of the incident-resolved.md format (no formal RCA timeline for P2 unless the customer requests it).
Promoted to P1: scope grew. Close this status entry with "scope expanded — see incident {NEW_INCIDENT_ID}" and use incident-acknowledgement.md for the new entry.
Re-classified as expected behavior: rare, but possible — what looked like a degradation is actually a known limitation. Close with one sentence linking to the relevant doc.

Don't let a P2 entry sit on the status page for more than 24 hours without an update. Either it has resolved, or it warrants a fresh statement of what's still being investigated.
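The two timing rules for a P2 entry (the default `{NEXT_UPDATE_BY}` of +4 hours and the 24-hour ceiling on silence) reduce to simple date arithmetic. This is a minimal sketch; the function and constant names are hypothetical, and it assumes the +4 hours is counted from the most recent post.

```python
from datetime import datetime, timedelta

DEFAULT_NEXT_UPDATE = timedelta(hours=4)  # documented P2 default for {NEXT_UPDATE_BY}
MAX_SILENCE = timedelta(hours=24)         # ceiling on time without an update


def next_update_by(last_update: datetime,
                   interval: timedelta = DEFAULT_NEXT_UPDATE) -> datetime:
    """Latest time the next status page update should be posted."""
    return last_update + interval


def is_stale(last_update: datetime, now: datetime) -> bool:
    """True once an entry has gone more than 24 hours without an update."""
    return now - last_update > MAX_SILENCE
```

A scheduled job could call `is_stale` for every open P2 entry and ping the owner in `#oncall` when it returns true; that wiring is left out because it depends on the status page tooling in use.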

Variables to fill

Variable	Notes
`{INCIDENT_ID}`	Internal incident ID
`{IMPACTED_FEATURE}`	Customer-language feature name, e.g. "Bet history", "Wallet transfers", "Leaderboards"; not the internal service or endpoint.
`{USER_VISIBLE_SYMPTOM}`	One customer-language sentence — the user-side observation, not the engineering signal
`{WORKAROUND_OR_NONE}`	Specific workaround or "no workaround is currently available"
`{ETA_RANGE}`	Always a range; wider than P0/P1 is OK
`{NEXT_UPDATE_BY}`	Default +4 hours for P2
`{PROMOTION_TRIGGER}`	Internal only — the explicit condition that escalates this to P1
`{TICKET_LINK}`	Working ticket in the team tracker
`{CUSTOMER_NAME}`	Customer-facing name used in the notice body and the email sign-off
`{STATUS_PAGE_URL}`	Link to the status page entry
`{TIME_DETECTED}`	Internal only: when the degradation was detected
`{RESPONDER}`	Internal only: the Tier 1 owner
`{SYMPTOM_INTERNAL}`	Internal only: the engineering signal (e.g. "5xx error rate elevated on /bets")
`{HYPOTHESIS_OR_NONE}`	Internal only: current working hypothesis, or "none"
`{NEW_INCIDENT_ID}`	ID of the new P1 incident, used when closing this entry on promotion

Cross-references

README.md — decision tree, channel matrix
incident-acknowledgement.md — what to use if scope grows to P1
incident-resolved.md — resolution wording (use the "lighter" P2 form, no public RCA unless requested)
../oncall-runbook.md §1 — the P2 row of the severity table that gates this template
../support-model.md — Tier 1's authority to post a P2 directly without sign-off