Customer Communications Templates

Reusable copy that the customer team's support, comms, and on-call staff use during real incidents and scheduled work. The goal is to take the guesswork out of "what do we tell customers right now": pick the template, fill in the variables, send.

Audience: customer-team Tier 1 / Tier 2 first responders, support, customer-comms owner. Templates are role-shaped, not channel-shaped — every template carries both a public version (status page / customer email / Twitter) and an internal version (Slack / Teams / PagerDuty incident comms).

Cross-links: ../oncall-runbook.md, ../support-model.md, ../escalation-matrix.md.

Decision tree: which template, when

The severity of the incident (set per ../oncall-runbook.md §1) drives both which template to use and which approval gate applies. Use this tree:

Is it an incident?
├── No  →  Is it scheduled work?
│           ├── Yes →  scheduled-maintenance.md (T-7d / T-24h / T-0 / done)
│           └── No  →  not a customer comm; nothing to send
└── Yes →  What's the severity?
            ├── P0 (full outage / data integrity / breach)
            │   1. acknowledgement (within 5 min)
            │   2. progress-update (every 30 min)
            │   3. resolved (immediate)
            ├── P1 (significant degraded service)
            │   1. acknowledgement (within 10 min)
            │   2. progress-update (every 60 min)
            │   3. resolved (within 1 hr of recovery)
            ├── P2 (partial issue)
            │   →  service-degradation.md (single notice; update only if scope grows)
            └── P3 (cosmetic / single-user)
                →  no public comm; ticket-tracker reply only

The four incident-flow templates (incident-acknowledgement.md, incident-progress-update.md, incident-resolved.md, service-degradation.md) carry the full template bodies. scheduled-maintenance.md carries the four maintenance-window stages.
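The tree above can be read as a small lookup. The following is a minimal sketch, not an existing tool: `CommsPlan` and `pick_plan` are hypothetical names, and the only data in it is what the tree and the file list above already state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommsPlan:
    templates: Tuple[str, ...]        # template files to use, in order
    ack_within_min: Optional[int]     # acknowledgement deadline in minutes, if any
    update_every_min: Optional[int]   # progress-update cadence in minutes, if any
    note: str = ""


# The three-step flow shared by P0 and P1.
_INCIDENT_FLOW = (
    "incident-acknowledgement.md",
    "incident-progress-update.md",
    "incident-resolved.md",
)

# Severity -> plan, transcribed from the decision tree.
_PLANS = {
    "P0": CommsPlan(_INCIDENT_FLOW, 5, 30, "resolved notice: immediate"),
    "P1": CommsPlan(_INCIDENT_FLOW, 10, 60, "resolved notice: within 1 hr of recovery"),
    "P2": CommsPlan(("service-degradation.md",), None, None,
                    "single notice; update only if scope grows"),
    "P3": CommsPlan((), None, None, "no public comm; ticket-tracker reply only"),
}


def pick_plan(is_incident: bool,
              severity: Optional[str] = None,
              is_scheduled_work: bool = False) -> CommsPlan:
    """Walk the decision tree and return the matching plan."""
    if is_incident:
        if severity not in _PLANS:
            raise ValueError(f"unknown severity: {severity!r}")
        return _PLANS[severity]
    if is_scheduled_work:
        return CommsPlan(("scheduled-maintenance.md",), None, None,
                         "stages: T-7d / T-24h / T-0 / done")
    return CommsPlan((), None, None, "not a customer comm; nothing to send")
```

For example, `pick_plan(True, "P1")` returns the three incident-flow templates with a 10-minute acknowledgement deadline and a 60-minute update cadence.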

Approval workflow

Approvals are designed for speed at high severity, safety at low — the more public the audience and the higher the impact, the more eyes on the message before it goes out.

Severity	Public message (status page / email)	Internal message (Slack / PagerDuty)
P0	IC drafts; on-call lead signs off before send. If lead unreachable: send the templated acknowledgement as-is, escalate sign-off to leadership for the next update.	Templated send OK — IC posts directly.
P1	IC drafts; Tier 2 senior signs off before send.	Templated send OK.
P2	Templated send OK (use `service-degradation.md` as-is).	Templated send OK.
P3	No public comm.	Optional internal note.
Scheduled maintenance	Drafts go through customer-comms owner; T-7d notice requires leadership sign-off, T-24h / T-0 / done can use the templated send.	n/a — Slack reminder is internal only, no approval needed.

Sign-off doesn't mean "wait for a meeting"; it means a single message in #oncall from the named role saying "approved, send." The expected turnaround is under 2 minutes for P0 and under 5 minutes for P1.
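The public-message column of the approval table can be sketched as a function, which is one way a future tool might gate the "send" button. `SignOff` and `public_signoff` are hypothetical names; the roles, fallback, and turnaround targets are copied from this section, and scheduled maintenance is left out to keep the sketch short.

```python
from typing import NamedTuple, Optional


class SignOff(NamedTuple):
    send_public: bool                 # False means no public message at all
    approver: Optional[str]           # role that must say "approved, send"; None = templated send OK
    expected_minutes: Optional[int]   # target turnaround for the sign-off, if one is needed
    note: str


def public_signoff(severity: str, lead_reachable: bool = True) -> SignOff:
    """Return who must approve a public message before it goes out."""
    if severity == "P0":
        if lead_reachable:
            return SignOff(True, "on-call lead", 2, "IC drafts; lead signs off before send")
        # Fallback from the table: do not block the first acknowledgement.
        return SignOff(True, None, None,
                       "send the templated acknowledgement as-is; "
                       "escalate sign-off to leadership for the next update")
    if severity == "P1":
        return SignOff(True, "Tier 2 senior", 5, "IC drafts; Tier 2 senior signs off before send")
    if severity == "P2":
        return SignOff(True, None, None, "templated send OK (service-degradation.md as-is)")
    if severity == "P3":
        return SignOff(False, None, None, "no public comm")
    raise ValueError(f"unknown severity: {severity!r}")
```

Internal messages are not modelled because the table allows a templated send for them at every severity.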

Channels

Pick channels by audience. A single incident often hits multiple channels in parallel — the IC owns sequencing.

Channel	Use for	Don't use for
Status page ({{TBD: customer-team-owned URL}})	Every P0 / P1 / scheduled maintenance — both public-facing and the canonical timeline of record	P3, internal-only investigations
Customer email	P0 / P1 customers under SLA contract; scheduled-maintenance T-7d notice	Routine P2/P3
Twitter / public social ({{TBD: customer-team-owned handle}})	P0 only, mirrored from status-page wording	Anything below P0
Slack `#oncall`	All severities — internal play-by-play	Customer comms
Customer Slack bridge ({{TBD: shared channel name}})	P0 / P1 active triage with the customer's engineering counterparts	One-off questions (use email instead)
PagerDuty incident comms	P0 / P1 — automatic timeline of every status-page update	Manual updates already in Slack
Video bridge ({{TBD: zoom/meet/teams URL}})	P0 / P1 active triage	Anything routine
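As a rough illustration, the "Use for" column can be turned into a severity lookup. `CHANNEL_SEVERITIES` and `channels_for` are hypothetical names, and the sketch makes two simplifying assumptions: scheduled maintenance is left out, and because the table does not name a public channel for the P2 notice, P2 maps only to the internal channel here.

```python
from typing import Dict, List, Set

# Channel -> severities it is used for, transcribed from the table above.
CHANNEL_SEVERITIES: Dict[str, Set[str]] = {
    "status page": {"P0", "P1"},
    "customer email": {"P0", "P1"},            # only customers under SLA contract
    "public social": {"P0"},                   # mirrored from status-page wording
    "Slack #oncall": {"P0", "P1", "P2", "P3"},  # internal play-by-play, never customer comms
    "customer Slack bridge": {"P0", "P1"},     # active triage only
    "PagerDuty incident comms": {"P0", "P1"},
    "video bridge": {"P0", "P1"},              # active triage only
}

KNOWN_SEVERITIES = {"P0", "P1", "P2", "P3"}


def channels_for(severity: str) -> List[str]:
    """List the channels the table allows for a given severity, in table order."""
    if severity not in KNOWN_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return [name for name, sevs in CHANNEL_SEVERITIES.items() if severity in sevs]
```

The lookup only answers which channels are allowed; sequencing across them stays with the IC.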

Variable conventions

Every template uses {NAMED_LIKE_THIS} placeholders for find/replace. The most common variables across templates:

{INCIDENT_ID} — internal ID (e.g., INC-2026-0042)
{TIME_DETECTED} — ISO 8601 UTC, e.g., 2026-04-25T14:32:00Z
{IMPACTED_SERVICES} — short list, customer language: dropbet sign-in, bet placement
{NEXT_UPDATE_BY} — ISO 8601 UTC, the "we'll update by" promise
{IC_NAME} — incident commander
{CUSTOMER_NAME} — operator / partner brand (kept generic in templates)
{ROOT_CAUSE_SHORT} — one customer-language sentence; used in resolved + RCA templates
{ETA_RANGE} — never a single time; always a range with a hard upper bound (e.g., "30–60 minutes")

Stick to these names — the customer team's eventual incident-comms automation will rely on them.
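A minimal sketch of how these conventions could be enforced in code, assuming templates are plain strings. The helper names (`fill`, `iso_utc`, `eta_range`) are hypothetical, and the regex for placeholder names is an assumption inferred from the examples above, not a documented rule.

```python
import re
from datetime import datetime, timezone
from typing import Dict

# Matches {NAMED_LIKE_THIS}; does not match the {{TBD: ...}} markers used elsewhere.
PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def iso_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 UTC, e.g. 2026-04-25T14:32:00Z."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def eta_range(low_min: int, high_min: int) -> str:
    """Build an {ETA_RANGE} value: always a range with a hard upper bound."""
    if not 0 < low_min < high_min:
        raise ValueError("ETA must be a range with low < high")
    return f"{low_min}–{high_min} minutes"


def fill(template: str, values: Dict[str, str]) -> str:
    """Replace every placeholder; refuse to return text with unfilled ones."""
    needed = {m.group(1) for m in PLACEHOLDER.finditer(template)}
    missing = sorted(needed - set(values))
    if missing:
        raise KeyError("unfilled placeholders: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a message that reaches customers with a literal `{NEXT_UPDATE_BY}` in it is worse than a send that is blocked for a few seconds.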

Tone

Factual, no marketing, no over-apologizing. The customer trusts honest measurement more than "we deeply regret any inconvenience."

Reference: Atlassian's incident communication principles and the Google SRE Workbook, ch. 9 (Incident Response), both align with this tone.
Don't speculate on root cause before you're sure. "We are investigating" is fine; "We believe this is caused by …" is not, until the root cause has been confirmed per §3 of the incident-runbook.
Don't promise a fix time. Give a range ({ETA_RANGE}) and, in the same message, restate when the next update will arrive ({NEXT_UPDATE_BY}).
Do name what users can / cannot do right now. "Sign-in is unavailable; existing sessions are unaffected" is more useful than "service is degraded."