Shadow vs Canary
Shadow-canary combines two distinct traffic-splitting mechanisms. Understanding each one separately makes it easier to operate the system.
The two mechanisms
| | Shadow | Canary |
|---|---|---|
| Branch | master | production |
| Traffic share | Fixed 1% | 0% → 100% ramp |
| Duration | Permanent — always on | Ephemeral — active during a release |
| Purpose | Catch regressions before promoting | Safely promote a new release |
| Rollback | Kill-switch via Edge Config | Auto-rollback on SLO failure |
| Cookie value | shadow | prod-new or prod-previous |
Shadow: the permanent 1%
Every push to master ships a new shadow deploy. The middleware routes approximately 1% of production traffic to the shadow deploy, permanently. This gives you:
- Continuous real-traffic testing of your latest code before it is promoted
- Regression detection before the production merge
- A live early warning for infrastructure changes (dependencies, Next.js upgrades) that are not yet in production
The 1% is configurable via trafficShadowPercent in Edge Config. Setting it to 0 is the kill-switch — the shadow deploy still exists but receives no traffic.
Shadow users get a sticky `shadow-bucket=shadow` cookie with a 24-hour TTL, so a user assigned to shadow stays on shadow until the cookie expires. Requests from IP allowlist entries bypass this: they go straight to shadow without a cookie.
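The bucketing rules above can be sketched as a pure function. This is a minimal illustration, not the middleware's actual code. `ShadowConfig`, `ipAllowlist`, and `decideShadow` are hypothetical names. The sketch assumes a single `shadow-bucket` cookie whose value is `shadow`, `prod-new`, or `prod-previous`, following the cookie row of the table. It also assumes the kill-switch overrides both the sticky cookie and the allowlist, which is one reading of "receives no traffic".

```typescript
// Minimal sketch of shadow bucketing. Names other than `trafficShadowPercent`
// and the `shadow-bucket` cookie are hypothetical.
type ShadowConfig = {
  trafficShadowPercent: number; // 0 to 100; 0 is the kill-switch
  ipAllowlist: readonly string[]; // hypothetical key
};

type CookieToSet = { name: string; value: string; maxAgeSeconds: number };

type ShadowDecision = {
  target: "shadow" | "prod";
  setCookie: CookieToSet | null; // null: no new cookie needed
};

const BUCKET_COOKIE = "shadow-bucket";
const SHADOW_TTL_SECONDS = 24 * 60 * 60; // 24-hour TTL

function decideShadow(
  cookies: Record<string, string | undefined>,
  ip: string,
  config: ShadowConfig,
  rand: () => number = Math.random,
): ShadowDecision {
  // Assumption: the kill-switch wins over the sticky cookie and the allowlist.
  if (config.trafficShadowPercent <= 0) return { target: "prod", setCookie: null };

  // Allowlisted IPs go straight to shadow and never receive a cookie.
  if (config.ipAllowlist.includes(ip)) return { target: "shadow", setCookie: null };

  // Sticky: an existing bucket cookie decides the target without a new roll.
  const bucket = cookies[BUCKET_COOKIE];
  if (bucket === "shadow") return { target: "shadow", setCookie: null };
  if (bucket === "prod-new" || bucket === "prod-previous") {
    return { target: "prod", setCookie: null };
  }

  // No bucket yet: roughly trafficShadowPercent% of these visitors land in shadow.
  if (rand() < config.trafficShadowPercent / 100) {
    return {
      target: "shadow",
      setCookie: { name: BUCKET_COOKIE, value: "shadow", maxAgeSeconds: SHADOW_TTL_SECONDS },
    };
  }
  // Prod visitors receive their prod-new / prod-previous cookie in the canary step.
  return { target: "prod", setCookie: null };
}
```

Passing `rand` as a parameter keeps the function deterministic under test. In middleware it would fall back to `Math.random`.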
Canary: the ephemeral ramp
A canary starts when master merges into production. The workflow:
- Deploys the new `production` code and preserves the current prod URL as `deploymentDomainProdPrevious`
- Sets `trafficProdCanaryPercent` to `0`, so all prod traffic goes to the previous deploy
- Every 15 minutes, the cron checks `/api/slo` on the new deploy twice
- If both checks pass, increments `trafficProdCanaryPercent` by 4 (configurable)
- At 100%, new users go straight to the new deploy; sessions already on `prod-previous` finish their journey there
A canary ends when trafficProdCanaryPercent reaches 100, or when the next deploy-prod.yml run overwrites the previous URL.
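A single cron tick of the ramp can be sketched as follows, together with the arithmetic for how long a clean ramp takes. `rampTick` and `minutesToFullRollout` are hypothetical names. The step of 4 and the 15-minute interval come from the list above. Treating any failed `/api/slo` check as a rollback signal is an assumption based on the table's "auto-rollback on SLO failure". How the rollback itself is carried out is not modeled.

```typescript
// Sketch of one canary cron tick. Hypothetical names; step and interval
// defaults follow the documented values (4 points, every 15 minutes).
type RampAction =
  | { kind: "increment"; nextPercent: number }
  | { kind: "done" } // canary already at 100%, nothing left to ramp
  | { kind: "rollback" }; // assumption: any failed SLO check triggers rollback

function rampTick(
  currentPercent: number,
  sloChecks: readonly [boolean, boolean], // the two /api/slo results for this tick
  step = 4,
): RampAction {
  if (currentPercent >= 100) return { kind: "done" };
  // Both checks must pass before the ramp advances.
  if (!sloChecks[0] || !sloChecks[1]) return { kind: "rollback" };
  // Never overshoot 100%.
  return { kind: "increment", nextPercent: Math.min(100, currentPercent + step) };
}

// Fastest possible ramp from 0% to 100%, assuming every tick passes.
function minutesToFullRollout(step = 4, intervalMinutes = 15): number {
  return Math.ceil(100 / step) * intervalMinutes;
}
```

With the defaults, a ramp with no failures needs `ceil(100 / 4) = 25` passing ticks. That is roughly 375 minutes (6 h 15 min) from the `0%` start to 100%.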
Swim-lane diagram
```mermaid
sequenceDiagram
    participant User
    participant Middleware
    participant Shadow as Shadow deploy (master)
    participant New as New prod deploy
    participant Prev as Previous prod deploy
    Note over Shadow: Always receives 1% of traffic
    Note over New,Prev: Canary in progress (pct = 40%)
    User->>Middleware: Request (no cookie)
    Middleware->>Middleware: random() < 1%?
    alt yes, shadow bucket
        Middleware->>Shadow: Rewrite + set cookie=shadow
        Shadow-->>User: Response (master branch code)
    else no, prod bucket
        Middleware->>Middleware: random() < 40%?
        alt yes, new prod
            Middleware->>New: Passthrough + set cookie=prod-new
            New-->>User: Response (new code)
        else no, previous prod
            Middleware->>Prev: Rewrite + set cookie=prod-previous
            Prev-->>User: Response (old code)
        end
    end
```
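The cookie-less path in the diagram uses two independent draws. As a result, the canary percentage applies only to traffic that misses the shadow bucket. The sketch below mirrors that logic. `routeNewVisitor` and `expectedShares` are hypothetical helpers, and percentages use the same 0 to 100 scale as `trafficShadowPercent` and `trafficProdCanaryPercent`.

```typescript
// Sketch of the cookie-less path in the diagram (hypothetical helpers).
type Target = "shadow" | "prod-new" | "prod-previous";

function routeNewVisitor(
  shadowPercent: number,
  canaryPercent: number,
  rand: () => number = Math.random,
): Target {
  // First draw: shadow bucket.
  if (rand() < shadowPercent / 100) return "shadow";
  // Second, independent draw: new vs previous prod.
  return rand() < canaryPercent / 100 ? "prod-new" : "prod-previous";
}

// Expected share of new visitors per target. The canary draw only sees
// traffic that missed the shadow bucket, so it is scaled by (1 - shadow).
function expectedShares(shadowPercent: number, canaryPercent: number): Record<Target, number> {
  const shadow = shadowPercent / 100;
  const canary = canaryPercent / 100;
  return {
    shadow,
    "prod-new": (1 - shadow) * canary,
    "prod-previous": (1 - shadow) * (1 - canary),
  };
}
```

At the diagram's values (1% shadow, 40% canary), new prod receives 0.99 × 0.40 = 39.6% of new visitors, previous prod 59.4%, and shadow 1%. Returning visitors with a bucket cookie skip both draws.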
Why both
Shadow alone tells you whether the new code works under real traffic before you promote. Canary gives you a controlled rollout path after promotion.
Together they form a release pipeline where:
- Every `master` push is tested live at 1%
- Every `production` merge starts at 0% and ramps gradually
- A bad release is caught either by the shadow (before promotion) or by the SLO check (after promotion but early in the ramp)
The sticky cookie ensures users do not experience multiple versions in a single session, which matters for checkout flows, authentication, and any stateful interaction.
Next:
- Routing: the full middleware decision tree
- Edge Config: the configuration keys that drive both mechanisms