Troubleshooting

Symptom table

Symptom	Likely cause	Fix
Local `next dev` logs `[shadow-canary] middleware passthrough (VERCEL_ENV != 'production')` on every request	`VERCEL_GIT_REPO_SLUG` not set locally	Run `vercel env pull` OR add `VERCEL_GIT_REPO_SLUG=<your-repo-slug>` to `.env.local`. The middleware degrades to passthrough in dev, which is the correct behavior; the warning only tells you that the Edge Config key can’t be derived
Production deploy returns 500 on every request with `VERCEL_GIT_REPO_SLUG is not set`	The Vercel project isn’t linked to a Git repo, OR the env var was explicitly overridden to empty	Re-link the Git repo in Vercel Project Settings > Git, then redeploy. `VERCEL_GIT_REPO_SLUG` is auto-injected on every linked-project deploy
404 on JS/CSS chunks after deploy	Skew Protection is OFF	Enable Skew Protection in Vercel Project Settings (Pro/Enterprise required)
`/admin` shows “unconfigured” or fails to load Edge Config data	Edge Config store not linked to the project, OR `shadow-<repo-slug>-canary` key not yet populated	Vercel dashboard > Storage > your store > Connected Projects — add your project. Run `deploy-shadow.yml` once to populate the key
Cross-deploy rewrites return 401	Deployment Protection is blocking shadow/previous deploy URLs	Disable SSO Protection or enable Protection Bypass for Automation (see Prerequisites)
Canary cron does not fire	Default branch is not `master`	Rename default branch to `master` (GitHub Settings > Branches) or update the `on.push.branches` trigger in the workflow files
Canary stuck at 4% (or any low %)	`canaryPaused: true` in Edge Config	Use the admin UI Resume button, or set `canaryPaused: false` directly in Edge Config
Canary stuck at 0% after deploy	First deploy with no previous prod URL	Use `[skip-canary]` on first production push, or push again after the bootstrap deploy
Shadow deploy gets 0% traffic	`trafficShadowPercent: 0` in Edge Config	Set `trafficShadowPercent: 1` in Edge Config (propagates in 60s)
`/debug` page shows wrong branch	You are hitting the shadow or previous prod URL directly	This is expected — those deploys always show their own branch. Visit via the custom domain to see the routing in action
Cookie does not stick across requests	Cookie set with wrong domain or `SameSite` mismatch	Ensure the middleware sets `sameSite: 'lax'` and `path: '/'`; check that the custom domain matches what the browser expects
SLO check always fails	`/api/slo` returns non-200	Check the endpoint response: `curl https://your-app.vercel.app/api/slo`. The stub returns 200 by default, so a non-200 response points to a problem in your custom implementation
`vercel promote` fails in CI	Token does not have team scope or wrong org ID	Regenerate the token with team scope and verify `VERCEL_ORG_ID` matches the team’s `orgId` in `.vercel/project.json`
Admin login returns 401	`ADMIN_USER` or `ADMIN_PASS` env var not set, or wrong value	Check Vercel Project Settings > Environment Variables. Defaults are `admin` / `12345` if vars are absent
Edge Config reads fail at runtime	`EDGE_CONFIG` connection string not injected	Vercel injects this automatically when the store is linked. Re-link the store to the project and redeploy
Middleware runs on shadow/previous deploy and routes again	`x-shadow-routed` header not set or stripped	Verify `rewriteTo` sets `x-shadow-routed: 1`. Check that no other middleware or proxy strips it before reaching the target deploy
Rollback button in admin returns 500	`VERCEL_API_TOKEN` not set or expired	Add/refresh `VERCEL_API_TOKEN` in Vercel env vars
Workflow fails with `::error::Edge Config read failed (HTTP NNN)`	The Vercel API read failed: `401`/`403` (token expired or wrong scope), `429` (rate limit), `5xx` (Vercel outage), or `000` (runner network error)	Fail-safe, not a bug. This step refuses to write Edge Config when the read can’t be trusted, preventing the historical state-clobber bug. Edge Config was not mutated, because the step exits before the PATCH. For `429`, `5xx`, or `000`, check Vercel status, wait for recovery, then re-trigger the workflow from the Actions UI. For `401`/`403`, regenerate `VERCEL_TOKEN` with team scope. For `404` on the project lookup specifically, verify `VERCEL_PROJECT_ID` matches the project the token can access
Workflow fails with `::error:: … body is not valid JSON`	Vercel returned `200 OK` with a non-JSON body (CDN error page during an incident)	Same recovery as the HTTP-NNN error above — re-trigger once the API is healthy. The fail-fast is intentional: a malformed 200 response would otherwise be parsed as `{}` and clobber state
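
The last two rows describe a read-then-write guard: the workflow only writes Edge Config after a read it can trust. The sketch below illustrates that idea under stated assumptions. The function names `parseEdgeConfigRead` and `buildNextState` are hypothetical and the error wording is illustrative; the actual workflow step may be implemented differently.

```typescript
// Hypothetical helpers illustrating the fail-safe; not the workflow's actual code.
type EdgeConfigItems = Record<string, unknown>;

// Validate a read response before any write is attempted.
function parseEdgeConfigRead(status: number, body: string): EdgeConfigItems {
  // Refuse anything other than 2xx: 401/403, 429, 5xx, or 0 (network error).
  if (status < 200 || status >= 300) {
    throw new Error(`Edge Config read failed (HTTP ${status})`);
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    // A 200 carrying an HTML error page ends up here instead of becoming {}.
    throw new Error("Edge Config read failed: body is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Edge Config read failed: body is not a JSON object");
  }
  return parsed as EdgeConfigItems;
}

// Merge changes onto the state that was just read, so untouched keys survive.
function buildNextState(
  current: EdgeConfigItems,
  changes: EdgeConfigItems,
): EdgeConfigItems {
  return { ...current, ...changes };
}
```

Because `parseEdgeConfigRead` throws before `buildNextState` is ever called, a bad read can never be merged and written back as an empty state.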

Canary stuck: detailed checklist

If trafficProdCanaryPercent has not changed in over 15 minutes:

Check GitHub Actions > Canary ramp — is the workflow running? Look for a failed or skipped run.
In the failing run, check the “Skip if no canary” step: is paused=true? If so, use the admin UI to resume.
Check the “Run 2 SLO checks” step — what HTTP code is /api/slo returning?
Verify VERCEL_TOKEN has not expired and VERCEL_EDGE_CONFIG_ID is correct.
Manually trigger the workflow (Actions > Canary ramp > Run workflow) to test.

Shadow not routing

If you visit /debug from multiple incognito windows and never get the shadow deploy:

Verify deploymentDomainShadow is set in Edge Config (not empty string)
Verify trafficShadowPercent is greater than 0
Check that the middleware is running on the production deploy (not preview) — VERCEL_ENV must be production
Bot detection may be filtering your client — check the user-agent
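
The checks above can be run as one pass over the current state. The following is a minimal sketch: the field names come from the checklist, but `diagnoseShadowRouting` is a hypothetical helper, and the user-agent pattern is only a crude stand-in for whatever bot detection the middleware really uses.

```typescript
// Field names follow the checklist; the helper itself is hypothetical.
interface ShadowState {
  deploymentDomainShadow?: string;
  trafficShadowPercent?: number;
}

function diagnoseShadowRouting(
  state: ShadowState,
  vercelEnv: string | undefined,
  userAgent: string,
): string[] {
  const problems: string[] = [];
  if (!state.deploymentDomainShadow || state.deploymentDomainShadow.trim() === "") {
    problems.push("deploymentDomainShadow is missing or empty");
  }
  const percent = state.trafficShadowPercent;
  if (typeof percent !== "number" || !(percent > 0)) {
    problems.push("trafficShadowPercent must be greater than 0");
  }
  if (vercelEnv !== "production") {
    problems.push("VERCEL_ENV is not production, so the middleware passes through");
  }
  // Assumption: a rough heuristic only; the real bot detection may differ.
  if (/bot|crawler|spider|headless/i.test(userAgent)) {
    problems.push("user-agent looks like a bot and may be filtered");
  }
  return problems;
}
```

An empty result means none of the four checklist conditions explains the missing shadow traffic, which leaves sampling odds as the likely reason.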

At 1%, you get on average one shadow assignment per 100 fresh (cookie-less) requests, and 100 attempts give only about a 63% chance of seeing at least one. Use shadowForceIPs or the /debug Force Shadow button for testing.
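
The arithmetic behind the 1% figure can be made concrete. This sketch assumes each fresh session is assigned independently with probability trafficShadowPercent / 100, which is an assumption about the middleware, not something this page confirms. Both function names are hypothetical.

```typescript
// Assumption: each fresh session is an independent draw at the given percent.

// Probability of seeing at least one shadow assignment in `sessions` tries.
function chanceOfShadow(percent: number, sessions: number): number {
  const p = percent / 100;
  return 1 - Math.pow(1 - p, sessions);
}

// Smallest number of fresh sessions that reaches the requested confidence.
function sessionsForConfidence(percent: number, confidence: number): number {
  if (!(confidence > 0 && confidence < 1)) {
    throw new RangeError("confidence must be between 0 and 1 (exclusive)");
  }
  const p = percent / 100;
  if (p <= 0) return Infinity; // shadow can never be assigned
  if (p >= 1) return 1; // every session is assigned to shadow
  return Math.ceil(Math.log(1 - confidence) / Math.log(1 - p));
}
```

Under that assumption, `chanceOfShadow(1, 100)` is about 0.634 and `sessionsForConfidence(1, 0.95)` is 299, which is why forcing the shadow route is far quicker than reloading incognito windows.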

SLO false positives

If the canary rolls back but the deploy looks healthy:

The SLO endpoint may be timing out — the cron uses --max-time 10. If your check makes slow external API calls, it may be cut off.
The SLO endpoint may be returning 500 due to an unrelated dependency issue. Make the endpoint fail open if monitoring is unavailable.
Two checks 30 seconds apart is a small sample. If your error rate is naturally spiky (e.g. a scheduled job that briefly spikes errors), the check timing may coincide with a spike. Consider adding a moving average or increasing the check interval.
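
The fail-open and moving-average suggestions can be combined in the SLO decision itself. This is a minimal sketch, not the stub's actual logic: `sloVerdict`, the window size, and the threshold are all hypothetical, and `null` is used here to mark a sample where monitoring was unavailable.

```typescript
// Hypothetical SLO decision: average recent error rates and fail open
// when no monitoring data is available.
function sloVerdict(
  errorRates: Array<number | null>, // null = monitoring unavailable for that sample
  windowSize: number,
  threshold: number,
): "pass" | "fail" {
  const size = Math.max(1, Math.floor(windowSize));
  const recent = errorRates
    .slice(-size)
    .filter((rate): rate is number => rate !== null);
  // Fail open: no usable samples should not trigger a rollback.
  if (recent.length === 0) return "pass";
  const average = recent.reduce((sum, rate) => sum + rate, 0) / recent.length;
  return average > threshold ? "fail" : "pass";
}
```

With a window of four samples and a threshold of 0.1, a single spike such as `[0.01, 0.01, 0.2, 0.01]` still passes, while sustained errors such as `[0.2, 0.3, 0.25]` fail. An endpoint built on this would return 200 for "pass" and a non-200 status for "fail".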

Related:

Workflows — workflow internals and customization
Edge Config — field reference for manual edits
Dashboard — Pause, Resume, Cancel controls