SLO integration
By default, /api/slo is a stub that always returns HTTP 200. The canary ramp only works as a safety gate if you replace it with a real health signal.
The contract is simple: return HTTP 200 if the deploy is healthy, any other status code (or a timeout) if it is not. The cron checks twice, 30 seconds apart. Both must be 200 to bump.
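For reference, the default stub is essentially a route handler that always reports healthy (a minimal sketch; the `app/api/slo/route.ts` path assumes the Next.js App Router):

```ts
// app/api/slo/route.ts: the default stub, always healthy
import { NextResponse } from 'next/server';

export async function GET() {
  // Replace this with a real health check (examples below)
  return NextResponse.json({ ok: true });
}
```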
What to measure
The most useful signal for a canary gate is the error rate of the new deploy, compared to a baseline. Options:
- Absolute threshold — fail if error rate exceeds N% in the last 5 minutes
- Relative threshold — fail if error rate is more than 2x the previous deploy’s rate
- Composite — error rate + p99 latency + custom business metric (sketched after the thresholds table below)
A simple absolute threshold (e.g. “fail if error rate > 1% in last 5 minutes”) is usually enough to start. Tune it once you understand your baseline.
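A relative threshold needs a baseline to compare against. Here is a minimal sketch, assuming a hypothetical `getErrorRate` helper (and a hypothetical `PREVIOUS_DEPLOYMENT_ID` env var) that you would wire up to your own metrics backend:

```ts
// Relative threshold sketch: fail if the new deploy's error rate is more
// than 2x the previous deploy's. getErrorRate and PREVIOUS_DEPLOYMENT_ID
// are hypothetical; implement them against your metrics provider.
import { NextResponse } from 'next/server';
import { getErrorRate } from '@/lib/metrics'; // hypothetical helper

export async function GET() {
  const current = await getErrorRate(process.env.VERCEL_DEPLOYMENT_ID!);
  const baseline = await getErrorRate(process.env.PREVIOUS_DEPLOYMENT_ID!);

  // Floor the baseline so a near-zero denominator can't make the ratio explode
  const floor = 0.001; // 0.1%
  if (current > Math.max(baseline, floor) * 2) {
    return NextResponse.json({ ok: false, current, baseline }, { status: 503 });
  }

  return NextResponse.json({ ok: true, current, baseline });
}
```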
Sentry
Sentry provides a REST API to query error rates per release.
```ts
// app/api/slo/route.ts
import { NextResponse } from 'next/server';

const SENTRY_TOKEN = process.env.SENTRY_AUTH_TOKEN!;
const SENTRY_ORG = process.env.SENTRY_ORG!;
const SENTRY_PROJECT = process.env.SENTRY_PROJECT!;
const ERROR_RATE_THRESHOLD = 0.01; // 1%

export async function GET() {
  // Get the current Vercel deployment ID to scope the query
  const release = process.env.VERCEL_DEPLOYMENT_ID;
  if (!release) {
    return NextResponse.json({ ok: true, reason: 'no release id' });
  }

  const url = new URL(
    `https://sentry.io/api/0/projects/${SENTRY_ORG}/${SENTRY_PROJECT}/stats/`
  );
  url.searchParams.set('query', `release:${release}`);
  url.searchParams.set('stat', 'received');
  url.searchParams.set('resolution', '1m');
  url.searchParams.set('since', String(Math.floor(Date.now() / 1000) - 300)); // last 5m

  const res = await fetch(url.toString(), {
    headers: { Authorization: `Bearer ${SENTRY_TOKEN}` },
  });

  if (!res.ok) {
    // If Sentry is unreachable, fail safe (do not bump)
    return NextResponse.json({ ok: false, reason: 'sentry unreachable' }, { status: 503 });
  }

  const data = await res.json();
  // NOTE: illustrative parsing; adapt to the shape the Sentry endpoint
  // you query actually returns
  const errorRate = data.errorRate ?? 0;

  if (errorRate > ERROR_RATE_THRESHOLD) {
    return NextResponse.json(
      { ok: false, errorRate, threshold: ERROR_RATE_THRESHOLD },
      { status: 503 }
    );
  }

  return NextResponse.json({ ok: true, errorRate });
}
```

Add these env vars to Vercel (Production only; the SLO endpoint runs on prod deploys):
| Variable | Value |
|---|---|
| `SENTRY_AUTH_TOKEN` | Sentry API token with `project:read` scope |
| `SENTRY_ORG` | Your Sentry organization slug |
| `SENTRY_PROJECT` | Your Sentry project slug |
Datadog
Query a metric over the last 5 minutes using the Datadog Metrics API.
```ts
import { NextResponse } from 'next/server';

const DD_API_KEY = process.env.DD_API_KEY!;
const DD_APP_KEY = process.env.DD_APP_KEY!;
const DD_SITE = process.env.DD_SITE ?? 'datadoghq.com';
const ERROR_RATE_THRESHOLD = 0.01;

export async function GET() {
  const now = Math.floor(Date.now() / 1000);
  const from = now - 300; // 5 minutes

  const deployId = process.env.VERCEL_DEPLOYMENT_ID ?? 'unknown';

  const query = encodeURIComponent(
    `sum:next.server.request.error{deployment_id:${deployId}}.as_rate()`
  );

  const res = await fetch(
    `https://api.${DD_SITE}/api/v1/query?from=${from}&to=${now}&query=${query}`,
    {
      headers: {
        'DD-API-KEY': DD_API_KEY,
        'DD-APPLICATION-KEY': DD_APP_KEY,
      },
    }
  );

  if (!res.ok) {
    return NextResponse.json({ ok: false, reason: 'datadog unreachable' }, { status: 503 });
  }

  const data = await res.json();
  const points = data.series?.[0]?.pointlist ?? [];
  // Average the rate across all points in the window
  const errorRate =
    points.length > 0
      ? points.reduce((sum: number, [, v]: [number, number]) => sum + v, 0) / points.length
      : 0;

  if (errorRate > ERROR_RATE_THRESHOLD) {
    return NextResponse.json({ ok: false, errorRate }, { status: 503 });
  }

  return NextResponse.json({ ok: true, errorRate });
}
```

PostHog
Use PostHog’s Insights API to query error events for the current deploy.
```ts
import { NextResponse } from 'next/server';

const PH_HOST = process.env.POSTHOG_HOST ?? 'https://app.posthog.com';
const PH_API_KEY = process.env.POSTHOG_PERSONAL_API_KEY!;
const PH_PROJECT = process.env.POSTHOG_PROJECT_ID!;
const ERROR_COUNT_THRESHOLD = 10; // absolute error count in last 5 minutes

export async function GET() {
  const deployId = process.env.VERCEL_DEPLOYMENT_ID ?? '';

  const body = {
    events: [{ id: '$exception', math: 'total' }],
    properties: deployId
      ? [{ key: 'vercel_deployment_id', value: deployId, operator: 'exact' }]
      : [],
    date_from: '-5m',
  };

  const res = await fetch(`${PH_HOST}/api/projects/${PH_PROJECT}/insights/trend/`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${PH_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });

  if (!res.ok) {
    return NextResponse.json({ ok: false, reason: 'posthog unreachable' }, { status: 503 });
  }

  const data = await res.json();
  // Most recent bucket of the trend series
  const count = data.result?.[0]?.data?.at(-1) ?? 0;

  if (count > ERROR_COUNT_THRESHOLD) {
    return NextResponse.json({ ok: false, errorCount: count }, { status: 503 });
  }

  return NextResponse.json({ ok: true, errorCount: count });
}
```

Custom metric
If you have a custom monitoring solution, implement the endpoint to match your signal. The only requirement: return 200 if healthy, non-200 otherwise.
```ts
// Minimal custom check — e.g. query your own database for error counts
import { NextResponse } from 'next/server';
import { db } from '@/lib/db';

export async function GET() {
  const errors = await db.errors.count({
    where: {
      createdAt: { gte: new Date(Date.now() - 5 * 60 * 1000) },
      deployId: process.env.VERCEL_DEPLOYMENT_ID,
    },
  });

  if (errors > 5) {
    return NextResponse.json({ ok: false, errors }, { status: 503 });
  }

  return NextResponse.json({ ok: true, errors });
}
```

Thresholds and tuning
Start conservative and tighten once you understand your baseline:
| Signal | Conservative | Aggressive |
|---|---|---|
| Error rate | 1% | 0.1% |
| Error count (5m window) | 20 | 5 |
| p99 latency increase | +500ms | +100ms |
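The composite option from “What to measure” combines several of these signals; all must pass for the check to return 200. A minimal sketch, assuming hypothetical `getErrorRate` and `getP99LatencyMs` helpers implemented against your metrics provider:

```ts
// Composite gate sketch: every signal must be healthy to bump.
// getErrorRate and getP99LatencyMs are hypothetical helpers.
import { NextResponse } from 'next/server';
import { getErrorRate, getP99LatencyMs } from '@/lib/metrics'; // hypothetical

export async function GET() {
  const deployId = process.env.VERCEL_DEPLOYMENT_ID!;
  const [errorRate, p99] = await Promise.all([
    getErrorRate(deployId),
    getP99LatencyMs(deployId),
  ]);

  const failures: string[] = [];
  if (errorRate > 0.01) failures.push('error rate'); // conservative 1% threshold
  if (p99 > 2000) failures.push('p99 latency'); // absolute ceiling for simplicity

  if (failures.length > 0) {
    return NextResponse.json({ ok: false, failures, errorRate, p99 }, { status: 503 });
  }

  return NextResponse.json({ ok: true, errorRate, p99 });
}
```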
Fail safe: If the SLO endpoint itself is unreachable (network error, timeout, 500 from a dependency), return 503. The cron treats any non-200 as a failure and does not bump. This prevents a broken monitoring system from allowing a bad canary to complete.
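In practice that means putting a timeout on the upstream call and converting any thrown error into a 503, along these lines (the metrics URL is a placeholder):

```ts
// Fail-safe sketch: a network error or timeout becomes a 503, which the
// cron treats as "do not bump". The URL below is a placeholder.
import { NextResponse } from 'next/server';

export async function GET() {
  try {
    const res = await fetch('https://metrics.example.com/error-rate', {
      signal: AbortSignal.timeout(5_000), // available on the Node 18+ runtime
    });
    if (!res.ok) throw new Error(`upstream returned ${res.status}`);
    // ...evaluate the metric as in the examples above...
    return NextResponse.json({ ok: true });
  } catch (err) {
    return NextResponse.json(
      { ok: false, reason: err instanceof Error ? err.message : 'unknown' },
      { status: 503 }
    );
  }
}
```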
Query the new deploy, not the previous: Scope your metrics to process.env.VERCEL_DEPLOYMENT_ID. Without scoping, you might measure the healthy previous deploy’s metrics and mask a broken new deploy.
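Scoping only works if the running deploy tags what it reports. With Sentry, for example, that means setting the release at init time so the `release:<deployment id>` query above has something to match (a sketch; adapt to your provider):

```ts
// sentry.server.config.ts (sketch)
import * as Sentry from '@sentry/nextjs';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  // Tag every event with the deployment that produced it, so the SLO
  // endpoint can filter on release:<deployment id>
  release: process.env.VERCEL_DEPLOYMENT_ID,
});
```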
Related:
- Workflows — how the cron calls this endpoint
- Troubleshooting — SLO check failures and false positives