Gatekeeper is a control plane for multi-tenant SaaS: identity, API keys, quotas, entitlements, billing, webhooks, and audit behind one HTTP API. Every protected request funnels through a single access-decision engine before any route logic runs. That engine is fail-closed by construction, and the same core runs unchanged on Cloudflare Workers and on Bun.
One decision point
There is exactly one place where access is decided: the authorize() pipeline in the gatekeeper engine. Routes never call domain services directly for access control. They call a guard() facade that builds the engine and runs the pipeline, so isolation and quota rules cannot be skipped per route.
The pipeline takes an access request and returns a single decision: allow, deny, or error. An allow is only ever produced by running off the end of the pipeline. Any halt, thrown exception, or absent-but-required dependency becomes a deny or error instead. There is no path where a backend fault silently becomes an allow.
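The fail-closed shape described above can be sketched in a few lines. This is a minimal illustration, not Gatekeeper's actual code: the `Decision`, `Halt`, and `Step` types, the `runPipeline` name, and the `step_failed` and `missing_tenant` reason strings are all hypothetical, and steps are synchronous here for brevity where real steps would likely be async.

```typescript
// Hypothetical shapes: the real request and decision types are not shown in this document.
type Decision =
  | { kind: "allow" }
  | { kind: "deny"; status: number; reason: string }
  | { kind: "error"; status: number; reason: string };

// A step returns null to continue, or a deny/error decision to halt.
type Halt = Exclude<Decision, { kind: "allow" }>;
type Step<Req> = (req: Req) => Halt | null;

function runPipeline<Req>(steps: Step<Req>[], req: Req): Decision {
  for (const step of steps) {
    let halt: Halt | null;
    try {
      halt = step(req);
    } catch {
      // A thrown exception is treated as a backend fault, never as an allow.
      return { kind: "error", status: 503, reason: "step_failed" };
    }
    if (halt !== null) return halt;
  }
  // The only allow in the function: reached by running off the end.
  return { kind: "allow" };
}

// Example step list with a single validation gate (illustrative only).
const exampleSteps: Step<{ tenant?: string }>[] = [
  (req) => (req.tenant ? null : { kind: "deny", status: 400, reason: "missing_tenant" }),
];
```

Because the allow literal appears exactly once, after the loop, no individual step can grant access on its own; it can only decline to halt.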
The pipeline
The seven steps run in a fixed order. The cheapest gates front the expensive ones, state-mutating work runs last, and identity is established before anything that needs an actor.
flowchart TD
A[authorize request] --> V[1 validate<br/>shape / tenant -> 400]
V --> R[2 rateLimit<br/>brute-force gate -> 429 / 503]
R --> I[3 identity<br/>resolve credential -> 401 / 503]
I --> B[4 tenant-binding<br/>cross-tenant guard -> 403]
B --> Z[5 authorize<br/>policy + role / perm -> 401 / 403 / 503]
Z --> E[6 entitlement<br/>plan feature gate -> 402 / 503]
E --> Q[7 quota<br/>usage gate, atomic -> 402 / 503]
Q --> OK[allow<br/>actor + tenantRole + quota]
V -.halt.-> D[deny / error]
R -.halt.-> D
I -.halt.-> D
B -.halt.-> D
Z -.halt.-> D
E -.halt.-> D
Q -.halt.-> D

| Step | Gate | Why here |
|---|---|---|
| 1 | validate | Reject malformed input before spending any backend call. |
| 2 | rateLimit | Runs before identity so brute force on the credential itself is throttled. It is the cheapest gate after validation and fronts every later step. |
| 3 | identity | Establishes the actor. Everything after needs it. Anonymous is allowed to continue; the policy step decides. |
| 4 | tenant-binding | An API-key actor pointed at a tenant other than its own is denied before any role lookup. |
| 5 | authorize | The actual "may this actor do this" check, including membership and permission backend calls. |
| 6 | entitlement | Plan-level feature gating. Runs only when the request carries an entitlement spec. |
| 7 | quota | Usage metering. Last because it mutates state - it must only fire once everything else has passed. |
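As an illustration of step 4 in the table above, the tenant-binding check can be a pure comparison that needs no backend call. The `Actor` union, the `checkTenantBinding` name, and the `cross_tenant` reason are assumptions made for this sketch; the document does not show the real actor model.

```typescript
// Hypothetical actor kinds; only the API-key variant is bound to one tenant here.
type Actor =
  | { kind: "anonymous" }
  | { kind: "user"; userId: string }
  | { kind: "apiKey"; keyId: string; tenantId: string };

interface BindingDenial {
  status: 403;
  reason: string;
}

// Returns a denial when an API key targets a tenant other than its own, else null.
function checkTenantBinding(actor: Actor, requestedTenantId: string): BindingDenial | null {
  if (actor.kind === "apiKey" && actor.tenantId !== requestedTenantId) {
    return { status: 403, reason: "cross_tenant" };
  }
  // Other actor kinds fall through to the authorize step, which checks membership.
  return null;
}
```

Since the check reads only the actor and the request, it can run before any role or permission lookup is paid for.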
The optional gates (rate limit, entitlement, quota) execute only when the request carries the matching spec, so the common authenticated path stays cheap.
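One way to picture the optional gates is as a plan computed from the request. The sketch below assumes a hypothetical `AccessRequest` shape in which each optional gate has its own optional spec field; the field names and the `plannedSteps` helper are illustrative and not taken from the Gatekeeper source.

```typescript
// Hypothetical request shape: only the presence of each optional spec matters here.
interface AccessRequest {
  tenantId: string;
  action: string;
  rateLimit?: { tier: string };
  entitlement?: { feature: string };
  quota?: { metric: string; amount: number };
}

type StepName =
  | "validate"
  | "rateLimit"
  | "identity"
  | "tenant-binding"
  | "authorize"
  | "entitlement"
  | "quota";

// Returns the steps that would run for this request, in the fixed pipeline order.
function plannedSteps(req: AccessRequest): StepName[] {
  const steps: StepName[] = ["validate"];
  if (req.rateLimit) steps.push("rateLimit");
  steps.push("identity", "tenant-binding", "authorize");
  if (req.entitlement) steps.push("entitlement");
  if (req.quota) steps.push("quota");
  return steps;
}
```

A request with no optional specs runs four steps; a request carrying all three specs runs all seven, always in the same order.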
Engine guarantees
The engine wraps the pipeline with defense in depth:
- A thrown exception is caught and converted to an error decision (503), with the actor reset to anonymous. A throw never becomes an allow.
- An inconsistent allow (an allow that does not carry an authenticated actor) is downgraded to 503 by an allow-consistency assertion.
- Every decision - allow, deny, and error alike - is audited and observed. Audit and observability faults are swallowed; they never break a decision.
- Time is injected through a clock port, so decision latency is measured deterministically.
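The four guarantees above can be combined in one small wrapper. This is a sketch under stated assumptions: the `decide` function, the `Clock` and `AuditSink` ports, and the `Decision` shape are hypothetical stand-ins, the downgraded decision is assumed to be of kind error, and everything is synchronous for brevity.

```typescript
// Hypothetical types; the real engine's shapes are not shown in this document.
type Actor = { kind: "anonymous" } | { kind: "user"; id: string };
type Decision = { kind: "allow" | "deny" | "error"; status: number; actor: Actor };

interface Clock {
  now(): number; // milliseconds
}
interface AuditSink {
  record(decision: Decision, latencyMs: number): void;
}

const ANONYMOUS: Actor = { kind: "anonymous" };

function decide(pipeline: () => Decision, clock: Clock, audit: AuditSink): Decision {
  const startedAt = clock.now();
  let decision: Decision;
  try {
    decision = pipeline();
  } catch {
    // A throw becomes an error decision with the actor reset to anonymous.
    decision = { kind: "error", status: 503, actor: ANONYMOUS };
  }
  // Allow-consistency assertion: an allow must carry an authenticated actor.
  if (decision.kind === "allow" && decision.actor.kind === "anonymous") {
    decision = { kind: "error", status: 503, actor: ANONYMOUS };
  }
  try {
    // Every decision is audited, with latency measured from the injected clock.
    audit.record(decision, clock.now() - startedAt);
  } catch {
    // Intentionally ignored: an audit fault must never change the decision.
  }
  return decision;
}
```

Injecting the clock means a test can supply fixed timestamps and assert an exact latency instead of tolerating timing jitter.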
Runtime-portable by design
Gatekeeper is hexagonal: the domain logic depends only on interfaces, and concrete adapters are wired in at one composition root. No domain package imports a Cloudflare or Bun type. The runtime entrypoint builds a single portable dependency bundle and forwards it down.
L4 clients TypeScript SDK / Python SDK
| HTTP
L3 composition apps/api (routes -> guard() -> service factories -> stores)
| authorize()
L2 engine the decision pipeline + ports
| service interfaces
L1 domain auth, apikeys, quota, permissions, entitlements,
services billing, payments, webhooks, audit, jobs, ratelimit
| foundation interfaces
L0 foundation common, crypto, database, events, observability

The same engine runs on two runtimes. The only difference is which adapters the entrypoint constructs:
| Dependency | Workers | Bun |
|---|---|---|
| Database | D1 adapter | bun:sqlite adapter |
| Cache | KV namespace | in-memory or Redis |
| Audit queue | Cloudflare Queue + DLQ | inline deferred write |
| Client IP | CF-Connecting-IP | socket peer address |
| Rate limiters | native [[ratelimits]] bindings | cache-backed fixed-window limiter |
| Usage counter | USAGE_COUNTER Durable Object | atomic SQL path |
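To make the cache-backed fixed-window limiter in the table concrete, here is a minimal sketch of the general technique. It is not Gatekeeper's limiter: the `CounterCache` port, the `checkFixedWindow` function, the `rl:` key format, and the in-memory stand-in are all hypothetical, and the cache call is synchronous where a real cache adapter would likely be async.

```typescript
// Hypothetical ports: the real cache and clock interfaces are not shown in this document.
interface CounterCache {
  // Increments the counter stored at key and returns the new value.
  increment(key: string, ttlSeconds: number): number;
}
interface Clock {
  now(): number; // milliseconds since epoch
}
interface LimitResult {
  allowed: boolean;
  retryAfterSeconds: number;
}

function checkFixedWindow(
  cache: CounterCache,
  clock: Clock,
  subject: string,
  limit: number,
  windowSeconds: number,
): LimitResult {
  const nowSeconds = Math.floor(clock.now() / 1000);
  // All requests in the same window share one counter key.
  const windowStart = nowSeconds - (nowSeconds % windowSeconds);
  const count = cache.increment(`rl:${subject}:${windowStart}`, windowSeconds);
  if (count <= limit) return { allowed: true, retryAfterSeconds: 0 };
  // Over the limit: the caller may retry when the next window opens.
  return { allowed: false, retryAfterSeconds: windowStart + windowSeconds - nowSeconds };
}

// In-memory stand-in for the cache port; expiry is ignored in this sketch.
class MemoryCounterCache implements CounterCache {
  private counts = new Map<string, number>();
  increment(key: string, _ttlSeconds: number): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}
```

A fixed window is cheap because it needs one counter per subject per window, at the cost of allowing a burst of up to twice the limit across a window boundary.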
Adding a new runtime (Node, Lambda, Vercel) is a thin entrypoint, not a fork: construct the dependency bundle from the platform's primitives, then forward the request to the app.
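The "thin entrypoint" idea can be sketched with a deliberately tiny dependency bundle. Everything named here is hypothetical (`Deps`, `IncomingRequest`, `createApp`, and the two example bundles); the real bundle carries far more than a clock and a client-IP resolver, and its shape is not shown in this document.

```typescript
// Hypothetical, transport-neutral request shape used only for this sketch.
interface IncomingRequest {
  path: string;
  headers: Record<string, string | undefined>;
  peerAddress: string;
}

// Hypothetical dependency bundle: the app sees these ports, never a platform type.
interface Deps {
  now(): number;
  clientIp(req: IncomingRequest): string;
}

type App = (req: IncomingRequest) => { status: number; body: string };

// The portable core: identical on every runtime.
function createApp(deps: Deps): App {
  return (req) => ({
    status: 200,
    body: JSON.stringify({ path: req.path, ip: deps.clientIp(req), at: deps.now() }),
  });
}

// Edge-style bundle: trusts the CF-Connecting-IP header set by the platform.
const edgeLikeDeps: Deps = {
  now: () => Date.now(),
  clientIp: (req) => req.headers["cf-connecting-ip"] ?? "unknown",
};

// Socket-style bundle: uses the peer address of the connection.
const socketLikeDeps: Deps = {
  now: () => Date.now(),
  clientIp: (req) => req.peerAddress,
};
```

An entrypoint for another runtime would add one more bundle like the two above and call `createApp` with it; the app function itself would not change.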
How a decision reaches HTTP
The engine returns a transport-neutral decision. The guard() facade in the API app turns it into an HTTP response: it maps the decision's status to HTTP, optionally remaps a 403 to 404 to hide resource existence, and sets Retry-After on rate-limited responses. The engine package never reads transports and never writes HTTP. That boundary lives entirely in the app.
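The mapping step can be sketched as a pure function from a decision to a status and headers. The `Decision` fields, the `hideExistence` option, and the `toHttp` name are assumptions for illustration, as is the choice to attach Retry-After only to 429 responses; the real guard() options are not shown in this document.

```typescript
// Hypothetical transport-neutral decision; field names are illustrative.
interface Decision {
  kind: "allow" | "deny" | "error";
  status: number;
  retryAfterSeconds?: number;
}

interface GuardOptions {
  // When true, a 403 is reported as 404 so callers cannot probe for resources.
  hideExistence?: boolean;
}

interface HttpOutcome {
  status: number;
  headers: Record<string, string>;
}

function toHttp(decision: Decision, options: GuardOptions = {}): HttpOutcome {
  const headers: Record<string, string> = {};
  let status = decision.status;
  if (status === 403 && options.hideExistence) status = 404;
  if (status === 429 && decision.retryAfterSeconds !== undefined) {
    // Retry-After takes whole seconds, so round up.
    headers["Retry-After"] = String(Math.ceil(decision.retryAfterSeconds));
  }
  return { status, headers };
}
```

Keeping this function in the app layer means the engine can be tested without any HTTP types, and a different transport could supply its own mapping.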
Related
- Responses and errors - the uniform envelope and error-code table
- Tenancy and actors - tenants, memberships, and actor kinds
- Rate limits - the fixed-window tiers