Add trusted-server-adapter-cloudflare crate (PR 17) #644
prk-Jr wants to merge 16 commits into feature/edgezero-pr16-axum-dev-server
Conversation
…ator wasm32-unknown-unknown (Cloudflare Workers) does not support std::time::Instant — it panics at runtime. web_time::Instant is a zero-cost drop-in on native and JS-backed on WASM.
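The swap can be sketched in isolation. The crate uses web_time directly (it re-exports std::time on native targets), but the same effect is shown here with a cfg-gated import so the snippet compiles without the web_time dependency; the SystemTime variant is the one used in the test-code migration below.

```rust
// On wasm32 (Cloudflare Workers), std::time panics at runtime; web_time
// provides JS-backed equivalents. cfg-gating here is illustrative only —
// the crate imports web_time unconditionally.
#[cfg(target_arch = "wasm32")]
use web_time::{SystemTime, UNIX_EPOCH};
#[cfg(not(target_arch = "wasm32"))]
use std::time::{SystemTime, UNIX_EPOCH};

fn epoch_millis() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the UNIX epoch")
        .as_millis()
}
```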
Implements the Cloudflare Workers adapter following the same pattern as trusted-server-adapter-axum: TrustedServerApp implements the Hooks trait, platform services use noop stubs on native (CI-compilable), and the #[event(fetch)] entry point delegates to edgezero_adapter_cloudflare::run_app. Also adds UnavailableHttpClient to trusted-server-core platform module, parallel to the existing UnavailableKvStore.
CI: test-cloudflare job checks native host compile, wasm32-unknown-unknown compile (with cloudflare feature), and runs host-target unit tests. CLAUDE.md: add cloudflare crate to workspace layout and build commands.
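A sketch of what that job might look like under GitHub Actions; the job name, action versions, and crate/feature flags are assumptions based on the description above, not the actual workflow file:

```yaml
test-cloudflare:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: rustup target add wasm32-unknown-unknown
    # 1. native host compile
    - run: cargo check -p trusted-server-adapter-cloudflare
    # 2. wasm32-unknown-unknown compile with the cloudflare feature
    - run: cargo check -p trusted-server-adapter-cloudflare --target wasm32-unknown-unknown --features cloudflare
    # 3. host-target unit tests
    - run: cargo test -p trusted-server-adapter-cloudflare
```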
…un_app
The rev (38198f9) of edgezero used in this workspace requires worker 0.7 (not 0.6), and run_app() takes a manifest_src: &str as its first argument. Updated Cargo.toml and lib.rs accordingly.
- Implement CloudflareHttpClient (wasm32 only) using worker::Fetch for real
outbound proxy requests; strip content-encoding/transfer-encoding headers
since the Workers runtime auto-decompresses responses
- Add build.sh with cd-to-SCRIPT_DIR guard so worker-build always runs from
the correct crate root regardless of invocation directory
- Switch wrangler dev task to use --cwd from workspace root (same DX as Fastly)
- Add js-sys to workspace dependencies; reference it via { workspace = true }
- Fix #[ignore] messages on Cloudflare integration tests
- Replace std::time::{SystemTime,UNIX_EPOCH} with web_time in test code for
signing.rs and proxy.rs (consistency with production paths)
- Add NoopConfigStore/NoopSecretStore TODO comment tracking the gap
- Add extract_client_ip unit tests (parses cf-connecting-ip, absent, invalid)
- Remove empty crate_compiles_on_host_target test
- Add CloudflareHttpClient timeout doc noting Workers CPU-budget tradeoff
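The cd-to-SCRIPT_DIR guard mentioned for build.sh can be sketched as follows; the worker-build invocation and its flags are assumptions, the guard itself is the standard bash idiom:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Resolve the directory containing this script, then cd there, so the build
# always runs from the crate root regardless of the caller's working directory.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR"

worker-build --release
```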
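A plausible shape for the extract_client_ip helper and the three cases its new unit tests cover (the real signature in the adapter may differ; this sketch operates on a plain Option<&str> header value):

```rust
use std::net::IpAddr;

// Hypothetical sketch: pull the client IP out of Cloudflare's
// cf-connecting-ip header value, treating an absent or malformed
// value as "no client IP" rather than an error.
fn extract_client_ip(header_value: Option<&str>) -> Option<IpAddr> {
    header_value?.trim().parse().ok()
}
```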
Replace direct worker::Env store construction with edgezero handles already injected by run_app, reducing #[cfg(target_arch = "wasm32")] blocks from 5 to 2.
- ConfigStoreHandleAdapter: bridges ctx.config_store() to PlatformConfigStore — reuses the already-parsed TRUSTED_SERVER_CONFIG JSON handle rather than re-parsing it on every request
- KvHandleAdapter: bridges ctx.kv_handle() to PlatformKvStore — reuses the env.kv() handle opened by run_app rather than opening a new one
- CloudflareGeo: moved outside #[cfg]; reads cf-ipcountry and related headers via ctx.request().headers(), which needs no platform import
- CloudflareSecretStoreAdapter: kept under #[cfg] — env.secret() is synchronous at the JS level, but PlatformSecretStore::get_bytes is sync while SecretHandle::get_bytes is async, so direct env access is required
- Add .dev.vars to .gitignore (wrangler convention for local secrets)
- Add bytes workspace dep for KvStore impl
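The handle-adapter pattern described above can be sketched in isolation; the trait and handle shapes here are simplified stand-ins for the real PlatformConfigStore and edgezero handle types, which are more involved:

```rust
use std::collections::HashMap;

// Stand-in for the platform trait that trusted-server-core consumes.
trait PlatformConfigStore {
    fn get(&self, key: &str) -> Option<String>;
}

// Stand-in for the already-parsed config handle injected by run_app.
struct ConfigHandle {
    parsed: HashMap<String, String>,
}

// The adapter wraps the injected handle, so lookups reuse the config that
// was parsed once at startup instead of re-parsing (or reopening a store)
// on every request.
struct ConfigStoreHandleAdapter {
    handle: ConfigHandle,
}

impl PlatformConfigStore for ConfigStoreHandleAdapter {
    fn get(&self, key: &str) -> Option<String> {
        self.handle.parsed.get(key).cloned()
    }
}
```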
…ezero-pr17-cloudflare-adapter
- Implement CloudflareWorkers::spawn() to start wrangler dev; in CI uses wrangler.ci.toml (no build step, uses pre-built bundle); locally uses wrangler.toml (triggers build.sh rebuild)
- Add wrangler.ci.toml: no [build] section so wrangler dev skips the worker-build step when the bundle is pre-built as a CI artifact
- Add build-cloudflare input to setup-integration-test-env: adds wasm32-unknown-unknown target and runs build.sh with integration test env vars
- In prepare-artifacts: enable build-cloudflare and upload build/ dir as artifact
- In integration-tests: restore CF build dir, install wrangler, set CLOUDFLARE_WRANGLER_DIR, and remove --skip flags for cloudflare tests
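A sketch of how wrangler.ci.toml might differ from wrangler.toml — the key point is the absent [build] section; the name, entry path, and date are placeholders, not the actual file contents:

```toml
# wrangler.ci.toml — CI only. With no [build] section, `wrangler dev`
# serves the pre-built bundle restored from the CI artifact instead of
# invoking build.sh / worker-build again.
name = "trusted-server-cloudflare"
main = "build/worker/shim.mjs"
compatibility_date = "2024-01-01"
```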
- Add GeoInfo happy-path test: build_geo_lookup_returns_some_with_populated_country verifies country, city, continent, latitude, and longitude are correctly populated when Cloudflare headers are present - Simplify CloudflareSecretStoreAdapter::get_bytes: collapse brittle JsError string-matching guards into a single error arm with contextual message - Document sequential fan-out latency in CloudflareHttpClient: explain sum(DSP_i) vs max(DSP_i) implication and why true parallelism is blocked by the ?Send bound on PlatformHttpClient - Fix stale wrangler.toml comment: update to reflect ConfigStoreHandleAdapter wiring rather than the now-fallback NoopConfigStore - Extend CI triggers to feature branches: format.yml and test.yml now run fmt/clippy/tests on PRs targeting feature/** so stacking PRs are gated - Fix fmt violations caught during pre-review verification: platform.rs two-line Err wrapping and plan doc Prettier formatting
The adapter's tokio dev-dependency uses rt-multi-thread which is not supported on wasm32. It is already tested in the dedicated test-cloudflare job; exclude it from the workspace wasm test to avoid the compile error.
ChristianPavilonis
left a comment
Summary
Adds the Cloudflare adapter, runtime wiring, and CI coverage. CI is green, but I found two correctness issues that should be addressed before merge, plus two smaller follow-ups.
}
};

RouterService::builder()
🔧 wrench — Cloudflare routes skip the auth/finalize middleware chain.
Both existing adapters wrap their route tables with FinalizeResponseMiddleware and AuthMiddleware, but this router is built without either layer. That means Cloudflare no longer enforces per-path basic auth and it also skips the standard response finalization headers on every response.
Suggestion:
Port the Cloudflare adapter to use the same middleware chain as Axum/Fastly and add at least one middleware/auth test so this cannot regress silently.
request: PlatformHttpRequest,
) -> Result<PlatformPendingRequest, Report<PlatformError>> {
    let backend_name = request.backend_name.clone();
    let response = self.execute(request).await?;
🔧 wrench — send_async() serializes auctions and does not enforce the request deadline.
send_async() eagerly awaits self.execute(request).await, so the auction orchestrator launches Cloudflare providers one by one instead of handing back true in-flight requests. execute() also waits on worker::Fetch::send().await without any timeout/abort path, so a slow upstream can block past the remaining auction budget.
Suggestion:
Make PlatformPendingRequest hold an actual in-flight fetch plus timeout/abort state and have select() resolve the ready request. If that is out of scope here, it would be safer to reject unsupported multi-provider fan-out on Cloudflare than to silently degrade auction behavior.
impl PlatformBackend for NoopBackend {
    fn predict_name(&self, spec: &PlatformBackendSpec) -> Result<String, Report<PlatformError>> {
        Ok(format!("{}_{}", spec.scheme, spec.host))
🤔 thinking — backend names are too coarse for reliable response correlation.
This only includes {scheme}_{host}, but the rest of the system treats port / timeout / certificate mode as part of backend identity. If two concurrent requests share a host but differ on those fields, the orchestrator's backend-to-provider map can collide and misattribute a response.
Suggestion:
Mirror the Axum/Fastly naming scheme and include at least port and timeout in the generated name.
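A minimal sketch of the finer-grained name the suggestion asks for; the spec's field names are assumptions (the real PlatformBackendSpec may differ):

```rust
// Hypothetical spec shape for illustration only.
struct PlatformBackendSpec {
    scheme: String,
    host: String,
    port: u16,
    timeout_ms: u64,
}

// Including port and timeout in the name keeps two backends that share a
// host but differ on those fields from colliding in the orchestrator's
// backend-to-provider map.
fn predict_name(spec: &PlatformBackendSpec) -> String {
    format!("{}_{}_{}_{}", spec.scheme, spec.host, spec.port, spec.timeout_ms)
}
```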
}

#[test]
#[ignore = "requires Docker and a running `wrangler dev` instance; see environments/cloudflare.rs"]
📝 note — this ignore message is stale now that CloudflareWorkers::spawn() starts wrangler dev itself.
The test no longer requires a separately running wrangler dev instance, so this text will mislead anyone trying to run the ignored test locally.
Suggestion:
Update the ignore reason to describe the real prerequisites instead (Docker, wrangler installed, and a prebuilt Cloudflare bundle).
Resolved conflicts:
- .cargo/config.toml: extend test-fastly alias to also exclude trusted-server-adapter-cloudflare
- .github/workflows/test.yml: use cargo test-fastly alias in CI (from PR16) and keep test-cloudflare job (from PR17)
- CLAUDE.md: keep Cloudflare workspace entry and cargo test-axum alias; merge both adapters' commands
- crates/trusted-server-core/src/proxy.rs: use #[tokio::test] + std::time pattern (consistent with PR16 tests)
- Cargo.lock: regenerated to reflect merged workspace state
Brings in the default-members fix: restricts workspace default-members to `trusted-server-adapter-fastly` so Viceroy can locate the binary via `cargo run --bin` during `cargo test-fastly`.
aram356
left a comment
Summary
Adds the Cloudflare Workers adapter, with web_time migration in core and a new CI job. The middleware-bypass and backend-naming concerns from the prior review are addressed and have regression tests, but the second prior 🔧 (sequential fan-out + missing per-request deadline) is mitigated only by documentation and a noisy log warning — the underlying behavior is unchanged. CI has three required gates failing.
Blocking
🔧 wrench
- CI is red on three required gates — cargo fmt --all -- --check, format-docs, and integration tests are failing on the latest run. The integration-tests failure is especially concerning because this PR is the one adding the CloudflareWorkers runtime to that matrix and the new env-var/wrangler wiring (integration-tests.yml:74-105). Either the new Cloudflare integration test broke, or it isn't actually running and another regression slipped in. Run cargo fmt --all, cd docs && npm run format, and reproduce the integration-tests failure locally with CLOUDFLARE_WRANGLER_DIR set; address whatever surfaces. Don't merge with red required checks.
- Per-provider auction timeout is silently dropped on Cloudflare (crates/trusted-server-adapter-cloudflare/src/platform.rs:289-315) — see inline.
- select() warning fires n-1 times per auction → log noise on every multi-provider request (crates/trusted-server-adapter-cloudflare/src/platform.rs:329-337) — see inline.
Non-blocking
🤔 thinking
- send_async always materializes the body, then re-serializes it for select (platform.rs:289-315) — see inline.
- Body::Stream silently dropped in outbound request body (platform.rs:228-235) — see inline.
♻️ refactor
- get_fallback/post_fallback near-duplicates (app.rs:307-363) — see inline.
- auth_middleware_rejects_with_401_when_credentials_required doesn't actually test 401 (tests/routes.rs:43-75) — see inline.
📌 out of scope
wrangler.toml ships a placeholder KV id (wrangler.toml:11) — id = "REPLACE_WITH_YOUR_KV_NAMESPACE_ID" is fine for wrangler dev --local but will fail at deploy. Either call this out in getting-started.md next to the deploy instructions, or use a kv_namespaces[].preview_id so dev still works while the prod id stays empty/required. Follow-up issue is fine.
📝 note
worker::Env::clone() per request (platform.rs:444-451) — see inline.
Prior-review status
- ✅ Cloudflare routes skip middleware → fixed (new src/middleware.rs + regression tests in tests/routes.rs:23-75).
- ⚠️ send_async serializes / no deadline → only documentation added; the underlying serialization and missing per-request timeout remain (see blocking findings above).
- ✅ Backend names too coarse → fixed (platform.rs:62-77 includes scheme, host, port, timeout_ms, cert flag).
- ✅ Stale ignore message → fixed (integration.rs:139,147).
CI Status
- cargo fmt: FAIL
- format-docs: FAIL
- integration tests: FAIL
- cargo check (cloudflare native + wasm32-unknown-unknown): PASS
- cargo test: PASS
- cargo test (axum native): PASS
- vitest: PASS
- format-typescript: PASS
- browser integration tests: PASS
| request: PlatformHttpRequest, | ||
| ) -> Result<PlatformPendingRequest, Report<PlatformError>> { | ||
| let backend_name = request.backend_name.clone(); | ||
| let response = self.execute(request).await?; |
🔧 wrench — Per-provider auction timeout is silently dropped on Cloudflare.
The orchestrator computes effective_timeout = remaining_ms.min(provider.timeout_ms()) per provider (orchestrator.rs:294) and bakes it into the backend name so Phase 1 can pick a budget. On Fastly, first_byte_timeout is enforced by the host. On Cloudflare, CloudflareHttpClient::execute calls Fetch::Request(...).send().await with no timeout/abort at all — the only ceiling is the Workers global CPU limit (~30s on paid plans).
This is the prior review's #2 concern. The PR addresses latency-blowup awareness (doc + a log::warn! in select()) but does not address the correctness half: a single slow upstream blocks the whole worker invocation and ignores the auction budget. With sequential send_async, an unhealthy DSP at position 0 prevents any other DSP from even being attempted. With Workers' default 30s wall-clock and a typical 300 ms auction budget, that's a 100× budget overrun.
Fix (any of):
- Wire an AbortController per fetch:

  use worker::{AbortController, RequestInit};
  let aborter = AbortController::default();
  init.with_signal(Some(&aborter.signal()));
  // schedule `aborter.abort()` after `request.first_byte_timeout` via setTimeout

  This is the proper fix; worker::AbortController exists in worker 0.7.
- If timeout-aware fan-out is genuinely out of scope for PR-17, fail loud rather than degrade silently: detect pending_requests.len() >= 2 in send_async (not in select) and return PlatformError::HttpClient with "multi-provider fan-out is not supported on Cloudflare; configure a single provider or use Fastly." This matches what the prior reviewer suggested ("safer to reject … than to silently degrade").
The current log::warn! runs n-1 times per auction (see separate finding) and a slow provider can still blow the worker's wall-clock budget regardless of any TS-level timeout config.
degrade to sequential latency. Use the Fastly adapter for parallel \
DSP fan-out.",
pending_requests.len()
);
🔧 wrench — select() warning fires n-1 times per auction → log noise on every multi-provider request.
The orchestrator calls select() once per pending request in a loop (orchestrator.rs:381-388). The if pending_requests.len() >= 2 check is true for the first n-1 calls of every auction. With 5 providers at 1k QPS that's 4,000 warnings/sec, all identical, before any real diagnostic value. This is exactly the kind of thing that gets a log alert page-flooded and then silenced, defeating the purpose.
Fix: Move the warning to send_async and gate it on a OnceCell/AtomicBool so it fires once per worker instance, or downgrade to log::debug! and emit one warning per auction from app.rs. Better: combine with the timeout finding above and turn it into a single Err(...) if multi-provider mode is intentionally unsupported.
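The once-per-instance gating suggested above can be sketched with an AtomicBool; log::warn! is swapped for eprintln! so the snippet stands alone, and the function returns whether it actually warned so the behavior is observable:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static FAN_OUT_WARNED: AtomicBool = AtomicBool::new(false);

// Emit the multi-provider warning at most once per worker instance.
// swap(true) returns the previous value, so only the first caller that
// sees `false` emits the warning; every later call is suppressed.
fn warn_fan_out_once(pending: usize) -> bool {
    if pending >= 2 && !FAN_OUT_WARNED.swap(true, Ordering::Relaxed) {
        eprintln!(
            "multi-provider fan-out degrades to sequential latency ({pending} pending)"
        );
        return true;
    }
    false
}
```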
let body_bytes = match response.response.into_body() {
    edgezero_core::body::Body::Once(bytes) => bytes.to_vec(),
    edgezero_core::body::Body::Stream(_) => vec![],
};
🤔 thinking — send_async always materializes the body, then re-serializes it for select.
execute() already calls resp.bytes().await, so the body is Body::Once. send_async then deconstructs that into (status, headers, body_bytes) only to have select() rebuild a Response. The Body::Stream(_) => vec![] branch at line 306 is unreachable in this code path but silently lossy if the upstream contract ever changes. Either pass the assembled PlatformResponse straight through (skip the deconstruction) or assert unreachable!() on the Stream variant with a useful message.
"CloudflareHttpClient: Body::Stream is not supported; \
outbound request body will be empty"
);
vec![]
🤔 thinking — Body::Stream is silently dropped in outbound request body too.
CloudflareHttpClient::execute warns then sends an empty body when the outbound request is Body::Stream. For auction/DSP traffic the body is bid JSON in Body::Once, so this never trips today. But for /first-party/proxy POSTs of large bodies, a streamed inbound request body would be dropped on the floor and the upstream would see an empty POST. A 4xx-with-context error would be safer than producing a malformed request.
Ok(result.unwrap_or_else(|e| http_error(&e)))
}
};
♻️ refactor — get_fallback and post_fallback are near-duplicates.
Two ~30-line closures differ only in whether the /static/tsjs= branch exists. Extract the common dispatch(method, path, ...) helper used by both. Cuts ~30 lines and makes the routing intent obvious. Same pattern is in the Axum adapter — could be done across both. (Also: let path = req.uri().path().to_string() allocates every request when &str would do; minor.)
}

#[tokio::test]
async fn auth_middleware_rejects_with_401_when_credentials_required() {
♻️ refactor — auth_middleware_rejects_with_401_when_credentials_required doesn't actually test 401.
The test name says "rejects with 401" but the body comment admits it can't actually verify that (settings may not have basic_auth) and only asserts x-geo-info-available and status != 404. That's a middleware-wiring smoke test, not an auth test. Either rename to middleware_chain_runs_for_auction_endpoint (matches what's tested) or stand up a settings fixture with basic_auth configured and assert status == 401 (matches the name).
let secret_store: Arc<dyn PlatformSecretStore> =
    edgezero_adapter_cloudflare::CloudflareRequestContext::get(ctx.request())
        .map(|cf_ctx| {
            Arc::new(CloudflareSecretStoreAdapter {
📝 note — every request clones the JS-backed Env to construct CloudflareSecretStoreAdapter. worker::Env clones are Rc<…> under the hood so it's cheap, but worth noting if the secret store ends up in a hot path. Not action-required.
Closes #498