fix(proxy): increase ResponseHeaderTimeout from 30s to 5m #23

Merged
andybons merged 1 commit into main from fix/response-header-timeout
Apr 26, 2026

Conversation

@andybons
Contributor

Summary

Increases ResponseHeaderTimeout from 30 seconds to 5 minutes across all four transport paths (CONNECT interception, HTTP relay, MCP relay, default transport).

Problem

LLM inference — especially with extended thinking models like Claude 3.7 Sonnet via Bedrock — can take well over 30 seconds before the first response byte. The 30s ResponseHeaderTimeout caused intermittent read timeouts on slow-to-start completions.

Why 5 minutes

  • AWS recommends 1000-3600s for Bedrock LLM timeouts
  • Portkey (major LLM proxy) uses per-target configurable timeouts, default unset
  • LiteLLM uses 45s per fallback attempt
  • 5 minutes covers extended thinking (~30-120s typical) with margin, while still catching genuinely dead connections
  • ResponseHeaderTimeout is time-to-first-byte only — once streaming starts, there's no further timeout (matching Portkey's behavior: "timeout not triggered if it gets at least a chunk")

Changes

  • proxy/proxy.go — two transports (default + interception)
  • proxy/relay.go — HTTP relay transport
  • proxy/mcp.go — MCP relay transport
  • CHANGELOG.md — v0.9.1 entry
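
For context, the change in each of those files is the same field bump on a Go http.Transport. A minimal sketch, assuming the transports look roughly like the following (imports of net/http and time omitted; every field other than ResponseHeaderTimeout is illustrative, not the proxy's actual configuration):

// Illustrative transport; only the ResponseHeaderTimeout line reflects this PR.
var httpTransport = &http.Transport{
	Proxy:                 http.ProxyFromEnvironment,
	TLSHandshakeTimeout:   10 * time.Second,
	ResponseHeaderTimeout: 5 * time.Minute, // was 30 * time.Second
}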

LLM inference (especially extended thinking models) can take well over
30s before the first response byte arrives. The 30s timeout caused
read timeouts on slow-to-start completions through Bedrock and other
LLM providers.

5 minutes covers extended thinking while still catching genuinely dead
connections. Once the first byte arrives, Go's transport has no further
timeout — streaming continues indefinitely, matching the behavior of
LLM proxies like Portkey.
@claude

claude Bot commented Apr 26, 2026

Claude finished @andybons's task in 1m 15s


PR Review: fix(proxy): increase ResponseHeaderTimeout from 30s to 5m

  • Gather diff and context
  • Read changed files for full context
  • Analyze security, correctness, and edge cases
  • Post review

Overview

Correct, well-reasoned fix for a real problem. The change bumps ResponseHeaderTimeout from 30s to 5m uniformly across all four transport paths. The 30s limit was silently killing slow-to-start LLM inference requests (Bedrock extended thinking can take 60-120s before the first response byte).

Correctness

All four transport paths are covered and consistent:

  • httpTransport — proxy/proxy.go:129
  • Per-request interception transport — proxy/proxy.go:1925
  • relayClient — proxy/relay.go:24
  • mcpRelayClient — proxy/mcp.go:42

No transport path is left behind. Good.

MCP / SSE is safe. The mcpRelayClient comment notes SSE is long-lived, but ResponseHeaderTimeout only measures time until the response headers arrive — not the body. Once the 200 OK + Content-Type: text/event-stream headers land, the timeout stops. The change doesn't affect the stream itself.
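
A minimal sketch of that semantics, assuming a hypothetical SSE endpoint and a bare http.Client (none of the names or URLs below come from this codebase; imports of bufio, fmt, log, net/http, and time omitted): the 5m clock only covers the wait for response headers, and reading the event stream afterwards is not bounded by it.

client := &http.Client{
	Transport: &http.Transport{ResponseHeaderTimeout: 5 * time.Minute},
}
resp, err := client.Get("https://mcp.example.invalid/sse") // hypothetical endpoint
if err != nil {
	log.Fatal(err) // includes the timeout error if headers take longer than 5m
}
defer resp.Body.Close()

// The stream itself can stay open indefinitely; ResponseHeaderTimeout no longer applies.
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}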

The per-request transport in handleConnectWithInterception (proxy.go:1914) is allocated fresh per CONNECT request. The 5m value here is fine, but it's worth noting this transport never pools connections (by design — it carries per-host TLS config). Each goroutine now holds its transport for up to 5m on a dead upstream instead of 30s, but for LLM proxy workloads this is the right trade-off.
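
As a hedged sketch of that trade-off (the variable names and TLS details below are assumptions, not the actual handleConnectWithInterception code), the per-request transport carries the intercepted host's TLS config and is used for a single round trip:

// Hypothetical per-CONNECT transport: built fresh for each intercepted request so it
// can carry that host's TLS config; it is discarded afterwards rather than pooled.
transport := &http.Transport{
	TLSClientConfig:       &tls.Config{ServerName: targetHost}, // targetHost: illustrative
	ResponseHeaderTimeout: 5 * time.Minute,                     // a dead upstream is now held up to 5m
}
resp, err := transport.RoundTrip(upstreamReq) // upstreamReq: the intercepted request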

Design consideration (non-blocking)

The timeout is hardcoded in four separate places. Since this is a library module meant to be imported (per CLAUDE.md), downstream callers might need different values for different service profiles. A single exported constant or a ResponseHeaderTimeout field on Proxy or the config struct would be cleaner and avoid drift:

// proxy/proxy.go
const defaultResponseHeaderTimeout = 5 * time.Minute

Then referenced as ResponseHeaderTimeout: defaultResponseHeaderTimeout in each transport. Not required for this fix, but worth a follow-up if more timeout knobs are needed.
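
A possible shape for that follow-up, sketched with a hypothetical Config struct and helper (neither exists in this codebase; the constant is the one suggested above):

const defaultResponseHeaderTimeout = 5 * time.Minute

// Hypothetical knob a downstream caller could set per service profile.
type Config struct {
	// ResponseHeaderTimeout bounds the wait for upstream response headers.
	// Zero means "use defaultResponseHeaderTimeout".
	ResponseHeaderTimeout time.Duration
}

func (c *Config) responseHeaderTimeout() time.Duration {
	if c.ResponseHeaderTimeout > 0 {
		return c.ResponseHeaderTimeout
	}
	return defaultResponseHeaderTimeout
}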

No issues found

  • No security implications — this is a timing constant with no credential or TLS changes.
  • No race conditions introduced — all four transports are either package-level vars (initialized once) or per-request stack values.
  • No test coverage gaps beyond what already exists — timeout integration tests are not present in this codebase, and adding them for a constant change would be overkill.
  • Changelog entry is accurate and appropriately scoped.

Verdict: LGTM. The fix is complete, safe, and well-motivated.

@andybons andybons merged commit 116934c into main Apr 26, 2026
2 checks passed