fix(proxy): increase ResponseHeaderTimeout from 30s to 5m #23

Merged
andybons merged 1 commit into main from fix/response-header-timeout
Apr 26, 2026

Conversation

@andybons
Contributor

Summary

Increases ResponseHeaderTimeout from 30 seconds to 5 minutes across all four transport paths (CONNECT interception, HTTP relay, MCP relay, default transport).

Problem

LLM inference — especially with extended thinking models like Claude 3.7 Sonnet via Bedrock — can take well over 30 seconds before the first response byte. The 30s ResponseHeaderTimeout caused intermittent read timeouts on slow-to-start completions.

Why 5 minutes

  • AWS recommends 1000-3600s for Bedrock LLM timeouts
  • Portkey (major LLM proxy) uses per-target configurable timeouts, default unset
  • LiteLLM uses 45s per fallback attempt
  • 5 minutes covers extended thinking (~30-120s typical) with margin, while still catching genuinely dead connections
  • ResponseHeaderTimeout is time-to-first-byte only — once streaming starts, there's no further timeout (matching Portkey's behavior: "timeout not triggered if it gets at least a chunk")

Changes

  • proxy/proxy.go — two transports (default + interception)
  • proxy/relay.go — HTTP relay transport
  • proxy/mcp.go — MCP relay transport
  • CHANGELOG.md — v0.9.1 entry
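
For context, the change in each of those files is the same field bump on a Go http.Transport. A minimal sketch, assuming the transports look roughly like the following (imports of net/http and time omitted; every field other than ResponseHeaderTimeout is illustrative, not the proxy's actual configuration):

// Illustrative transport; only the ResponseHeaderTimeout line reflects this PR.
var httpTransport = &http.Transport{
	Proxy:                 http.ProxyFromEnvironment,
	TLSHandshakeTimeout:   10 * time.Second,
	ResponseHeaderTimeout: 5 * time.Minute, // was 30 * time.Second
}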

LLM inference (especially extended thinking models) can take well over
30s before the first response byte arrives. The 30s timeout caused
read timeouts on slow-to-start completions through Bedrock and other
LLM providers.

5 minutes covers extended thinking while still catching genuinely dead
connections. Once the first byte arrives, Go's transport has no further
timeout — streaming continues indefinitely, matching the behavior of
LLM proxies like Portkey.
@claude

claude Bot commented Apr 26, 2026

Claude finished @andybons's task in 1m 15s


PR Review: fix(proxy): increase ResponseHeaderTimeout from 30s to 5m

  • Gather diff and context
  • Read changed files for full context
  • Analyze security, correctness, and edge cases
  • Post review

Overview

Correct, well-reasoned fix for a real problem. The change bumps ResponseHeaderTimeout from 30s to 5m uniformly across all four transport paths. The 30s limit was silently killing slow-to-start LLM inference requests (Bedrock extended thinking can take 60-120s before the first response byte).

Correctness

All four transport paths are covered and consistent:

  • httpTransport — proxy/proxy.go:129
  • Per-request interception transport — proxy/proxy.go:1925
  • relayClient — proxy/relay.go:24
  • mcpRelayClient — proxy/mcp.go:42

No transport path is left behind. Good.

MCP / SSE is safe. The mcpRelayClient comment notes SSE is long-lived, but ResponseHeaderTimeout only measures time until the response headers arrive — not the body. Once the 200 OK + Content-Type: text/event-stream headers land, the timeout stops. The change doesn't affect the stream itself.
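
A minimal sketch of that semantics, assuming a hypothetical SSE endpoint and a bare http.Client (none of the names or URLs below come from this codebase; imports of bufio, fmt, log, net/http, and time omitted): the 5m clock only covers the wait for response headers, and reading the event stream afterwards is not bounded by it.

client := &http.Client{
	Transport: &http.Transport{ResponseHeaderTimeout: 5 * time.Minute},
}
resp, err := client.Get("https://mcp.example.invalid/sse") // hypothetical endpoint
if err != nil {
	log.Fatal(err) // includes the timeout error if headers take longer than 5m
}
defer resp.Body.Close()

// The stream itself can stay open indefinitely; ResponseHeaderTimeout no longer applies.
scanner := bufio.NewScanner(resp.Body)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}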

The per-request transport in handleConnectWithInterception (proxy.go:1914) is allocated fresh per CONNECT request. The 5m value here is fine, but it's worth noting this transport never pools connections (by design — it carries per-host TLS config). Each goroutine now holds its transport for up to 5m on a dead upstream instead of 30s, but for LLM proxy workloads this is the right trade-off.
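
As a hedged sketch of that trade-off (the variable names and TLS details below are assumptions, not the actual handleConnectWithInterception code), the per-request transport carries the intercepted host's TLS config and is used for a single round trip:

// Hypothetical per-CONNECT transport: built fresh for each intercepted request so it
// can carry that host's TLS config; it is discarded afterwards rather than pooled.
transport := &http.Transport{
	TLSClientConfig:       &tls.Config{ServerName: targetHost}, // targetHost: illustrative
	ResponseHeaderTimeout: 5 * time.Minute,                     // a dead upstream is now held up to 5m
}
resp, err := transport.RoundTrip(upstreamReq) // upstreamReq: the intercepted request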

Design consideration (non-blocking)

The timeout is hardcoded in four separate places. Since this is a library module meant to be imported (per CLAUDE.md), downstream callers might need different values for different service profiles. A single exported constant or a ResponseHeaderTimeout field on Proxy or the config struct would be cleaner and avoid drift:

// proxy/proxy.go
const defaultResponseHeaderTimeout = 5 * time.Minute

Then referenced as ResponseHeaderTimeout: defaultResponseHeaderTimeout in each transport. Not required for this fix, but worth a follow-up if more timeout knobs are needed.
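
A possible shape for that follow-up, sketched with a hypothetical Config struct and helper (neither exists in this codebase; the constant is the one suggested above):

const defaultResponseHeaderTimeout = 5 * time.Minute

// Hypothetical knob a downstream caller could set per service profile.
type Config struct {
	// ResponseHeaderTimeout bounds the wait for upstream response headers.
	// Zero means "use defaultResponseHeaderTimeout".
	ResponseHeaderTimeout time.Duration
}

func (c *Config) responseHeaderTimeout() time.Duration {
	if c.ResponseHeaderTimeout > 0 {
		return c.ResponseHeaderTimeout
	}
	return defaultResponseHeaderTimeout
}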

No issues found

  • No security implications — this is a timing constant with no credential or TLS changes.
  • No race conditions introduced — all four transports are either package-level vars (initialized once) or per-request stack values.
  • No test coverage gaps beyond what already exists — timeout integration tests are not present in this codebase, and adding them for a constant change would be overkill.
  • Changelog entry is accurate and appropriately scoped.

Verdict: LGTM. The fix is complete, safe, and well-motivated.

@andybons andybons merged commit 116934c into main Apr 26, 2026
2 checks passed