
Opt-in HTTP keep-alive via keep_alive_connections#1

Closed
erimicel wants to merge 1 commit into master from keepalive-connections


@erimicel
Member

@erimicel erimicel commented Apr 27, 2026

Summary

  • Adds an opt-in keep_alive_connections config flag (default false).
  • When enabled, Typesense::ApiCall reuses Faraday connections per (thread, node) via the :net_http_persistent adapter instead of building a fresh Faraday.new on every request.
  • Drops the cached connection on any rescued network error so a half-closed keep-alive socket cannot fail the retry as well.
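A minimal plain-Ruby sketch of the per-(thread, node) cache summarized above, with the flag-off path falling back to a fresh connection per call. `KeepAliveCache`, `THREAD_KEY`, and the `factory` block (standing in for `Faraday.new(...)`) are hypothetical names for illustration, not the gem's actual implementation.

```ruby
# Hypothetical sketch: cache connections per (thread, client instance, node).
# The factory block stands in for Faraday.new(...) and receives the node URL.
class KeepAliveCache
  THREAD_KEY = :keep_alive_connections_sketch # hypothetical thread-local key

  def initialize(keep_alive_connections: false, &factory)
    @keep_alive = keep_alive_connections
    @factory = factory
  end

  # node is a hash like { protocol: 'https', host: 'localhost', port: 8108 }.
  def fetch(node)
    url = "#{node[:protocol]}://#{node[:host]}:#{node[:port]}"
    # Default-off: build a fresh connection on every call, as before.
    return @factory.call(url) unless @keep_alive

    # Thread.current[] is fiber-local, so no two threads share this hash.
    per_thread = (Thread.current[THREAD_KEY] ||= {})
    # Keying by object_id keeps separate client instances on separate sockets.
    per_instance = (per_thread[object_id] ||= {})
    # Keying by protocol://host:port leaves node selection unchanged.
    per_instance[url] ||= @factory.call(url)
  end
end

built = 0
cache = KeepAliveCache.new(keep_alive_connections: true) { |_url| built += 1; Object.new }
node_a = { protocol: 'https', host: 'a.example', port: 8108 }
node_b = { protocol: 'https', host: 'b.example', port: 8108 }

same_a  = cache.fetch(node_a).equal?(cache.fetch(node_a)) # reused on this thread
other_b = cache.fetch(node_b)                              # separate entry per node
other_t = Thread.new { cache.fetch(node_a) }.value         # separate entry per thread
```

Nesting the instance hash under the thread-local one means a thread never touches another thread's sockets, which is the property the design relies on given that Net::HTTP is not thread-safe.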

Motivation

Sentry profiling of a hot endpoint in our API showed that ~140 ms of a ~150 ms request was spent in the Typesense round-trip — most of it in OpenSSL::X509::Store#set_default_paths and the TLS handshake, repeated on every search call. The root cause is ApiCall#perform_request building a brand-new Faraday.new(...) (and therefore a new TCP + TLS handshake) on every call. Aggregate metrics confirm ~50% of this endpoint's wall time is I/O.

Design notes

  • Default-off, so the version bump is a no-op. The original per-request Faraday.new path is preserved when keep_alive_connections is false. Existing users (and our own production deploys) see zero behaviour change until they opt in.
  • Per-thread caching. Net::HTTP is not thread-safe. Caching per Thread.current means each Puma/Sidekiq worker thread maintains its own keep-alive socket per node, with no cross-thread sharing.
  • Per-instance isolation. Cache is keyed by ApiCall#object_id, so multiple Typesense::Client instances in the same process do not share sockets.
  • Node round-robin preserved. The cache is keyed by protocol://host:port, so the existing healthcheck/round-robin logic still selects nodes the same way.
  • Stale-socket recovery. Server- or LB-side idle timeouts will eventually drop a kept-alive socket. The next request sees Faraday::ConnectionFailed / Errno::ECONNRESET etc. — the gem already rescues these inside the retry loop, and we now also evict the cached connection so the retry opens a fresh socket. Pair with num_retries >= 1 for transparent recovery.
  • Idle timeout 30s. Set your load balancer's idle timeout above 30s so the client closes idle sockets before the load balancer does.
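The stale-socket recovery can be sketched as a retry loop that evicts the cached connection before retrying. This plain-Ruby version rescues stdlib socket errors in place of `Faraday::ConnectionFailed` and omits the per-thread keying for brevity; `ConnectionCache`, `perform_with_retries`, and `NETWORK_ERRORS` are hypothetical. Here `num_retries: 1` allows two attempts in total.

```ruby
# Hypothetical sketch: evict-on-error retry loop for kept-alive connections.
# Stdlib errors stand in for Faraday::ConnectionFailed and friends.
# (EOFError is a subclass of IOError, so it is covered too.)
NETWORK_ERRORS = [Errno::ECONNRESET, Errno::EPIPE, IOError].freeze

class ConnectionCache
  def initialize(&factory)
    @factory = factory
    @conns = {}
  end

  def fetch(url)
    @conns[url] ||= @factory.call(url)
  end

  def evict(url)
    @conns.delete(url)
  end
end

# Yields a cached connection; on a network error, evicts it so the next
# attempt builds a fresh socket instead of reusing a half-closed one.
def perform_with_retries(cache, url, num_retries:)
  attempts = 0
  begin
    attempts += 1
    yield cache.fetch(url)
  rescue *NETWORK_ERRORS
    cache.evict(url)
    retry if attempts <= num_retries
    raise
  end
end

Conn = Struct.new(:id, :stale)
ids = 0
# The first connection simulates a socket the load balancer already closed.
cache = ConnectionCache.new { |_url| ids += 1; Conn.new(ids, ids == 1) }
result = perform_with_retries(cache, 'https://localhost:8108', num_retries: 1) do |conn|
  raise Errno::ECONNRESET if conn.stale
  "ok via connection #{conn.id}"
end
```

Without the `evict` call, the retry would fetch the same stale connection and fail again, which is why the eviction matters more than the retry count itself.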

Usage

Typesense::Client.new(
  api_key: ENV['TYPESENSE_API_KEY'],
  nodes: [{ host: 'localhost', port: 8108, protocol: 'https' }],
  connection_timeout_seconds: 3,
  num_retries: 1,
  keep_alive_connections: true
)

Test plan

  • bundle exec rspec spec/typesense/api_call_spec.rb — 50 examples, 0 failures (was 42; +8 new).
  • bundle exec rspec (full suite) — 159 examples, 1 failure, 27 pending. The single failure is collections_spec.rb:156 (truncate_len schema mismatch with the running Typesense container) — pre-existing on master, verified before this branch, unrelated.
  • Validate in staging once consumed downstream: confirm P95 latency drops on the v3 search endpoint with no rise in Faraday::ConnectionFailed / HTTPStatus0 rate.

New spec coverage

  • Connection reuse on the same thread.
  • Per-node cache keying.
  • Per-thread isolation.
  • Per-instance isolation.
  • Eviction on network error so retries open a fresh socket.
  • Configured timeouts propagate to the cached connection.
  • Default-off path: flag defaults to false, no thread-local cache populated.

@lloydwatkin
Member

This is being contributed back to typesense?

Currently `Typesense::ApiCall#perform_request` builds a fresh
`Faraday.new(...)` (and therefore a new TCP and TLS handshake) on every
request. On hot endpoints this can dominate the Typesense round-trip
latency.

This adds an opt-in `keep_alive_connections` configuration option (default
`false`, so existing users see no behaviour change). When enabled:

* Faraday connections are cached per `(thread, node)` rather than
  constructed per request. Net::HTTP is not thread-safe, so per-thread
  caching keeps concurrent callers isolated while still respecting the
  existing node round-robin.
* Connections use the `:net_http_persistent` Faraday adapter with a 30s
  idle timeout, so reused sockets are dropped before most load balancers
  cull them.
* On any rescued network error, the cached connection is dropped before
  the gem retries, so a half-closed keep-alive socket cannot fail the
  retry as well. Pair with `num_retries >= 1` for transparent recovery
  from server- or load-balancer-side idle timeouts.
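With the faraday-net_http_persistent adapter, the timeout is set on the `Net::HTTP::Persistent` object yielded by the adapter block (`f.adapter :net_http_persistent do |http| http.idle_timeout = 30 end`). The sketch below models, in plain Ruby, what such a timeout does: a connection idle for longer than `idle_timeout` is replaced instead of reused. `IdlePool` and the injectable `clock` are hypothetical and exist only to make the behaviour testable.

```ruby
# Hypothetical model of an idle timeout on reused connections:
# a connection idle longer than idle_timeout is replaced, not reused.
class IdlePool
  Entry = Struct.new(:conn, :last_use)

  def initialize(idle_timeout:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &factory)
    @idle_timeout = idle_timeout
    @clock = clock
    @factory = factory
    @entries = {}
  end

  def checkout(url)
    now = @clock.call
    entry = @entries[url]
    if entry.nil? || now - entry.last_use > @idle_timeout
      # Expired (or never opened): open a fresh connection rather than
      # risk a socket the load balancer may already have closed.
      entry = @entries[url] = Entry.new(@factory.call(url), now)
    end
    entry.last_use = now
    entry.conn
  end
end

t = 0.0
opened = 0
pool = IdlePool.new(idle_timeout: 30, clock: -> { t }) { |_url| opened += 1 }
url = 'https://localhost:8108'
first = pool.checkout(url)  # opens connection 1 at t = 0
t = 20.0
reused = pool.checkout(url) # idle 20s, within the timeout: reused
t = 55.0
fresh = pool.checkout(url)  # idle 35s, past the timeout: reopened
```

The guarantee only holds if the load balancer waits longer than the client before closing an idle socket; otherwise the server side can still close a connection the client considers fresh, and the evict-and-retry path has to absorb it.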

The `:net_http_persistent` adapter and its `net-http-persistent` runtime
dependency are listed in the gemspec, but `require 'faraday/net_http_persistent'`
only runs when the option is enabled, so with the option off the new
dependency is installed but never loaded at runtime.
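A sketch of that gating, assuming the check runs when the client is configured. `KeepAliveAdapterLoader` and the injectable `requirer` are hypothetical; only the feature path `faraday/net_http_persistent` comes from the description above. With the flag off, `require` is never called, so the adapter code is never loaded.

```ruby
# Hypothetical sketch: only require the persistent adapter when opted in.
module KeepAliveAdapterLoader
  FEATURE = 'faraday/net_http_persistent'

  # requirer defaults to Kernel#require; injecting it lets the gating be
  # exercised without the gem installed.
  def self.load_if_enabled(keep_alive_connections, requirer: ->(feature) { require feature })
    return false unless keep_alive_connections

    requirer.call(FEATURE)
    true
  end
end

loaded = []
fake_require = ->(feature) { loaded << feature }
off_result = KeepAliveAdapterLoader.load_if_enabled(false, requirer: fake_require) # nothing loaded
on_result  = KeepAliveAdapterLoader.load_if_enabled(true, requirer: fake_require)  # loads the adapter
```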

New RSpec coverage:

* connection reuse on the same thread
* per-node cache keying
* per-thread cache isolation
* per-instance cache isolation
* eviction on network error
* timeouts propagate to the cached connection
* the option defaults to false and the legacy per-request connection
  path is preserved
@erimicel erimicel force-pushed the keepalive-connections branch from eafb23d to 86908f7 April 27, 2026 16:39
@erimicel
Member Author

Replaced by upstream typesense#55.

@erimicel erimicel closed this Apr 27, 2026