fix(api_call): protect node rotation and health state with a mutex#59
Open
erimicel wants to merge 1 commit intotypesense:masterfrom
Open
fix(api_call): protect node rotation and health state with a mutex#59erimicel wants to merge 1 commit intotypesense:masterfrom
erimicel wants to merge 1 commit intotypesense:masterfrom
Conversation
`@current_node_index` and per-node `is_healthy`/`last_access_timestamp` were unsynchronised. With multiple threads sharing one client (Puma, Sidekiq), two threads in `next_node` could compute the same incremented index from the same starting value (skipping a node), and a healthcheck write could be observed mid-update with a stale timestamp. Wraps the round-robin loop in `next_node` and the writes in `set_node_healthcheck` in a single per-instance Mutex. The mutex is acquired briefly (no I/O inside it), so contention is negligible. Adds regression specs covering concurrent multi-node rotation and concurrent single-node health updates.
Contributor
|
Test coverage looks good for the regression intent, though the exact even distribution is proving serialized round-robin behavior more than general thread safety. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Typesense::ApiCallkeeps its rotation cursor (@current_node_index) and per-node health metadata (node[:is_healthy],node[:last_access_timestamp]) as plain unsynchronised instance state. When several threads share a singleTypesense::Client— the standard pattern in Puma, Sidekiq, etc. — two concurrent calls tonext_nodecan compute the same incremented index from the same starting value, causing the rotation to skip a node. A failing request can also flipis_healthytofalsewhile another thread is mid-way through reading the pair, observing a stalelast_access_timestampwith the new health flag.The race is small in MRI thanks to the GVL, but it is real and grows under load. With keep-alive added recently, more state flows through the same path, so it's worth tightening up.
Fix
Add a single per-instance
Mutexand use it to:next_node(only the cursor advance + the healthy/due check) so only one thread mutates@current_node_indexat a time;is_healthy+last_access_timestampwrites inset_node_healthcheck.The lock is acquired briefly — no I/O inside it — so contention is negligible.
Single-node deployments
The fix is safe and beneficial for single-node setups too: the rotation race is degenerate there (
(idx + 1) % 1 == 0), but the health-state pair still benefits from atomic writes when a request fails and setsis_healthy: false.Tests
Adds two regression specs in
spec/typesense/api_call_spec.rb:is_healthyand callingnext_node; asserts the final state is internally consistent.bundle exec rspec spec/typesense/api_call_spec.rb— 44 examples, 0 failures.bundle exec rubocop lib/typesense/api_call.rb spec/typesense/api_call_spec.rb— clean.PR Checklist