
Wait for consensus start before answering region requests #17546

Merged
CRZbulabula merged 4 commits into master from check_consensus_before_answering_region_request
Apr 26, 2026

Conversation

@jt2594838
Contributor

Summary

  • Add a lightweight polling utility (Await / ConditionAwaiter) in node-commons that provides a fluent API for waiting until a condition becomes true, with configurable timeout and poll interval (a hypothetical usage sketch follows this list).
  • Introduce DataNodeContext inner class in DataNode to expose consensus initialization state (isAllConsensusStarted()), with volatile visibility on the underlying flags.
  • Guard all region management RPCs in DataNodeInternalRPCServiceImpl (create/delete region, change leader, add/remove/delete/reset peer, notify migration) with a waitForConsensusStarted() check that polls up to 30 seconds before rejecting with CONSENSUS_NOT_INITIALIZED.
  • This prevents race conditions where ConfigNode dispatches region requests to a DataNode whose consensus layer hasn't finished initializing after restart.
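
For illustration, a hypothetical use of the new utility might look like the following. The method names (await, atMost, pollInterval, until) are assumptions modeled on Awaitility's fluent API, since the actual ConditionAwaiter surface is not shown here:

import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: the fluent method names are assumed, not taken from the patch.
void waitUntilConsensusStarted(DataNodeContext context) {
    Await.await()
        .atMost(30, TimeUnit.SECONDS)             // configurable timeout
        .pollInterval(100, TimeUnit.MILLISECONDS) // configurable poll interval
        .until(context::isAllConsensusStarted);
}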

Test plan

  • Unit tests for Await / ConditionAwaiter covering: already-true condition, condition becomes true, timeout, poll delay, exception ignoring, untilAsserted, forever mode (AwaitTest)
  • Unit tests for consensus wait behavior: createSchemaRegion, createDataRegion, deleteRegion, changeRegionLeader all reject with CONSENSUS_NOT_INITIALIZED when consensus is not started (ConsensusWaitTest)
  • Existing DataNodeInternalRPCServiceImplTest updated to pass a mocked DataNodeContext with consensus started

@codecov

codecov Bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 70.45455% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.82%. Comparing base (acaabb8) to head (51b6674).
⚠️ Report is 9 commits behind head on master.

Files with missing lines                                Patch %   Lines
...ol/thrift/impl/DataNodeInternalRPCServiceImpl.java    55.31%   21 Missing ⚠️
...che/iotdb/commons/concurrent/ConditionAwaiter.java    86.56%    9 Missing ⚠️
...e/iotdb/db/service/DataNodeInternalRPCService.java     0.00%    5 Missing ⚠️
...ain/java/org/apache/iotdb/db/service/DataNode.java    50.00%    4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17546      +/-   ##
============================================
+ Coverage     39.80%   39.82%   +0.01%     
  Complexity      312      312              
============================================
  Files          5142     5147       +5     
  Lines        347882   348088     +206     
  Branches      44404    44430      +26     
============================================
+ Hits         138489   138618     +129     
- Misses       209393   209470      +77     


Contributor

@CRZbulabula CRZbulabula left a comment


Code Review

Overview

This PR adds a defensive guard to DataNode internal RPCs: before handling any region management request (create/delete region, change leader, add/remove/reset peer, notify migration), the handler polls until consensus is initialized or rejects after 30 seconds with CONSENSUS_NOT_INITIALIZED.

Three layers of change:

  1. Await / ConditionAwaiter: a custom lightweight polling utility in node-commons (Awaitility clone for production code, since Awaitility is test-scoped)
  2. DataNodeContext: inner class on DataNode exposing isAllConsensusStarted() via volatile flags
  3. RPC guards: a waitForConsensusStarted() check inserted into 10 region management RPCs

The existing status code CONSENSUS_NOT_INITIALIZED (904) is already in the NEED_RETRY set in StatusUtils, so ConfigNode will automatically retry these requests — good.
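
Concretely, the guard inside each handler is roughly the following. This is a reconstruction from the description above rather than the exact patch; the field names and status construction are assumptions:

// Reconstruction of the guard, not the exact patch; names are assumptions.
private TSStatus waitForConsensusStarted() {
    long deadlineMs = System.currentTimeMillis()
        + TimeUnit.SECONDS.toMillis(consensusWaitTimeoutSeconds);
    while (System.currentTimeMillis() < deadlineMs) {
        if (dataNodeContext.isAllConsensusStarted()) {
            return null; // success convention used by the patch (see issue 4 below)
        }
        try {
            Thread.sleep(100); // poll interval
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
    }
    return new TSStatus(TSStatusCode.CONSENSUS_NOT_INITIALIZED.getStatusCode());
}

// Each guarded RPC then begins with:
// TSStatus notReady = waitForConsensusStarted();
// if (notReady != null) { return notReady; }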


Issues & Suggestions

1. Race condition: getImpl() can be called before setDataNodeContext() (Bug — High)

DataNodeInternalRPCService.getImpl() uses compareAndSet:

public DataNodeInternalRPCServiceImpl getImpl() {
    impl.compareAndSet(null, new DataNodeInternalRPCServiceImpl(dataNodeContext));
    return impl.get();
}

If getImpl() is called before setDataNodeContext(), the impl is created with dataNodeContext = null. The compareAndSet then prevents it from ever being recreated. A subsequent waitForConsensusStarted() call will NPE on dataNodeContext.isAllConsensusStarted().

This can happen: ClusterConfigTaskExecutor.java:1292 calls DataNodeInternalRPCService.getInstance().getImpl().clearCacheImpl(...) — if this runs before registerInternalRPCService() sets the context, the impl is permanently null-contexted.

Suggestion (the first option is sketched after this list): Either:

  • Reset impl to null in setDataNodeContext() so it's recreated on next getImpl() call
  • Or add a null-check in waitForConsensusStarted() (treat null context as "not started")
  • Or inject DataNodeContext via getImpl() parameter instead of storing it on the service
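
A minimal sketch of the first option, assuming impl is an AtomicReference<DataNodeInternalRPCServiceImpl> as the compareAndSet usage suggests:

// Sketch of option 1: drop any impl built before the context existed so the
// next getImpl() call recreates it with a non-null DataNodeContext.
public synchronized void setDataNodeContext(DataNodeContext dataNodeContext) {
    this.dataNodeContext = dataNodeContext;
    impl.set(null);
}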

2. Blocking RPC threads for 30 seconds with Thread.sleep (Performance — Medium)

waitForConsensusStarted() polls with Thread.sleep(100ms) for up to 30 seconds, blocking the Thrift handler thread. If ConfigNode dispatches multiple region requests simultaneously to a restarting DataNode, each one blocks a handler thread.

Suggestion: Replace polling with a CountDownLatch or java.util.concurrent.locks.Condition that is signaled when consensus starts. This would wake all waiting threads immediately instead of polling:

private final CountDownLatch consensusLatch = new CountDownLatch(1);
// In DataNodeContext: call consensusLatch.countDown() when ready
// In waitForConsensusStarted(): consensusLatch.await(30, TimeUnit.SECONDS)

This is also more efficient (no CPU wakeups every 100ms) and would eliminate the need for the custom Await utility in this use case.
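
Fleshed out slightly (still a sketch; the surrounding class structure and the point where countDown() is invoked are assumptions):

private final CountDownLatch consensusLatch = new CountDownLatch(1);

// Called once from DataNode when all consensus layers have started.
void signalConsensusStarted() {
    consensusLatch.countDown();
}

// Replaces the polling guard: blocks at most 30 s, wakes immediately on countDown().
private TSStatus waitForConsensusStarted() {
    try {
        if (consensusLatch.await(30, TimeUnit.SECONDS)) {
            return null; // consensus started in time
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    return new TSStatus(TSStatusCode.CONSENSUS_NOT_INITIALIZED.getStatusCode());
}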

3. DataNodeContext should be a static inner class or interface (Design)

DataNodeContext is a non-static inner class, so it holds an implicit reference to the DataNode instance. This works but:

  • Makes it impossible to instantiate without a DataNode instance (complicates testing — hence the need for Mockito)
  • Couples the RPC service to the DataNode class directly

Suggestion: Extract an interface (e.g., ConsensusStateProvider) or make it a static inner class that receives the volatile references explicitly:

public interface ConsensusStateProvider {
    boolean isAllConsensusStarted();
}

4. null return convention from waitForConsensusStarted() (Style — Minor)

Returning null to mean "success" and a non-null TSStatus to mean "failure" is error-prone — easy to forget the null check at call sites. Consider Optional<TSStatus> or a helper that throws (letting the caller handle the exception once).
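
For example, a thin Optional wrapper around the existing guard keeps the rejection path explicit at every call site (sketch, reusing the null-on-success method described above):

// Sketch: empty means "consensus is ready, proceed"; a present value is the rejection.
private Optional<TSStatus> checkConsensusStarted() {
    return Optional.ofNullable(waitForConsensusStarted());
}

// Call site:
// Optional<TSStatus> rejection = checkConsensusStarted();
// if (rejection.isPresent()) { return rejection.get(); }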

5. consensusWaitTimeoutSeconds is mutable without synchronization (Minor)

private long consensusWaitTimeoutSeconds = 30;
// ...
public void setConsensusWaitTimeoutSeconds(long consensusWaitTimeoutSeconds) {
    this.consensusWaitTimeoutSeconds = consensusWaitTimeoutSeconds;
}

This non-final, non-volatile field is written from test threads and read from RPC handler threads. In practice the setter is only called before any RPC handling, so this is safe, but it should be either volatile or passed via constructor. Annotating with @VisibleForTesting would also clarify intent.
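
A small sketch of that hardening, assuming Guava's @VisibleForTesting is available on the classpath:

// volatile guarantees RPC handler threads see the test's write; the annotation
// documents that the setter exists only for tests.
private volatile long consensusWaitTimeoutSeconds = 30;

@VisibleForTesting
public void setConsensusWaitTimeoutSeconds(long consensusWaitTimeoutSeconds) {
    this.consensusWaitTimeoutSeconds = consensusWaitTimeoutSeconds;
}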

6. Await/ConditionAwaiter — well-built but consider justification (Design)

The custom utility is ~200 lines duplicating Awaitility's core API. Awaitility is already a managed dependency (4.2.0 in the parent POM), just test-scoped. Two options:

  • Promote Awaitility to compile scope (simplest, but adds a production dependency)
  • Keep the custom utility (current approach — no new dependency, but maintenance cost)

The current approach is reasonable, but worth a comment in Await.java explaining why Awaitility isn't used (test-scope only).

7. Missing test: consensus becomes available mid-wait (Test coverage)

ConsensusWaitTest covers:

  • Consensus not started → timeout rejection ✓

But doesn't test:

  • Consensus becomes available during the 30s wait → request should succeed

This is the primary happy-path scenario the feature is designed for. A test with a ScheduledExecutorService flipping the mock after a short delay would validate the polling behavior end-to-end.
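
A rough sketch of such a test; the impl constructor, the mock surface, and the request construction are assumptions modeled on the snippets above, so details will need adjusting:

// Sketch: flip the mocked consensus flag mid-wait and verify the request is no
// longer rejected with CONSENSUS_NOT_INITIALIZED. buildCreateDataRegionReq() is
// a hypothetical helper for constructing a valid request.
@Test
public void testConsensusBecomesAvailableMidWait() throws Exception {
    AtomicBoolean started = new AtomicBoolean(false);
    DataNode.DataNodeContext context = Mockito.mock(DataNode.DataNodeContext.class);
    Mockito.when(context.isAllConsensusStarted()).thenAnswer(inv -> started.get());

    DataNodeInternalRPCServiceImpl impl = new DataNodeInternalRPCServiceImpl(context);

    // Flip the flag after 500 ms, while the guard is still polling.
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.schedule(() -> started.set(true), 500, TimeUnit.MILLISECONDS);
    try {
        TSStatus status = impl.createDataRegion(buildCreateDataRegionReq());
        Assert.assertNotEquals(
            TSStatusCode.CONSENSUS_NOT_INITIALIZED.getStatusCode(), status.getCode());
    } finally {
        scheduler.shutdownNow();
    }
}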


Positives

  • Correct use of volatile on the consensus flags for cross-thread visibility
  • CONSENSUS_NOT_INITIALIZED is already in NEED_RETRY — seamless ConfigNode retry
  • Comprehensive unit tests for the Await utility (8 test cases covering timeout, delay, exception handling, untilAsserted, forever)
  • Clean separation of concerns (utility / context / guard)
  • Existing DataNodeInternalRPCServiceImplTest correctly updated to pass mocked context

Summary

The defensive guard is a sound idea for protecting against startup race conditions. The main concerns are:

  1. Bug: getImpl() race can create the impl with null context → NPE (must fix)
  2. Performance: Polling with Thread.sleep blocks handler threads — consider CountDownLatch
  3. Test gap: No test for the primary scenario (consensus becomes ready mid-wait)
  4. Design: DataNodeContext as non-static inner class is awkward for testing — extract interface

Contributor

@CRZbulabula CRZbulabula left a comment


LGTM!

@CRZbulabula CRZbulabula merged commit 859c719 into master Apr 26, 2026
30 of 31 checks passed
@CRZbulabula CRZbulabula deleted the check_consensus_before_answering_region_request branch April 26, 2026 07:55
