
Fix InProcessTestHost ContinueAsNew stuck-instance race condition #707

Merged

bachuv merged 4 commits into main from vabachu/currentworkitems-fix on Apr 22, 2026


@bachuv (Contributor) commented Apr 21, 2026

Summary

What changed?

This pull request improves concurrency safety and correctness in the in-process test host, particularly around work-item tracking and message handling when an orchestration restarts via ContinueAsNew. The most important changes are grouped below.

Concurrency Safety Improvements:

  • In WorkItemDispatcher.cs, replaced direct reads and writes of the currentWorkItems field with Interlocked.Increment, Interlocked.Decrement, and Volatile.Read, so the count of concurrently active work items is updated atomically and no increment or decrement is lost when several threads update it at once.
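As a rough illustration of what the Interlocked/Volatile change protects against, here is a minimal Python sketch of a shared in-flight counter. It is an analog, not the dispatcher's code: Python has no Interlocked or Volatile, so a threading.Lock stands in for both, and ActiveWorkItemCounter and simulate are hypothetical names. Without atomic updates, two threads can read the same value and both write back the same incremented result, losing an update and leaving the count permanently off, which can make a dispatcher believe it is at capacity when it is not.

```python
import threading
from typing import List, Tuple


class ActiveWorkItemCounter:
    """Hypothetical stand-in for WorkItemDispatcher's currentWorkItems field."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def increment(self) -> int:
        # Plays the role of Interlocked.Increment: the read-modify-write is atomic.
        with self._lock:
            self._value += 1
            return self._value

    def decrement(self) -> int:
        # Plays the role of Interlocked.Decrement.
        with self._lock:
            self._value -= 1
            return self._value

    def read(self) -> int:
        # Plays the role of Volatile.Read: returns the latest written value.
        with self._lock:
            return self._value


def simulate(counter: ActiveWorkItemCounter, workers: int,
             items_per_worker: int) -> Tuple[int, int]:
    """Run concurrent start/finish cycles; return (final count, peak count)."""
    peaks: List[int] = []

    def worker() -> None:
        local_peak = 0
        for _ in range(items_per_worker):
            local_peak = max(local_peak, counter.increment())  # item picked up
            try:
                pass  # the work item would be processed here
            finally:
                counter.decrement()  # always released, even if processing fails
        peaks.append(local_peak)  # list.append is thread-safe in CPython

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read(), max(peaks, default=0)
```

For example, simulate(ActiveWorkItemCounter(), workers=8, items_per_worker=2000) should report a final count of 0 and a peak between 1 and 8: each worker holds at most one slot at a time, every increment is paired with a decrement in a finally block, and the lock ensures no update is lost.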

Orchestration Restart and Message Handling Fixes:

  • In TaskOrchestrationDispatcher.cs, added logic to clear the accumulated activity, timer, and orchestrator messages when ContinueAsNew occurs, so stale messages from the previous execution are not re-enqueued, where they could cause duplicate or stuck orchestrations.
  • In InMemoryOrchestrationService.cs, added logic to drop messages from previous executions after a restart, preventing stale activity completions or timer fires from interfering with new orchestration runs.
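Both message fixes can be sketched together, assuming each message carries the execution ID of the run it targets. This is a hypothetical Python model rather than the sidecar's implementation: Message, InMemoryQueue, take_for, and finish_episode are illustrative names. finish_episode discards messages an episode buffered before ContinueAsNew (the idea behind the TaskOrchestrationDispatcher.cs change), and take_for drops messages addressed to an earlier execution or a completed instance (the idea behind the InMemoryOrchestrationService.cs change).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    instance_id: str
    execution_id: str  # which run of the instance this message targets
    kind: str          # e.g. "TaskCompleted" or "TimerFired"


class InMemoryQueue:
    """Hypothetical in-memory message store; names are illustrative only."""

    def __init__(self) -> None:
        self._current_execution: dict[str, str] = {}
        self._pending: list[Message] = []

    def start_execution(self, instance_id: str, execution_id: str) -> None:
        # Called for the first run and again on every ContinueAsNew.
        self._current_execution[instance_id] = execution_id

    def complete(self, instance_id: str) -> None:
        self._current_execution.pop(instance_id, None)

    def enqueue(self, msg: Message) -> None:
        self._pending.append(msg)

    def take_for(self, instance_id: str) -> list[Message]:
        """Return deliverable messages for one instance; drop stale ones."""
        current = self._current_execution.get(instance_id)
        keep: list[Message] = []
        deliver: list[Message] = []
        for msg in self._pending:
            if msg.instance_id != instance_id:
                keep.append(msg)  # belongs to another instance; leave queued
            elif current is not None and msg.execution_id == current:
                deliver.append(msg)  # targets the live execution
            # Otherwise the instance completed or the message targets an
            # earlier execution, so it is dropped instead of delivered.
        self._pending = keep
        return deliver


def finish_episode(outbound: list[Message], queue: InMemoryQueue,
                   instance_id: str, continue_as_new_id: str | None) -> None:
    """Flush the messages one orchestrator episode produced."""
    if continue_as_new_id is not None:
        # Messages buffered before ContinueAsNew belong to the old run:
        # discard them rather than re-enqueue them against the new run.
        outbound.clear()
        queue.start_execution(instance_id, continue_as_new_id)
    for msg in outbound:
        queue.enqueue(msg)
    outbound.clear()
```

Filtering at delivery time, not only when an episode finishes, also catches messages that were already queued when the restart happened, such as an activity completion or timer fire from the old execution that lands after the new execution has started.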

Version Update:

  • Bumped the package version in InProcessTestHost.csproj from 0.2.2-preview.1 to 0.2.3-preview.1 to reflect these changes.

Project checklist

  • Release notes are not required for the next release
    • Otherwise: Notes added to release_notes.md
  • Backport is not required
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • All required tests have been added/updated (unit tests, E2E tests)
  • Breaking change?
    • If yes:
      • Impact:
      • Migration guidance:

AI-assisted code disclosure (required)

Was an AI tool used? (select one)

  • No
  • Yes, AI helped write parts of this PR (e.g., GitHub Copilot)
  • Yes, an AI agent generated most of this PR

If AI was used:

  • Tool(s): VS Code GitHub Copilot (Claude Opus 4.6)
  • AI-assisted areas/files:
  • What you changed after AI output:

AI verification (required if AI was used):

  • I understand the code and can explain it
  • I verified referenced APIs/types exist and are correct
  • I reviewed edge cases/failure paths (timeouts, retries, cancellation, exceptions)
  • I reviewed concurrency/async behavior
  • I checked for unintended breaking or behavior changes

Testing

Automated tests

  • Result: Passed
  • 19/19 InProcessTestHost tests pass (15 existing + 4 new)
  • New tests in ContinueAsNewRaceConditionTests:
    • ConcurrentContinueAsNew_MultipleIterations_NoneGetStuck: 6 concurrent instances, each running 5 ContinueAsNew iterations with an activity and a 500 ms timer
    • SingleInstance_ManyContinueAsNewIterations_CompletesCorrectly: a single instance running 8 ContinueAsNew iterations, with an activity and a timer per iteration
    • ContinueAsNew_MultipleActivitiesPerIteration_AllComplete: 4 concurrent instances with 3 parallel activities per iteration across 4 ContinueAsNew cycles
    • ReproScenario_RepeatedRounds_AllComplete: the exact scenario from the bug report (4 instances, each running an activity, a 5 s timer, and ContinueAsNew), repeated across 3 fresh hosts

Copilot AI review requested due to automatic review settings April 21, 2026 18:15

Copilot AI left a comment


Pull request overview

This PR improves correctness in the in-process sidecar used by Microsoft.DurableTask.InProcessTestHost, targeting a stuck-instance race condition triggered by ContinueAsNew under concurrency (especially when stale activity/timer messages from a prior execution interleave with a restarted execution).

Changes:

  • Adds stress/regression tests that reproduce and validate the ContinueAsNew stuck-instance scenario under concurrency.
  • Improves work-item concurrency tracking in the sidecar dispatcher using Interlocked and Volatile.
  • Prevents stale message re-enqueue during ContinueAsNew by clearing accumulated message lists and dropping messages from prior executions.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • test/InProcessTestHost.Tests/ContinueAsNewTests.cs: Adds new regression/stress tests targeting the ContinueAsNew stale-message race.
  • src/InProcessTestHost/Sidecar/InMemoryOrchestrationService.cs: Drops messages for completed instances and attempts to drop messages from prior executions after restart.
  • src/InProcessTestHost/Sidecar/Dispatcher/WorkItemDispatcher.cs: Makes active work-item tracking thread-safe via Interlocked/Volatile.
  • src/InProcessTestHost/Sidecar/Dispatcher/TaskOrchestrationDispatcher.cs: Clears accumulated activity/timer/orchestrator message buffers when ContinueAsNew occurs.
  • src/InProcessTestHost/InProcessTestHost.csproj: Bumps package version to 0.2.3-preview.1.

Review comment threads (now marked Outdated):

  • src/InProcessTestHost/Sidecar/InMemoryOrchestrationService.cs
  • src/InProcessTestHost/Sidecar/Dispatcher/WorkItemDispatcher.cs


3 participants