Fix data race on should_stop_ flag in LLM runner#18652

Open
kirklandsign wants to merge 2 commits into main from android/fix-should-stop-data-race
Conversation

@kirklandsign
Contributor

Summary

should_stop_ is written from the caller thread via stop() and read from the inference thread in the generate loop. Unsynchronized concurrent access to a plain bool is undefined behavior per the C++ standard: the compiler may hoist the read out of the loop, and weakly ordered targets such as ARM give no timely-visibility guarantee for the write.

Change bool to std::atomic&lt;bool&gt; with relaxed memory ordering, which is sufficient for a standalone cancellation flag and has negligible overhead.
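For reference, a minimal sketch of the pattern (simplified stand-in names and a per-token callback for illustration, not the actual TextTokenGenerator API):

```cpp
#include <atomic>
#include <functional>

// Simplified stand-in for the runner's cancellation flag; the real
// TextTokenGenerator's model state and sampling logic are omitted here.
class Generator {
 public:
  // Returns the number of tokens produced before hitting the limit or a stop.
  int generate(int max_new_tokens, const std::function<void(int)>& on_token) {
    should_stop_.store(false, std::memory_order_relaxed);  // reset per run
    int produced = 0;
    for (int i = 0; i < max_new_tokens; ++i) {
      on_token(i);  // stand-in for one decode step + token callback
      ++produced;
      if (should_stop_.load(std::memory_order_relaxed)) {
        break;  // cancellation requested, possibly from another thread
      }
    }
    return produced;
  }

  // Safe to call from any thread; relaxed ordering suffices because the
  // flag carries no ordering dependency on other shared data.
  void stop() { should_stop_.store(true, std::memory_order_relaxed); }

 private:
  std::atomic<bool> should_stop_{false};
};
```

Calling stop() from the token callback ends generation after the current token, which mirrors how a stop() from a UI thread ends it after the in-flight decode step.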

Test plan

CI

@pytorch-bot

pytorch-bot bot commented Apr 1, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18652

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit d03f001 with merge base b24535b:

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 1, 2026
@github-actions

github-actions bot commented Apr 1, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@kirklandsign kirklandsign marked this pull request as ready for review April 1, 2026 21:11
Copilot AI review requested due to automatic review settings April 1, 2026 21:11
Contributor

Copilot AI left a comment


Pull request overview

Fixes a cross-thread cancellation data race in the LLM text generation path by making stop flags atomic, improving correctness on weakly-ordered architectures (e.g., ARM).

Changes:

  • Convert TextTokenGenerator::should_stop_ to std::atomic<bool> and use relaxed loads/stores in the generate loop and stop().
  • Convert TextDecoderRunner::should_stop_ to std::atomic<bool> and update stop() to use an atomic store.
  • Convert TextLLMRunner::shouldStop_ to std::atomic<bool> (though it currently appears unused).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Changed files:

  • extension/llm/runner/text_token_generator.h — Makes the generation cancellation flag atomic with relaxed load/store usage.
  • extension/llm/runner/text_llm_runner.h — Changes the runner-level stop flag to atomic (but currently not wired into behavior).
  • extension/llm/runner/text_decoder_runner.h — Changes the decoder-level stop flag to atomic; stop() now does an atomic store.


Comment on lines 164 to 166

 private:
-  bool shouldStop_{false};
+  std::atomic<bool> shouldStop_{false};


Copilot AI Apr 1, 2026


shouldStop_ was changed to std::atomic<bool>, but it’s not referenced anywhere else in TextLLMRunner (stop() delegates to text_token_generator_->stop() and generate() doesn’t check shouldStop_). This adds an unused state flag and <atomic> dependency without affecting cancellation behavior. Either remove shouldStop_ entirely, or wire it into stop() / generate() (with consistent memory ordering) so it actually controls generation cancellation at the runner level.

-  should_stop_ = true;
+  should_stop_.store(true, std::memory_order_relaxed);
 }


Copilot AI Apr 1, 2026


TextDecoderRunner::stop() sets should_stop_, but should_stop_ is never read in TextDecoderRunner::step() (or elsewhere), so calling stop() currently has no effect on execution/cancellation. Either remove this stop flag/API if it’s not meant to be supported, or add a check (e.g., early-return with a cancelled/aborted error) in step() (and/or in the outer loop) that observes should_stop_.

Suggested change

+  inline bool is_stopped() const {
+    return should_stop_.load(std::memory_order_relaxed);
+  }

…nner

These flags were written but never read — cancellation is handled
entirely through TextTokenGenerator::should_stop_. Removing dead state
and the unnecessary <atomic> dependency from these two classes.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



       token_data.data(), token_shape, executorch::aten::ScalarType::Long);

-  should_stop_ = false;
+  should_stop_.store(false, std::memory_order_relaxed);

Copilot AI Apr 2, 2026


should_stop_ is reset to false inside generate(). If stop() is called from another thread very early (e.g., immediately after generation starts), the subsequent store(false, ...) here can overwrite that stop request, making cancellation unreliable for that run. Consider resetting the flag before the operation becomes externally stoppable (or track cancellation via a generation id / use exchange with a protocol that can’t lose a concurrent stop request).

Suggested change

-  should_stop_.store(false, std::memory_order_relaxed);
+  // Clear any stale stop request from a previous run without losing a
+  // concurrent early stop for this run. If a stop was already requested,
+  // honor it immediately for this generation call.
+  if (should_stop_.exchange(false, std::memory_order_relaxed)) {
+    return 0;
+  }
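A self-contained sketch of this exchange-based reset (a hypothetical minimal class, not the actual runner API) shows how a stop() issued before the loop starts is consumed rather than clobbered:

```cpp
#include <atomic>

// Hypothetical minimal generator illustrating the exchange-based reset.
class Generator {
 public:
  // Returns tokens produced; 0 if a stop request was already pending.
  long generate(long max_new_tokens) {
    // exchange(false) atomically reads-and-clears the flag: a stop that
    // arrived before this point is honored instead of being overwritten
    // by a plain store(false).
    if (should_stop_.exchange(false, std::memory_order_relaxed)) {
      return 0;
    }
    long pos = 0;
    while (pos < max_new_tokens) {
      ++pos;  // stand-in for one decode step
      if (should_stop_.load(std::memory_order_relaxed)) break;
    }
    return pos;
  }

  void stop() { should_stop_.store(true, std::memory_order_relaxed); }

 private:
  std::atomic<bool> should_stop_{false};
};
```

A stop landing between the exchange and the loop is still only observed on the next iteration's load, which is fine since every iteration re-checks the flag.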

Comment on lines 146 to 148

   inline void stop() {
-    should_stop_ = true;
+    should_stop_.store(true, std::memory_order_relaxed);
   }

Copilot AI Apr 2, 2026


There are existing unit tests for the runner/token generation path (e.g., test_text_llm_runner.cpp), but none appear to cover calling stop() concurrently with generate() to validate cancellation behavior and prevent regressions of this race fix. Adding a focused test (potentially in Python bindings where the GIL is released) would better exercise the cross-thread stop path.
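Such a test could be sketched like this, using a stand-in loop in place of the real runner (FakeGenerator and the started_ handshake are test scaffolding, not ExecuTorch API):

```cpp
#include <atomic>
#include <thread>

// Stand-in for TextTokenGenerator::generate(): a long loop that checks the
// stop flag each iteration. started_ lets the test order stop() after the
// per-run reset; release/acquire on it places the reset before the stop in
// the flag's modification order, so the stop request cannot be lost.
struct FakeGenerator {
  std::atomic<bool> should_stop_{false};
  std::atomic<bool> started_{false};

  long generate(long max_new_tokens) {
    should_stop_.store(false, std::memory_order_relaxed);
    started_.store(true, std::memory_order_release);
    long pos = 0;
    while (pos < max_new_tokens) {
      ++pos;  // stand-in for one decode step
      if (should_stop_.load(std::memory_order_relaxed)) break;
    }
    return pos;
  }
  void stop() { should_stop_.store(true, std::memory_order_relaxed); }
};

// Drive generate() on a worker thread and stop() from the calling thread.
long run_concurrent_stop() {
  FakeGenerator g;
  long produced = 0;
  std::thread worker([&] { produced = g.generate(2'000'000'000L); });
  while (!g.started_.load(std::memory_order_acquire)) {
    std::this_thread::yield();  // wait until the loop is actually running
  }
  g.stop();
  worker.join();
  return produced;
}
```

The token budget is made deliberately enormous so that a lost or invisible stop would show up as the loop running to completion rather than being cut short.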

-    should_stop_ = true;
-  }

   /**

Copilot AI Apr 2, 2026


Removing TextDecoderRunner::stop() is an API-breaking change for a public header (it’s exported and also has a torch::executor alias below). If downstream code may be calling this, consider keeping the method (even as a deprecated no-op) or providing a migration path rather than deleting it outright.

Suggested change

-  /**
+  /**
+   * Deprecated compatibility shim for older callers. TextDecoderRunner no
+   * longer requires explicit stop behavior, so this method is now a no-op.
+   */
+  [[deprecated(
+      "TextDecoderRunner::stop() is deprecated and is now a no-op; remove "
+      "calls to this method.")]] virtual void stop() {}
+  /**

@mergennachin
Contributor

@claude Review this PR

@claude

claude bot commented Apr 3, 2026

Claude finished @mergennachin's task in 6m 57s


PR Review: Fix data race on should_stop_ flag in LLM runner

  • Read changed files and understand the diff
  • Check for correctness of atomic usage
  • Review removal of unused flags
  • Check CI failure
  • Post review feedback

Overall: This is a clean, well-scoped fix. The core change (commit 1) correctly addresses a real data race, and the cleanup (commit 2) removes dead code. LGTM with one minor note.

Commit 1: std::atomic<bool> with relaxed ordering

The fix in text_token_generator.h is correct:

  • should_stop_ is written from the caller thread via stop() and read from the inference thread in the generate() loop — a textbook data race on plain bool.
  • std::memory_order_relaxed is the right choice here. The flag is a simple cancellation signal with no ordering dependencies on other shared data. Relaxed atomics guarantee atomicity (no torn reads/writes) and eventual cross-core visibility, which is all that's needed for a stop flag.
  • TextTokenGenerator is always held via std::unique_ptr, so std::atomic<bool> making the class non-copyable/non-movable is not an issue.

Commit 2: Removing unused should_stop_ flags

  • TextDecoderRunner::should_stop_ — was set by stop() but never read anywhere (not in step() or any other method). The flag had no effect on execution. Clean removal.
  • TextLLMRunner::shouldStop_ — was reset in generate() but never checked. TextLLMRunner::stop() already delegates to text_token_generator_->stop(), so this flag was entirely redundant. Clean removal.
  • No external callers of TextDecoderRunner::stop() exist in the codebase (all stop() calls go through the runner-level APIs which delegate to TextTokenGenerator::stop()), so removing it is safe.

Minor note

The ASCII diagram in extension/llm/runner/README.md:532 still shows should_stop_ as a member of TextDecoderRunner and MultimodalDecoderRunner. It would be worth updating the diagram to reflect that should_stop_ now lives solely in TextTokenGenerator. This is cosmetic and doesn't need to block the PR.

CI

The initially reported unittest-editable / linux failure (exit code 56) does not appear related to this change — CI is now fully green (13/13 passing, 1 cancelled).

@mergennachin
Contributor

@claude Explain me the reasoning of the race before this PR and how this actually solves it

@claude

claude bot commented Apr 3, 2026

Claude finished @mergennachin's task in 5m 20s


  • Read the changed files and understand the before/after
  • Explain the race condition
  • Explain how the fix works

The race before this PR

The core issue is in TextTokenGenerator (text_token_generator.h). Two threads interact with a single bool should_stop_ field:

Thread A (inference thread) — runs generate(), which contains a tight loop:

should_stop_ = false;              // reset at start of generate()

while (pos < start_pos + max_new_tokens) {
    // ... run model step, sample token, decode ...

    if (should_stop_) {            // read should_stop_
        break;
    }
}

Thread B (caller/UI thread) — calls stop() at any time to cancel generation:

void stop() {
    should_stop_ = true;           // write should_stop_
}

There is no synchronization between these two threads. Under the C++ memory model ([intro.races]/21), when two threads access the same non-atomic variable and at least one access is a write, that's a data race, and the behavior is undefined — not just "might read a stale value" but fully UB, meaning the compiler and hardware are free to do anything.

In practice, the consequences are architecture-dependent:

  1. Compiler optimization (any platform): Because should_stop_ is a plain bool with no synchronization, the compiler is within its rights to assume no other thread modifies it. It can hoist the read of should_stop_ out of the loop entirely, turning if (should_stop_) break; into a single check before the loop starts — after which the loop can never observe the stop request regardless of what thread B does. This is a well-known optimization that GCC and Clang actually perform at -O2 and above (the variable gets loaded into a register once and the register is re-checked each iteration, never re-reading from memory).

  2. Store buffer visibility (ARM): ARM has a weakly-ordered memory model. A write on one core sits in that core's store buffer before being flushed to the cache coherence domain. Without a fence or atomic operation, there is no guarantee that thread A's core will ever see thread B's write in a timely manner — or at all, from the compiler's perspective, since it's UB.

How std::atomic<bool> fixes it

The fix changes the declaration from:

bool should_stop_ = false;

to:

std::atomic<bool> should_stop_{false};

And all accesses use explicit atomic operations:

// In generate() — reset:
should_stop_.store(false, std::memory_order_relaxed);

// In generate() — check:
if (should_stop_.load(std::memory_order_relaxed)) { break; }

// In stop() — signal:
should_stop_.store(true, std::memory_order_relaxed);

This fixes the race because:

  1. No more UB. std::atomic operations are explicitly defined to be data-race-free. Concurrent loads and stores on the same atomic variable are well-defined behavior in the C++ standard.

  2. Guaranteed atomicity. The compiler must emit load/store instructions that are indivisible (no torn reads). On ARM this maps to plain ldr/str of an aligned 32-bit word, which is inherently atomic at the hardware level.

  3. No hoisting out of the loop. The compiler treats every atomic::load() as potentially returning a different value (even with relaxed ordering). It cannot cache the result in a register and skip re-reading. Every loop iteration re-executes the load from memory.

  4. Why relaxed is sufficient. memory_order_relaxed provides only atomicity — no ordering guarantees relative to other memory operations. That's all we need here because should_stop_ is a standalone cancellation flag. There's no other shared data whose visibility needs to be ordered relative to seeing the stop flag. Stronger orderings (acquire/release, seq_cst) would add unnecessary fence instructions. On ARM, relaxed atomic loads/stores compile to plain ldr/str with no barriers — same instructions as the original code, but with the critical difference that the compiler now knows not to optimize away the cross-thread communication.

In short: the original code was UB that happened to work in unoptimized builds but could silently break under compiler optimizations or on weakly-ordered architectures. The fix makes the cross-thread signaling well-defined with zero runtime overhead.

