Type-Erased Dispatch Silently Drops Coroutines, Causing Deadlock
Executive Summary
There is a fundamental API incompatibility between executor_ref::dispatch() and strand::dispatch() that causes deadlocks when using strand with when_all or any code path that dispatches through a type-erased executor_ref while running inside the strand's execution context.
The bug: executor_ref::dispatch() returns void and discards the return value from the underlying executor. When that executor is a strand, the discarded return value is a coroutine handle that was supposed to be resumed via symmetric transfer. The coroutine is never resumed, causing a deadlock.
Impact: Any code running on a strand that uses when_all, or otherwise dispatches work through executor_ref from inside the strand's execution context, will deadlock.
Recommended fix: Change executor_ref::dispatch() to return coro instead of void, preserving the symmetric transfer return value.
Reproduction
#include <boost/capy.hpp>
#include <boost/capy/ex/strand.hpp>
#include <exception>
#include <iostream>
#include <latch>

using namespace boost::capy;

int main()
{
    thread_pool pool;
    strand s{pool.get_executor()};
    std::latch done(1);

    // Both handlers release the latch so main() can exit
    auto on_complete = [&done](auto&&...) { done.count_down(); };
    auto on_error = [&done](std::exception_ptr) { done.count_down(); };

    auto task_a = []() -> task<> {
        std::cout << "Task A running!\n";
        co_return;
    };
    auto task_b = []() -> task<> {
        std::cout << "Task B running!\n";
        co_return;
    };
    auto run_both = [&]() -> task<> {
        std::cout << "Before when_all\n";
        co_await when_all(task_a(), task_b()); // HANGS HERE
        std::cout << "After when_all\n";
    };

    run_async(s, on_complete, on_error)(run_both());
    done.wait(); // Never completes
    return 0;
}
Output:
(Program hangs indefinitely)
Note: A simple task without when_all works correctly:
run_async(s, on_complete, on_error)(task_a()); // Works fine
Background
What is Capy?
Boost.Capy is a C++20 coroutine library providing:
- task<T>: A lazy coroutine type (doesn't start until awaited)
- Executors: Objects that schedule and run coroutines (thread_pool, strand, io_context)
- Concurrency primitives: when_all for parallel execution, async_event for signaling
- Type-erased wrappers: executor_ref and any_executor for runtime polymorphism
Boost.Corosio is a companion library providing I/O primitives (sockets, timers) that integrate with Capy.
Key Concepts
Executors
An executor is an object that can schedule coroutines for execution. In Capy, executors provide two key methods:
void post(coro h); // Queue coroutine for later execution
coro dispatch(coro h); // Execute now if possible, else queue
The dispatch() method is an optimization: if the caller is already running on this executor's thread, it can resume the coroutine immediately instead of queuing it.
Type Erasure with executor_ref
executor_ref is a lightweight, non-owning wrapper that can hold any executor type. It uses a vtable (virtual function table) for runtime polymorphism without inheritance:
// Can wrap any executor type
void schedule_work(executor_ref ex) {
    ex.dispatch(some_coroutine); // Works with any executor
}
thread_pool pool;
strand s{pool.get_executor()};
schedule_work(pool.get_executor()); // Works
schedule_work(s); // Works (but has the bug!)
Symmetric Transfer
Symmetric transfer is a C++20 coroutine optimization that avoids stack growth when switching between coroutines. Instead of one coroutine calling another (which adds a stack frame), coroutines "transfer" control directly via std::coroutine_handle.
// WITHOUT symmetric transfer (stack grows):
coro await_suspend(coro h) {
    next_coroutine.resume(); // Adds stack frame
    return std::noop_coroutine();
}

// WITH symmetric transfer (stack stays flat):
coro await_suspend(coro h) {
    return next_coroutine; // Caller resumes this handle directly
}
The returned handle tells the coroutine machinery which coroutine to resume next. Returning std::noop_coroutine() means "I've handled it, don't resume anything."
Strand
A strand serializes execution: coroutines dispatched through a strand never run concurrently, even on a multi-threaded executor. This is useful for protecting shared state without explicit locking.
thread_pool pool;
strand s{pool.get_executor()};
// These will never run simultaneously, even though pool has multiple threads
run_async(s)(task_a());
run_async(s)(task_b());
Root Cause Analysis
The Two Dispatch APIs
The bug stems from a mismatch between how strand and executor_ref define their dispatch() methods.
strand::dispatch() — Returns coro for Symmetric Transfer
// strand.hpp
coro dispatch(coro h) const
{
    return detail::strand_service::dispatch(*impl_, executor_ref(ex_), h);
}

// strand_service.cpp
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    // Optimization: if we're already running in this strand,
    // return the handle for immediate symmetric transfer
    if (running_in_this_thread(impl))
        return h; // ← Caller is expected to resume this!

    // Otherwise, queue the coroutine and start the invoker
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    return std::noop_coroutine(); // Caller does nothing
}
When dispatch() is called from within the strand (i.e., running_in_this_thread() is true), it returns the coroutine handle h directly. The caller is expected to resume this handle via symmetric transfer.
executor_ref::dispatch() — Returns void, Ignores Return Value
// executor_ref.hpp
void dispatch(coro h) const
{
    vt_->dispatch(ex_, h); // Calls strand::dispatch(), IGNORES return value!
}

// The vtable entry for dispatch:
static constexpr executor_vtable vtable_for = {
    // ...
    // dispatch lambda - note it returns void
    [](void const* p, std::coroutine_handle<> h) {
        static_cast<Ex const*>(p)->dispatch(h); // Return value discarded!
    },
    // ...
};
The type-erased executor_ref calls the underlying executor's dispatch() but discards the return value. When wrapping a strand, this means the handle returned for symmetric transfer is lost.
Why thread_pool Works
thread_pool::executor_type::dispatch() always queues work and returns noop_coroutine():
// thread_pool.hpp
coro dispatch(coro h) const
{
    post(h); // Always queue, never inline
    return std::noop_coroutine(); // "I handled it, nothing for caller to do"
}
Since it always returns noop_coroutine(), ignoring the return value is harmless.
Why strand Fails
When strand::dispatch() is called from within the strand's invoker thread:
- running_in_this_thread() returns true (we're inside the strand)
- strand::dispatch() returns h directly (expecting the caller to resume it)
- executor_ref::dispatch() ignores this return value
- The coroutine handle h is never resumed
- Deadlock: the coroutine waits forever
Detailed Execution Trace
1. run_async(strand, ...) is called with run_both() task
2. strand::dispatch() is called from main thread (NOT in strand)
└─ running_in_this_thread() == false
└─ Coroutine is enqueued
└─ Strand invoker is posted to thread_pool
└─ Returns noop_coroutine() ✓
3. Thread pool worker picks up strand invoker
4. Invoker sets dispatch_thread_ = current thread ID
5. Invoker dispatches pending coroutines (including run_both)
6. run_both() starts executing
7. run_both() calls: co_await when_all(task_a(), task_b())
8. when_all creates runner coroutines for task_a and task_b
9. when_all calls: executor_ref::dispatch(runner_0)
└─ executor_ref wraps the strand
└─ Calls strand::dispatch(runner_0)
└─ running_in_this_thread() == TRUE (we're in the invoker!)
└─ strand::dispatch() returns runner_0 handle
└─ executor_ref::dispatch() IGNORES this return value ✗
└─ runner_0 is NEVER resumed!
10. Same happens for runner_1
11. Neither runner executes
└─ when_all's completion counter never reaches zero
└─ when_all waits forever
└─ DEADLOCK
Affected Code Paths
Any code that:
- Runs on a strand, AND
- Dispatches work through executor_ref while inside that strand's context
This includes:
- when_all launching child tasks (uses executor_ref::dispatch)
- io_awaitable_support::complete() dispatching continuations
- Any user code calling executor_ref::dispatch() from within a strand
Potential Solutions
Option 1: Change executor_ref::dispatch to Return coro (Recommended)
Change:
// executor_ref.hpp - BEFORE
void dispatch(coro h) const
{
    vt_->dispatch(ex_, h);
}

// executor_ref.hpp - AFTER
coro dispatch(coro h) const
{
    return vt_->dispatch(ex_, h);
}

// vtable - BEFORE
void (*dispatch)(void const*, std::coroutine_handle<>);

// vtable - AFTER
coro (*dispatch)(void const*, std::coroutine_handle<>);

// vtable lambda - AFTER
[](void const* p, std::coroutine_handle<> h) -> coro {
    return static_cast<Ex const*>(p)->dispatch(h);
},
Analysis:
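- Correct semantics: dispatch can return a handle for symmetric transfer
- Callers of executor_ref::dispatch() must handle the return value, either by returning it from await_suspend or by resuming it
- Aligns with how strand and thread_pool already work
- Executors whose dispatch() currently returns void (see the executor table) would also need to return coro, or the vtable lambda would need to adapt them, since the lambda above returns the wrapped call's result
- Preserves symmetric transfer optimization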
Option 2: Change strand::dispatch to Never Rely on Symmetric Transfer
Change strand to always enqueue, even when in-thread:
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    // Remove the running_in_this_thread optimization entirely
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    return std::noop_coroutine();
}
Analysis:
- Simple fix
- Performance regression: loses inline execution when already in strand
- Every dispatch from within a strand now goes through the queue
- Defeats the purpose of the running_in_this_thread optimization
Option 3: Strand Resumes Inline Without Symmetric Transfer
Change strand to call resume() directly instead of returning the handle:
coro strand_service::dispatch(strand_impl& impl, executor_ref ex, coro h)
{
    if (running_in_this_thread(impl))
    {
        h.resume(); // Resume immediately, don't use symmetric transfer
        return std::noop_coroutine();
    }
    if (strand_service_impl::enqueue(impl, h))
        ex.post(strand_service_impl::make_invoker(impl).h_);
    return std::noop_coroutine();
}
Analysis:
- Preserves inline execution optimization
- Stack depth increases: each nested dispatch adds a stack frame
- Risk of stack overflow with deeply nested coroutine chains
- This is how io_context currently works (see table below)
Option 4: Vtable Dispatch Resumes Returned Handle Internally
Change vtable to handle symmetric transfer transparently:
// vtable dispatch wrapper
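[](void const* p, std::coroutine_handle<> h) {
    auto result = static_cast<Ex const*>(p)->dispatch(h);
    // Only a null check is needed. Resuming a noop coroutine does nothing,
    // and comparing against std::noop_coroutine() is unreliable because
    // handles from separate calls are not guaranteed to compare equal.
    if (result)
        result.resume(); // Transparently handle symmetric transfer
},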
Analysis:
- No change to executor_ref public API
- Hidden behavior makes debugging harder
- Performance overhead: checks return value on every dispatch
- Stack depth issues (same as Option 3)
Recommendation
Option 1 (change executor_ref::dispatch to return coro) is the most correct solution. The current void return type is fundamentally incompatible with executors that support symmetric transfer.
This aligns executor_ref with how concrete executor types already define their dispatch() methods—both strand and thread_pool return coro.
Option 3 (resume inline) could be considered if there's a strong reason to keep executor_ref::dispatch() returning void, but it sacrifices symmetric transfer's stack efficiency.
Executor Types in Codebase
| Executor | dispatch() Returns | In-Thread Behavior | Works with executor_ref? |
|---|---|---|---|
| thread_pool::executor_type | coro | Returns noop_coroutine() (always queues) | Yes |
| strand<Ex> | coro | Returns h for symmetric transfer | No (BUG) |
| basic_io_context::executor_type | void | Calls h.resume() directly | Yes |
| test::run_blocking::executor_type | void | Calls h.resume() directly | Yes |
| mock_executor (test helper) | void | Calls h.resume() directly | Yes |
| executor_ref | void | Calls wrapped dispatch, ignores return | N/A (is the wrapper) |
| any_executor | void | Calls wrapped dispatch, ignores return | N/A (is a wrapper) |
Observations
- Inconsistent return types: Some executors return coro (for symmetric transfer), others return void (handle inline execution internally by calling resume()).
- strand is unique: It's the only executor that returns a non-noop handle from dispatch() for symmetric transfer optimization.
- io_context avoids the issue: basic_io_context::executor_type::dispatch() returns void and handles inline execution internally via h.resume(). This works with executor_ref but loses symmetric transfer benefits.
- any_executor has the same bug: Like executor_ref, it also uses a vtable that ignores the return value.
Design Question
Should all executors in Capy:
- (A) Return coro from dispatch() to support symmetric transfer? (Requires fixing executor_ref and any_executor)
- (B) Return void and handle inline execution internally via h.resume()? (Requires changing strand and thread_pool)
Option A preserves symmetric transfer's stack efficiency. Option B is simpler but loses that optimization.
Related Files
| File | Description |
|---|---|
| include/boost/capy/ex/executor_ref.hpp | Type-erased non-owning executor wrapper (has the bug) |
| include/boost/capy/ex/any_executor.hpp | Type-erased owning executor wrapper (has the same bug) |
| include/boost/capy/ex/strand.hpp | Strand executor adaptor |
| src/ex/detail/strand_service.cpp | Strand dispatch implementation |
| include/boost/capy/ex/thread_pool.hpp | Thread pool executor |
| include/boost/capy/when_all.hpp | Uses executor_ref::dispatch for child tasks |
| include/boost/capy/ex/io_awaitable_support.hpp | Uses executor_ref::dispatch in complete() |
| include/boost/corosio/basic_io_context.hpp | I/O context executor (returns void, calls resume internally) |
Test Case
After fixing, this should work:
#include <boost/capy.hpp>
#include <boost/capy/ex/strand.hpp>
#include <latch>

using namespace boost::capy;

int main()
{
    thread_pool pool;
    strand s{pool.get_executor()};

    auto outer = [&]() -> task<> {
        co_await when_all(
            []() -> task<> { co_return; }(),
            []() -> task<> { co_return; }()
        );
    };

    std::latch done(1);
    run_async(s,
        [&](auto&&...) { done.count_down(); }, // on_complete
        [&](auto) { done.count_down(); }       // on_error
    )(outer());
    done.wait(); // Should complete, not hang
    return 0;
}
Glossary
| Term | Definition |
|---|---|
| coro | Alias for std::coroutine_handle<>, a type-erased handle to any coroutine |
| Symmetric transfer | C++20 optimization where await_suspend returns a coroutine handle for the runtime to resume, avoiding stack growth |
| noop_coroutine | A special coroutine handle that does nothing when resumed; returned to indicate "no transfer needed" |
| Strand | An executor wrapper that serializes execution: work dispatched through it never runs concurrently |
| Type erasure | A technique for runtime polymorphism without inheritance, typically using function pointers or vtables |
| vtable | Virtual function table: a struct of function pointers used for type erasure |
| executor_ref | A non-owning type-erased wrapper for any Capy executor |
| when_all | A primitive that runs multiple tasks concurrently and waits for all to complete |