binder: Avoid potential deadlock when canceling AsyncSecurityPolicy futures by HyunsangHan · Pull Request #12283 · grpc/grpc-java

HyunsangHan · 2025-08-16T16:08:52Z

Move future cancellation outside of synchronized block in BinderClientTransport.notifyTerminated() to prevent deadlock if AsyncSecurityPolicy uses directExecutor() for callbacks.

Fixes #12190
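The hazard behind the original description can be shown in a few lines. Guava runs a `directExecutor()` listener on whichever thread completes the future, so calling `cancel()` inside a `synchronized` block runs that listener with the lock still held. If the listener then blocks on, or takes a lock held by, another thread that is itself waiting for the transport lock, the two threads deadlock. The sketch below is an analogy rather than grpc-java code. It assumes `CompletableFuture` with a non-async `whenComplete()` as a stand-in for `ListenableFuture` with a `directExecutor()` listener, and `CancelUnderLockDemo` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: CompletableFuture stands in for Guava's ListenableFuture, and a
// non-async whenComplete() stands in for a listener added with directExecutor().
// In both cases the callback runs on the thread that calls cancel().
class CancelUnderLockDemo {
  // Hypothetical stand-in for the transport's own monitor (its @GuardedBy("this") lock).
  private final Object lock = new Object();

  /** Returns true if the cancellation callback ran while the lock was still held. */
  boolean callbackRanUnderLock() {
    CompletableFuture<Boolean> authFuture = new CompletableFuture<>();
    boolean[] heldInCallback = new boolean[1];
    // Synchronous continuation: runs inline on whichever thread completes the future.
    authFuture.whenComplete((result, error) -> heldInCallback[0] = Thread.holdsLock(lock));
    synchronized (lock) {
      // Cancelling here runs the callback inside this synchronized block.
      authFuture.cancel(false);
    }
    return heldInCallback[0];
  }

  public static void main(String[] args) {
    System.out.println(new CancelUnderLockDemo().callbackRanUnderLock()); // prints "true"
  }
}
```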

…utures Move future cancellation outside of synchronized block in BinderClientTransport.notifyTerminated() to prevent deadlock if AsyncSecurityPolicy uses directExecutor() for callbacks. Fixes grpc#12190

linux-foundation-easycla · 2025-08-16T16:08:58Z

The committers listed above are authorized under a signed CLA.

✅ login: HyunSangHan / name: HYUNSANG HAN (c54cfff, 9111970, 1c3a0ea, 4d7ed66, 0af3f85, fc77da4, 7138614)

jdcormie · 2025-08-18T16:54:10Z

Help me understand the change here? All those cancel() calls still appear to come from inside notifyTerminated(), which is a @GuardedBy("this") method ...

…o fix-binder-deadlock-12190 Signed-off-by: Hyunsang Han <gustkd3@gmail.com>

…ted() Move future cancellation to offloadExecutor to avoid deadlock when AsyncSecurityPolicy uses directExecutor() for callbacks. Fixes grpc#12190 Signed-off-by: Hyunsang Han <gustkd3@gmail.com>

HyunsangHan · 2025-08-24T14:23:58Z

Help me understand the change here? All those cancel() calls still appear to come from inside notifyTerminated(), which is a @GuardedBy("this") method ...

OMG! Sorry. I realized that I forgot to commit the actual fix!
I've just pushed the missing commit with the proper solution.

@jdcormie Could you please check the latest commit?

Extract future cancellation logic into cancelAsync method and only cancel futures that are not already done for better performance. Signed-off-by: Hyunsang Han <gustkd3@gmail.com>

Rename cancelAsync to cancelAsyncIfNeeded, move future cancellation next to readyTimeoutFuture, and remove unnecessary null assignments. Signed-off-by: Hyunsang Han <gustkd3@gmail.com>
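The interim approach in these commits moves `cancel()` onto an executor and skips futures that are already done. A minimal sketch of such a helper might look like the one below. `OffloadedCancel` and this `cancelAsyncIfNeeded` signature are hypothetical, modeled only on the commit messages, and the choice of `cancel(false)` is an assumption.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Future;

// Hypothetical helper modeled on the cancelAsyncIfNeeded() named in the commit
// messages; the real method's signature and executor may differ.
final class OffloadedCancel {
  private OffloadedCancel() {}

  /**
   * Hands future.cancel(false) to offloadExecutor unless the future is null or already
   * done. Returns true if a cancellation task was scheduled.
   */
  static boolean cancelAsyncIfNeeded(Future<?> future, Executor offloadExecutor) {
    if (future == null || future.isDone()) {
      return false; // nothing to cancel, so skip the executor hop entirely
    }
    // The caller may hold its lock. Provided offloadExecutor is not itself a direct
    // executor, cancel() and any inline callbacks run later on another thread.
    offloadExecutor.execute(() -> future.cancel(false));
    return true;
  }
}
```

Later in the thread this approach is replaced: a queued cancel task can still be pending, or never run at all, when the transport reports termination.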

HyunsangHan · 2025-09-02T22:39:32Z

@jdcormie
I’ve addressed the review comments. :)

jdcormie · 2025-09-03T08:35:42Z

Woke up this morning with a small new concern: Would this PR cause us to declare the Channel terminated before all work we've enqueued on the offload Executor is complete (or cancelled)? Take a look at how releaseExecutors() is called right after notifyTerminated() (the site of your changes) returns. Would this need to move into the shutdown path instead?

HyunsangHan · 2025-09-08T01:04:02Z

Woke up this morning with a small new concern: Would this PR cause us to declare the Channel terminated before all work we've enqueued on the offload Executor is complete (or cancelled)? Take a look at how releaseExecutors() is called right after notifyTerminated() (the site of your changes) returns. Would this need to move into the shutdown path instead?

I agree with your concern. That's a very good point.
Since cancellation is enqueued to run on a separate thread, there's no guarantee that previously enqueued tasks have completed by the time we declare the Channel terminated.
To address this fundamentally, notifyTerminated should ideally only be called once those tasks have finished.

That said, while thinking about ways to improve the code, two questions came up 🤔:

1. Because cancellation itself is enqueued onto the executor and then handled asynchronously, even if we enqueue it earlier there's still no guarantee that the cancel operation will finish before notifyTerminated is invoked. The probability might be higher, but the guarantee isn't there.
2. Looking at releaseExecutors more closely, it does attempt to gracefully release resources once queued tasks complete. However, the method itself seems to return immediately without waiting for that completion. If that's correct, then simply moving releaseExecutors before notifyTerminated wouldn't necessarily solve the problem either.
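The body of releaseExecutors() doesn't appear in this thread, so the following is an analogy only. `java.util.concurrent.ExecutorService` draws the same line that point 2 describes: `shutdown()` lets already-queued work finish but returns immediately, and only `awaitTermination()` actually waits. `ShutdownDoesNotWait` is a hypothetical name, and the latch-blocked task stands in for a queued cancellation that hasn't finished yet.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy sketch: a graceful release that does not wait for queued work.
class ShutdownDoesNotWait {
  /** Returns {terminated right after shutdown(), terminated after awaitTermination()}. */
  static boolean[] run() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);
    executor.execute(() -> {
      try {
        release.await(); // stands in for queued work that has not finished yet
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    executor.shutdown(); // graceful: already-queued work will still run...
    boolean terminatedRightAway = executor.isTerminated(); // ...but shutdown() did not wait
    release.countDown();
    boolean terminatedAfterWaiting;
    try {
      terminatedAfterWaiting = executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      terminatedAfterWaiting = false;
    }
    return new boolean[] {terminatedRightAway, terminatedAfterWaiting};
  }
}
```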

Could you elaborate a bit more on what you meant by "move into the shutdown path instead"? I want to make sure I understand your idea fully.

jdcormie · 2025-09-29T23:51:59Z

Regarding (1), you're right that enqueuing it earlier (in shutdown(), say) doesn't solve anything. My suggestion here wasn't a good one.

Regarding (2), you're right that releaseExecutors really has nothing to do with this. We do promise not to put any new work on an Executor after releasing it, but you never proposed doing that, so I don't know why I mentioned it.

I do still think we should cancel any internally scheduled Runnables and any in-flight work behind our calls to SecurityPolicy.checkAuthorization() before declaring termination. Otherwise, the application might be surprised when calling shutdownNow() on its own Executor returns one of our Runnables that it doesn't know what to do with. And if that Runnable is your proposed call to ListenableFuture#cancel, which never gets run(), some AsyncSecurityPolicy resource might never get cleaned up.
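The scenario above can be reproduced with plain `java.util.concurrent`. An application-owned single-thread executor is busy with its own work, the transport queues a cancel task on it and declares termination, and the application then calls `shutdownNow()`. The queued cancel task comes back in the returned list without ever having run, so the future it was meant to cancel stays live. `OrphanedCancelDemo` is a hypothetical name, and the executor stands in for an application-supplied offload executor.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: a cancel task queued on the application's executor is handed back
// (unrun) by shutdownNow().
class OrphanedCancelDemo {
  /** Returns the Runnables that shutdownNow() hands back to the application. */
  static List<Runnable> run(CompletableFuture<?> authFuture) {
    ExecutorService appExecutor = Executors.newSingleThreadExecutor();
    CountDownLatch busy = new CountDownLatch(1);
    // The application's own work occupies the only thread (until interrupted).
    appExecutor.execute(() -> {
      try {
        busy.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    // The transport offloads cancellation, then would declare itself terminated.
    Runnable cancelTask = () -> authFuture.cancel(false);
    appExecutor.execute(cancelTask);
    // shutdownNow() interrupts the running task and drains the queue: our task
    // is returned to the application instead of being run.
    return appExecutor.shutdownNow();
  }
}
```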

The simplest way to achieve cancel() before notifyTerminated() is just to do one before the other in the same thread, like today. We know calling cancel while holding a lock is problematic, but putting this work on an executor is hard too, because now we've got to somehow wait for it to finish, which is kind of the same problem we started with. Instead, what if we collected outstanding futures with the lock held, but called cancel after releasing it? This would be similar to how shutdownInternal cleans up ongoingCalls when forceTerminate is true. Have a look at jdcormie@d0bcf4b and LMK what you think.
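A minimal sketch of the "collect with the lock, cancel without it" idea follows, assuming a simplified transport. `TransportSketch`, `outstandingAuthChecks`, `track()`, and `onTerminated` are all hypothetical names. This is not the code in jdcormie@d0bcf4b or in `BinderClientTransport`. Under the lock, the sketch only snapshots and clears shared state. It then calls `cancel()` on the same thread after releasing the lock, and reports termination last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

// Hypothetical transport skeleton showing "collect with the lock, cancel without it".
// Names are illustrative; this is not grpc-java's BinderClientTransport code.
class TransportSketch {
  private final Object lock = new Object();
  private final List<Future<?>> outstandingAuthChecks = new ArrayList<>(); // guarded by lock
  private boolean terminated; // guarded by lock
  private final Runnable onTerminated;

  TransportSketch(Runnable onTerminated) {
    this.onTerminated = onTerminated;
  }

  /** Registers an in-flight async security check so shutdown can cancel it. */
  void track(Future<?> authCheck) {
    synchronized (lock) {
      outstandingAuthChecks.add(authCheck);
    }
  }

  /** Exposed only so a demo can observe whether callbacks run under the lock. */
  boolean holdsLock() {
    return Thread.holdsLock(lock);
  }

  void shutdownNow() {
    List<Future<?>> toCancel;
    synchronized (lock) {
      if (terminated) {
        return;
      }
      terminated = true;
      // 1. While holding the lock, only snapshot and clear the shared state.
      toCancel = new ArrayList<>(outstandingAuthChecks);
      outstandingAuthChecks.clear();
    }
    // 2. Cancel after releasing the lock, on this same thread: inline callbacks
    //    no longer run under the lock, and nothing is left queued on an executor.
    for (Future<?> authCheck : toCancel) {
      authCheck.cancel(false);
    }
    // 3. Only now report termination, so every cancel() has already been called.
    onTerminated.run();
  }
}
```

The "less cohesive" trade-off mentioned below is visible here: cleanup of one piece of state is split between the snapshot inside the lock and the loop after it.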

HyunsangHan · 2025-10-01T00:23:38Z

@jdcormie
Thanks for sharing your opinion!

I’ve reviewed your commit and I agree this "collect with lock, cancel without lock" pattern is a much cleaner solution. It avoids both the race condition and the deadlock scenario while staying consistent with the existing cleanup logic.

The only trade-off I see is that the logic might feel a bit less cohesive, but overall I think it’s the right direction :)

jdcormie · 2025-10-01T03:22:24Z

Great! Can you update your PR to take that approach and I'll merge it?

…ng locks Signed-off-by: Hyunsang Han <gustkd3@gmail.com>

…o fix-binder-deadlock-12190

HyunsangHan · 2025-10-15T13:44:08Z

@jdcormie
Thanks for merging this! I really enjoyed discussing it with you. It was both fun and meaningful.

Fixes BinderClientTransportTest which has been flaky since grpc#12283.

@HyunsangHan

Fixes BinderClientTransportTest#testAsyncSecurityPolicyCancelledUponExternalTermination and others which have been flaky since #12283. @HyunsangHan

…utures (grpc#12283) Move future cancellation outside of synchronized block in `BinderClientTransport.notifyTerminated()` to prevent deadlock if `AsyncSecurityPolicy` uses `directExecutor()` for callbacks. Fixes grpc#12190 --------- Signed-off-by: Hyunsang Han <gustkd3@gmail.com>


binder: Avoid potential deadlock when canceling AsyncSecurityPolicy f…

c54cfff


HyunsangHan added 2 commits August 24, 2025 23:02

Merge branch 'master' of https://github.com/HyunSangHan/grpc-java int…

9111970


binder: Fix potential deadlock in BinderClientTransport.notifyTermina…

7138614
