Fix IBM J9 crash: replace pthread_cleanup_push with abi::__forced_unwind catch (SCP-1154) #492
Open
…ind catch

SCP-1154 / regression from ddprof 1.39.0 (commit 2063c65).

pthread_cleanup_push in C++ mode creates __pthread_cleanup_class whose destructor is implicitly noexcept. When IBM J9's thread teardown raises _Unwind_ForcedUnwind (via libgcc, sourced from libj9thr29.so), the C++ runtime calls std::terminate() -> abort() as the forced-unwind exception propagates through the noexcept destructor context.

Replace pthread_cleanup_push/pop in both start_routine_wrapper variants with an explicit catch(abi::__forced_unwind&) block that performs the same cleanup (unregisterThread + release) and re-throws to let the thread exit cleanly.

Apply the same guard to J9WallClock::timerLoop to ensure VM::detachThread() is always called even when J9 cancels the sampler thread during JVM shutdown.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Tests verify that catch(abi::__forced_unwind&)+rethrow correctly:
- intercepts pthread_cancel-triggered forced unwind and runs cleanup
- lets the thread exit as PTHREAD_CANCELED after rethrow
- keeps ProfiledThread::release() safe to call in the catch block
- also covers pthread_exit(), which uses the same exception on glibc

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
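The core pattern from these commits can be exercised in isolation. This is a minimal sketch, assuming Linux with glibc and libstdc++ (build with `-pthread`). `start_routine_wrapper_sketch` is a hypothetical, simplified stand-in for `start_routine_wrapper`, and `hypothetical_cleanup()` stands in for the real `unregisterThread()` + `release()` calls. This is not the ddprof code.

```cpp
// Sketch only: replaces pthread_cleanup_push/pop with catch(abi::__forced_unwind&) + rethrow.
// Assumes Linux with glibc and libstdc++; build with -pthread.
#include <cxxabi.h>   // abi::__forced_unwind
#include <pthread.h>
#include <unistd.h>   // pause()
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the real unregisterThread() + release() cleanup.
static std::atomic<int> g_cleanups{0};
static void hypothetical_cleanup() { g_cleanups.fetch_add(1); }

// Hypothetical argument bundle: the routine the wrapper runs, plus its argument.
struct WrapperArgs {
  void* (*routine)(void*);
  void* arg;
};

// Simplified stand-in for start_routine_wrapper.
static void* start_routine_wrapper_sketch(void* p) {
  WrapperArgs* args = static_cast<WrapperArgs*>(p);
  void* result = nullptr;
  try {
    result = args->routine(args->arg);
  } catch (abi::__forced_unwind&) {
    hypothetical_cleanup();  // cleanup on the pthread_cancel / pthread_exit path
    throw;                   // must re-throw so the thread can finish exiting
  }
  hypothetical_cleanup();    // cleanup on the normal-return path
  return result;
}

// Blocks forever in pause(), which is a POSIX cancellation point.
static void* blocking_routine(void*) {
  for (;;) {
    pause();
  }
  return nullptr;
}

struct CancelResult {
  bool canceled;  // thread exit value was PTHREAD_CANCELED
  int cleanups;   // number of times the cleanup ran
};

// Starts the wrapped thread, cancels it, and reports what happened.
CancelResult run_cancel_demo() {
  g_cleanups = 0;
  WrapperArgs args{blocking_routine, nullptr};
  pthread_t thread;
  if (pthread_create(&thread, nullptr, start_routine_wrapper_sketch, &args) != 0) {
    return {false, -1};
  }
  pthread_cancel(thread);
  void* ret = nullptr;
  pthread_join(thread, &ret);
  return {ret == PTHREAD_CANCELED, g_cleanups.load()};
}
```

On glibc, `run_cancel_demo()` should report `canceled == true` and exactly one cleanup. The bare `throw;` is required: if a handler swallows the forced-unwind exception instead of re-throwing it, glibc aborts the process.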
Branch updated: 8be1c43 to 574d47e
Contributor
CI Test Results
Run: #24728001460 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled
Summary: Total: 32 | Passed: 32 | Failed: 0
Updated: 2026-04-21 14:55:38 UTC
Contributor
Claude found two relevant issues:
- j9WallClock.cpp: move VM::attachThread and malloc into the try block (jni/frames pre-initialized to null) so catch(abi::__forced_unwind&) runs cleanup if the unwind fires during setup.
- libraryPatcher_linux.cpp: merge init_thread_tls and start_window_and_register into a single noinline init_tls_and_register that holds one SignalBlocker across initCurrentThread() + startInitWindow() + registerThread(), closing the signal-unblocked gap on aarch64/musl/jdk11.

Co-Authored-By: muse <muse@noreply>
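The single-`SignalBlocker` change in the second bullet can be sketched as one RAII guard around `pthread_sigmask` that spans all three init steps. Everything here is hypothetical. `SignalBlockerSketch` is not ddprof's `SignalBlocker`, and blocking `SIGPROF` and `SIGVTALRM` is an assumption chosen because they are the usual profiling signals. `hypothetical_init_step()` stands in for `initCurrentThread()`, `startInitWindow()`, and `registerThread()`.

```cpp
// Sketch only: one RAII signal blocker held across several init steps.
// Assumes Linux; SIGPROF/SIGVTALRM are an illustrative choice, not ddprof's actual set.
#include <pthread.h>
#include <signal.h>
#include <cassert>

class SignalBlockerSketch {
 public:
  SignalBlockerSketch() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPROF);
    sigaddset(&set, SIGVTALRM);
    pthread_sigmask(SIG_BLOCK, &set, &old_);  // block, remembering the previous mask
  }
  ~SignalBlockerSketch() { pthread_sigmask(SIG_SETMASK, &old_, nullptr); }  // restore it
  SignalBlockerSketch(const SignalBlockerSketch&) = delete;
  SignalBlockerSketch& operator=(const SignalBlockerSketch&) = delete;

 private:
  sigset_t old_;
};

// True if SIGPROF is currently blocked in the calling thread.
bool sigprof_blocked() {
  sigset_t current;
  pthread_sigmask(SIG_SETMASK, nullptr, &current);  // query only: mask is unchanged
  return sigismember(&current, SIGPROF) == 1;
}

// Hypothetical stand-in for initCurrentThread(), startInitWindow() and registerThread():
// counts how many steps ran with SIGPROF blocked.
int g_blocked_steps = 0;
static void hypothetical_init_step() {
  if (sigprof_blocked()) ++g_blocked_steps;
}

// Hypothetical analogue of init_tls_and_register: one blocker spans all three steps,
// so there is no unblocked gap between them.
__attribute__((noinline)) void init_tls_and_register_sketch() {
  SignalBlockerSketch blocker;
  hypothetical_init_step();  // initCurrentThread()
  hypothetical_init_step();  // startInitWindow()
  hypothetical_init_step();  // registerThread()
}
```

Because one guard covers all three calls, a profiling signal directed at this thread stays pending until the scope ends. With separately blocked steps, the same signal could be delivered in the window between two of them.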
Collaborator
Author
Thanks — both gaps are real. Fixed in 1cb5f6c:
What does this PR do?:
Fixes a JVM crash on IBM J9 / OpenJ9 when profiling is enabled (SCP-1154, regression from ddprof 1.39.0).
Replaces `pthread_cleanup_push`/`pop` in both `start_routine_wrapper` variants with an explicit `catch(abi::__forced_unwind&)` guard. Also adds the same guard to `J9WallClock::timerLoop` to ensure `VM::detachThread()` is always called when J9 cancels the sampler thread during JVM shutdown.

Motivation:
IBM J9's thread teardown raises `_Unwind_ForcedUnwind` via libgcc (sourced from `libj9thr29.so`). In C++ mode, `pthread_cleanup_push` expands to a `__pthread_cleanup_class` object whose destructor is implicitly `noexcept` (C++11). When J9's forced unwind propagates through this `noexcept` context, `std::terminate()` → `abort()` is called, crashing the JVM.

This was a regression introduced in commit `2063c659` ("Prevent potential race in thread startup and cleanup dead code"), which added `pthread_cleanup_push`/`pop` to `start_routine_wrapper`. The fix replaces that mechanism with an explicit `catch(abi::__forced_unwind&)` + `throw;` pattern, which is the GCC/glibc-recommended approach for cleanup code that must run during POSIX thread cancellation.

Root cause (from Reshmi Anand's investigation):
- Crash stack: `abort()` ← `libjavaProfiler.so` ← `_Unwind_ForcedUnwind` ← `libj9thr29.so`
- Occurs with `-Ddd.profiling.enabled=true` on IBM WAS 9.0.5.16 with OpenJDK 1.8.0_462

Additional Notes:
- `abi::__forced_unwind` is correct: J9 calls `_Unwind_ForcedUnwind` from libgcc (confirmed by the crash stack frame `_Unwind_ForcedUnwind` ← `libj9thr29.so`), which uses the GCC/Itanium ABI `__forced_unwind` type.
- `pthread_exit()` is also covered: on glibc it raises its own `__forced_unwind`.
- The `thread_cleanup` static helper is removed (no longer needed).
- In `start_routine_wrapper_spec` (aarch64), `tid` is now captured before the `try` block to avoid the lazy-allocating `ProfiledThread::current()` path inside the catch.
- `J9WallClock::timerLoop` `PushLocalFrame`/`PopLocalFrame` balance: if cancellation fires between push and pop, `DetachCurrentThread` (via `VM::detachThread()`) releases the outstanding JNI local frame per the JNI spec.
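The `J9WallClock::timerLoop` guard, combined with the review-thread change that moves setup into the `try` block with `jni`/`frames` pre-initialized to null, might look roughly like the following sketch. The sketch assumes Linux with glibc and libstdc++. `hypothetical_attach()` and `hypothetical_detach()` stand in for `VM::attachThread()` and `VM::detachThread()`, `FakeEnv` stands in for `JNIEnv`, and the `malloc`'d buffer stands in for `frames`. None of these names come from ddprof.

```cpp
// Sketch only: timerLoop-style thread with setup inside try and null-initialized resources.
// Assumes Linux with glibc and libstdc++; build with -pthread.
#include <cxxabi.h>   // abi::__forced_unwind
#include <pthread.h>
#include <time.h>     // nanosleep()
#include <atomic>
#include <cassert>
#include <cstdlib>    // std::malloc(), std::free()

// Hypothetical counters standing in for VM::attachThread()/VM::detachThread() side effects.
static std::atomic<int> g_attached{0};
static std::atomic<int> g_detached{0};
static std::atomic<int> g_freed{0};

struct FakeEnv {};  // hypothetical stand-in for JNIEnv

static FakeEnv* hypothetical_attach() { g_attached.fetch_add(1); return new FakeEnv; }
static void hypothetical_detach(FakeEnv* env) { delete env; g_detached.fetch_add(1); }

static void* timer_loop_sketch(void* p) {
  const bool run_forever = *static_cast<bool*>(p);
  FakeEnv* jni = nullptr;   // pre-initialized: cleanup can tell what was acquired
  void** frames = nullptr;
  auto cleanup = [&]() {
    if (frames != nullptr) { std::free(frames); frames = nullptr; g_freed.fetch_add(1); }
    if (jni != nullptr) { hypothetical_detach(jni); jni = nullptr; }
  };
  try {
    jni = hypothetical_attach();  // setup now lives inside the try block
    frames = static_cast<void**>(std::malloc(64 * sizeof(void*)));
    while (run_forever) {
      timespec delay{};
      delay.tv_nsec = 1000000;    // 1 ms sampling interval
      nanosleep(&delay, nullptr); // cancellation point
    }
  } catch (abi::__forced_unwind&) {
    cleanup();
    throw;                        // let the cancellation complete
  }
  cleanup();
  return nullptr;
}

struct LoopResult { bool canceled; int attached; int detached; int freed; };

// Runs the loop once and either cancels it (cancel == true) or lets it return normally.
LoopResult run_timer_loop_demo(bool cancel) {
  g_attached = 0; g_detached = 0; g_freed = 0;
  bool run_forever = cancel;
  pthread_t thread;
  if (pthread_create(&thread, nullptr, timer_loop_sketch, &run_forever) != 0) {
    return {false, -1, -1, -1};
  }
  if (cancel) pthread_cancel(thread);
  void* ret = nullptr;
  pthread_join(thread, &ret);
  return {ret == PTHREAD_CANCELED, g_attached.load(), g_detached.load(), g_freed.load()};
}
```

Both the cancelled run (`run_timer_loop_demo(true)`) and the normal-exit run (`run_timer_loop_demo(false)`) should attach, detach, and free exactly once. The null checks in `cleanup` keep it safe even if the unwind fires before setup has finished.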
New regression test
ddprof-lib/src/test/cpp/forced_unwind_ut.cppruns on Linux without a JVM and verifies:catch(abi::__forced_unwind&)fires onpthread_canceland the cleanup block runs.throw;re-throw lets the thread exit asPTHREAD_CANCELED(not viastd::terminate()).ProfiledThread::release()completes safely inside the catch block.pthread_exit()(which also uses__forced_unwindon glibc) is caught by the same pattern.Run:
./gradlew :ddprof-lib:gtestRelease --tests "ForcedUnwindTest*"End-to-end IBM J9 reproduction requires the Docker-based reproducer from the SCP-1154 investigation artifacts.
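The `pthread_exit()` case can also be checked without gtest. This is a minimal sketch, assuming glibc, where `pthread_exit()` unwinds the stack using the same forced-unwind exception. `exit_wrapper_sketch` and `run_exit_demo` are hypothetical names and do not reproduce the contents of `forced_unwind_ut.cpp`.

```cpp
// Sketch only: pthread_exit() from a nested call is intercepted by the same pattern.
// Assumes glibc, where pthread_exit() unwinds via a forced-unwind exception.
#include <cxxabi.h>   // abi::__forced_unwind
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstdint>    // std::intptr_t

static std::atomic<int> g_exit_cleanups{0};  // hypothetical cleanup counter

static void nested_exit() {
  pthread_exit(reinterpret_cast<void*>(static_cast<std::intptr_t>(42)));  // does not return
}

static void* exit_wrapper_sketch(void*) {
  try {
    nested_exit();
  } catch (abi::__forced_unwind&) {
    g_exit_cleanups.fetch_add(1);  // cleanup runs on the pthread_exit() path too
    throw;                         // re-throw so the thread can finish exiting
  }
  return nullptr;  // not reached
}

struct ExitResult { std::intptr_t value; int cleanups; };

ExitResult run_exit_demo() {
  g_exit_cleanups = 0;
  pthread_t thread;
  if (pthread_create(&thread, nullptr, exit_wrapper_sketch, nullptr) != 0) {
    return {-1, -1};
  }
  void* ret = nullptr;
  pthread_join(thread, &ret);
  return {reinterpret_cast<std::intptr_t>(ret), g_exit_cleanups.load()};
}
```

The value passed to `pthread_exit()` survives the catch-and-rethrow, so `pthread_join` should see `42` with one cleanup recorded.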
For Datadog employees:
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

🤖 Generated with Claude Code via muse implement