Port native malloc allocation profiling from async-profiler#398
CI Test Results
Run: #24854441810 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled
Summary: Total: 32 | Passed: 32 | Failed: 0
Updated: 2026-04-23 19:53:53 UTC
@copilot resolve the merge conflicts in this pull request
Resolved: I merged
Warning: Firewall rules blocked me from connecting to one or more addresses.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
- mallocTracer: move _running=true before patchLibraries() (race fix)
- mallocTracer: use __atomic_fetch_sub; wrap detectNestedMalloc() in UnloadProtection
- mallocTracer: MallocHooker refactor; xoroshiro128+ PRNG; maybeRecord helper
- profiler: _cstack fallback from CSTACK_DEFAULT to DWARF_SUPPORTED?CSTACK_DWARF:CSTACK_NO
- tests: assertNotNull for size/weight/addr fields; weight>=1.0 on sampled path
- doc: update code snippet to reflect actual _cstack fallback

Co-Authored-By: muse <muse@noreply>
…ting

ByteBuffer.allocateDirect() triggers malloc inside libjvm.so, which on CI is built with -Bsymbolic-functions: malloc calls never go through the PLT, so GOT patching is bypassed and no profiler.Malloc events are produced.

Replace with NativeAllocHelper.nativeMalloc() backed by a new JNI function in libddproftest.so, whose PLT is reliably patched by patchLibraries(). Add a Java wrapper triggerAllocations() so the frame appears in stack traces for all cstack modes including fp/dwarf (ASGCT does not return native method frames themselves).

Also emit samplingInterval in JFR settings and use putVar64 for tid in recordMallocSample for consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On glibc JVMs, System.loadLibrary may not route through the patched dlopen GOT entry, so libddproftest.so can be loaded after patchLibraries() runs and its malloc GOT is never hooked. Force class initialization (and thus System.loadLibrary) in @BeforeAll so the library is in native_libs when MallocTracer::start() calls patchLibraries().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What does this PR do?:
Ports the native malloc allocation profiler from async-profiler and integrates it with the Datadog JFR pipeline. When enabled via nativemem=<interval>, the profiler intercepts malloc, calloc, realloc, posix_memalign, and aligned_alloc across all loaded native libraries using GOT patching, and emits profiler.Malloc JFR events with Java stack traces. The free function is hooked so that calls forward correctly through the GOT, but free events are not recorded: because mallocs are sampled, most frees would match no recorded allocation, and the immense event volume with no stack traces provides no actionable insight.

Changes:
- mallocTracer.cpp/h: ported from async-profiler; GOT-patching hooks, Poisson byte-interval sampling with PID rate-limiting, nested-malloc detection for musl compatibility
- flightRecorder.cpp/h: recordMallocSample() for profiler.Malloc JFR events with profiling context (spanId, localRootSpanId, contextAttributes)
- jfrMetadata.cpp/h: new profiler.Malloc (T_MALLOC) event type definition with weight and context fields
- profiler.cpp/h: BCI_NATIVE_MALLOC path in recordSample, dlopen_hook patching of newly loaded libraries, CSTACK_VM promotion when VMStructs available
- hotspot/hotspotSupport.cpp: BCI_NATIVE_MALLOC case in eventTypeFromBCI and walkJavaStack
- jvmSupport.cpp: BCI_NATIVE_MALLOC allowed in walkJavaStack assert
- arguments.cpp/h: nativemem=<bytes> argument parsing
- codeCache.cpp/h: im_posix_memalign / im_aligned_alloc import IDs
- event.h: MallocEvent struct with weight field
- vmEntry.h: BCI_NATIVE_MALLOC = -20 frame type constant
- doc/architecture/NativeMemoryProfiling.md: architecture document

Motivation:
Native heap allocations (malloc/free) are a significant source of memory pressure and latency in JVM applications that rely on JNI, off-heap buffers, or native libraries. This feature gives users visibility into native allocation patterns alongside existing JVM heap profiling.
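To make the GOT-patching idea from the summary concrete, here is a minimal, simplified model in C++. It does not parse ELF files or touch a real GOT: ImportTable, patch(), unpatch(), malloc_hook and free_hook are hypothetical names, and a struct with one function-pointer slot per imported symbol stands in for a library's GOT entries. It only illustrates two points from the description: the hook forwards to the saved original function, and free is forwarded without being recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified model, not the mallocTracer.cpp implementation. All names here
// (ImportTable, patch, unpatch, malloc_hook, free_hook) are hypothetical.
using MallocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

// Stand-in for one library's GOT: one slot per imported symbol.
struct ImportTable {
    MallocFn malloc_slot = std::malloc;
    FreeFn free_slot = std::free;
};

static MallocFn g_orig_malloc = std::malloc;
static FreeFn g_orig_free = std::free;
static std::size_t g_observed_bytes = 0;  // stand-in for emitting an event

// Calls the saved original directly, so the hook itself never goes back
// through the patched slot.
static void* malloc_hook(std::size_t size) {
    void* p = g_orig_malloc(size);
    if (p != nullptr) g_observed_bytes += size;
    return p;
}

// Forwarded only: frees are deliberately not recorded.
static void free_hook(void* p) {
    g_orig_free(p);
}

// Redirect the table's slots to the hooks. Safe to call more than once.
void patch(ImportTable& table) {
    if (table.malloc_slot != malloc_hook) {
        g_orig_malloc = table.malloc_slot;
        table.malloc_slot = malloc_hook;
    }
    if (table.free_slot != free_hook) {
        g_orig_free = table.free_slot;
        table.free_slot = free_hook;
    }
}

// Restore the original entries.
void unpatch(ImportTable& table) {
    table.malloc_slot = g_orig_malloc;
    table.free_slot = g_orig_free;
}
```

In this model a caller that allocates through table.malloc_slot after patch() is observed, while a caller that reaches malloc without going through the slot is not. That mirrors the -Bsymbolic-functions situation described in the test commit above.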
Additional Notes:
Upstream source: mallocTracer.cpp and mallocTracer.h are a port of the equivalent files from async-profiler. The porting involved:
- recordSample) instead of async-profiler's own serialisation
- patchLibraries loop to use Datadog's CodeCache/UnloadProtection API

Stack walking: Native malloc events have no signal context (ucontext == NULL). CSTACK_VM (HotSpot VMStructs + JavaFrameAnchor) is the only mode that can produce meaningful Java stack traces in this situation. CSTACK_DEFAULT is the initial default; at profiler start it is promoted to CSTACK_VM when VMStructs are available. On JVMs where VMStructs are unavailable the profiler stays at CSTACK_DEFAULT.

Sampling: Uses Poisson-interval sampling (shouldSample()) with a lock-free CAS loop. A PID controller (updateConfiguration()) periodically adjusts the interval to maintain ~100 samples/second. Each sample carries a statistical weight reflecting the Poisson sampling probability.

No free event tracking: Free calls are hooked (to forward through the GOT correctly) but not recorded. With Poisson sampling on mallocs, most frees correspond to unsampled allocations and would produce meaningless events. The volume of free calls with no stack traces provides no actionable insight.
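As an illustration of the Sampling note, the sketch below shows one common way to do Poisson byte-interval sampling with a CAS loop and an inverse-probability weight. It is not the code in mallocTracer.cpp: ByteIntervalSampler, nextGap() and sampleWeight() are hypothetical names, std::mt19937_64 is used instead of the xoroshiro128+ generator mentioned in the commits, and the weight formula 1 / (1 - exp(-size / interval)) is an assumption about what "weight reflecting the Poisson sampling probability" means.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Assumed weight: inverse of the probability that an allocation of `size`
// bytes contains at least one sampling point when points are spaced with an
// exponential distribution of mean `interval` bytes. Always >= 1.0.
inline double sampleWeight(uint64_t size, uint64_t interval) {
    if (size == 0 || interval <= 1) return 1.0;
    double p = -std::expm1(-static_cast<double>(size) /
                           static_cast<double>(interval));
    return 1.0 / p;
}

// Hypothetical sampler: a shared countdown of bytes until the next sample.
class ByteIntervalSampler {
public:
    explicit ByteIntervalSampler(uint64_t interval)
        : _interval(interval), _remaining(0) {
        _remaining.store(nextGap(), std::memory_order_relaxed);
    }

    // Returns true when this allocation crosses the next sampling point.
    bool shouldSample(uint64_t size) {
        if (_interval <= 1) return true;  // degenerate case: sample everything
        int64_t bytes = size > static_cast<uint64_t>(INT64_MAX)
                            ? INT64_MAX
                            : static_cast<int64_t>(size);
        int64_t prev = _remaining.load(std::memory_order_relaxed);
        for (;;) {
            int64_t next = prev - bytes;
            bool fire = next <= 0;
            if (fire) next = nextGap();  // draw the distance to the next point
            // On failure, prev is reloaded with the current value and we retry.
            if (_remaining.compare_exchange_weak(prev, next,
                                                 std::memory_order_relaxed)) {
                return fire;
            }
        }
    }

private:
    // Exponentially distributed gap (in bytes) with mean _interval, at least 1.
    int64_t nextGap() const {
        if (_interval <= 1) return 1;
        thread_local std::mt19937_64 rng{std::random_device{}()};
        std::exponential_distribution<double> gap(
            1.0 / static_cast<double>(_interval));
        return static_cast<int64_t>(gap(rng)) + 1;
    }

    uint64_t _interval;
    std::atomic<int64_t> _remaining;
};
```

Under this assumed formula, small allocations get a large weight (they are rarely sampled) and allocations much larger than the interval get a weight close to 1.0, which is consistent with the weight>=1.0 assertion mentioned in the test commit.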
Reentrancy: Allocations made by the profiler itself during recording (stack walking, JFR buffer writes) will re-enter the hooks. This is a deliberate design trade-off (no TLS guard) documented in the source — it does not cause infinite recursion but may produce minor double-accounting.
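Returning to the rate control mentioned under Sampling, the following is a generic PID sketch showing how an observed sample rate could be turned into a new byte interval. Everything in it is an assumption for illustration: the PidController class, the adjustInterval() helper, the gains, and the multiplicative update clamped to [0.5, 2.0] are hypothetical and are not taken from updateConfiguration().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Textbook PID controller; the error is (target rate - observed rate).
class PidController {
public:
    PidController(double target, double kp, double ki, double kd)
        : _target(target), _kp(kp), _ki(ki), _kd(kd) {}

    // Returns the control signal for a window of dtSeconds.
    double update(double observedRate, double dtSeconds) {
        if (dtSeconds <= 0.0) return 0.0;
        double error = _target - observedRate;
        _integral += error * dtSeconds;
        double derivative = (error - _prevError) / dtSeconds;
        _prevError = error;
        return _kp * error + _ki * _integral + _kd * derivative;
    }

private:
    double _target, _kp, _ki, _kd;
    double _integral = 0.0;
    double _prevError = 0.0;
};

// Too many samples gives a negative signal, which grows the interval;
// too few samples shrinks it. The change per step is limited to 2x.
uint64_t adjustInterval(uint64_t interval, double signal, double target,
                        uint64_t minInterval) {
    if (target <= 0.0) return interval;
    double factor = std::max(0.5, std::min(2.0, 1.0 - signal / target));
    double next = std::round(static_cast<double>(interval) * factor);
    return std::max<uint64_t>(minInterval, static_cast<uint64_t>(next));
}
```

With a proportional-only controller (kp = 1, ki = kd = 0) and a target of 100 samples/second, an observed rate of 200 doubles the interval and an observed rate of 50 halves it, so in both cases the next window would be expected to land near the target.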
How to test the change?:
Automated integration tests covering malloc sampling:
Tests pass for both cstack=vm and cstack=vmx variants of:
- NativememProfilerTest#shouldRecordMallocSamples

For Datadog employees:
- credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Unsure? Have a question? Request a review!