Skip to content

Port native malloc allocation profiling from async-profiler #398

Merged
jbachorik merged 15 commits into main from jb/native_allocs on Apr 23, 2026

@jbachorik
Collaborator

@jbachorik jbachorik commented Feb 25, 2026

⚠️ Code origin notice: The core implementation (mallocTracer.cpp, mallocTracer.h) was directly ported from async-profiler (Apache-2.0). All source files carry the original "Copyright The async-profiler authors" header. Datadog modifications are attributed separately in each file.

What does this PR do?:
Ports the native malloc allocation profiler from async-profiler and integrates it with the Datadog JFR pipeline. When enabled via nativemem=<interval> (a sampling interval in bytes), the profiler uses GOT patching to intercept malloc, calloc, realloc, posix_memalign, and aligned_alloc across all loaded native libraries, and emits profiler.Malloc JFR events with Java stack traces. The free function is also hooked so that calls forward correctly through the patched GOT, but free events are not recorded: because mallocs are sampled, most frees would match no recorded allocation, and the very large volume of free events, which carry no stack traces, would provide no actionable insight.
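To make the hook-and-forward idea concrete, here is a minimal C++ simulation in which a plain function-pointer slot plays the role of a library's GOT entry for malloc and free. Every name in it (got_malloc, malloc_hook, patch_slots, library_work and the rest) is hypothetical and chosen for illustration; this is not the PR's code. The real tracer rewrites the GOT entries of loaded ELF libraries and must also handle details such as page protections and concurrent callers, which this sketch ignores.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical simulation: got_malloc/got_free stand in for a library's GOT
// entries. "Library" code calls through them, the way a PLT stub jumps
// through the GOT. The real profiler patches ELF GOT entries instead.
using MallocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

static void* libc_malloc(std::size_t size) { return std::malloc(size); }
static void libc_free(void* ptr) { std::free(ptr); }

static MallocFn got_malloc = &libc_malloc;
static FreeFn got_free = &libc_free;

// Original targets saved at patch time so the hooks can forward to them.
static MallocFn orig_malloc = nullptr;
static FreeFn orig_free = nullptr;

static std::atomic<std::uint64_t> recorded{0};

static void* malloc_hook(std::size_t size) {
    void* ptr = orig_malloc(size);  // forward to the real allocator first
    if (ptr != nullptr) {
        // A real tracer would decide whether to sample here and only then
        // walk the stack and emit a profiler.Malloc event.
        recorded.fetch_add(1, std::memory_order_relaxed);
    }
    return ptr;
}

static void free_hook(void* ptr) {
    orig_free(ptr);  // forward only; no free event is recorded
}

static void patch_slots() {
    orig_malloc = got_malloc;
    orig_free = got_free;
    got_malloc = &malloc_hook;
    got_free = &free_hook;
}

static void unpatch_slots() {
    got_malloc = orig_malloc;
    got_free = orig_free;
}

// Stand-in for native library code that allocates through its GOT.
static void library_work() {
    void* p = got_malloc(64);
    got_free(p);
}
```

The free hook only forwards, mirroring the decision to hook free without recording it.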

Changes:

  • mallocTracer.cpp/h: ported from async-profiler; GOT-patching hooks, Poisson byte-interval sampling with PID rate-limiting, nested-malloc detection for musl compatibility
  • flightRecorder.cpp/h: recordMallocSample() for profiler.Malloc JFR events with profiling context (spanId, localRootSpanId, contextAttributes)
  • jfrMetadata.cpp/h: new profiler.Malloc (T_MALLOC) event type definition with weight and context fields
  • profiler.cpp/h: BCI_NATIVE_MALLOC path in recordSample, dlopen_hook patching of newly loaded libraries, CSTACK_VM promotion when VMStructs are available
  • hotspot/hotspotSupport.cpp: BCI_NATIVE_MALLOC case in eventTypeFromBCI and walkJavaStack
  • jvmSupport.cpp: BCI_NATIVE_MALLOC allowed in the walkJavaStack assert
  • arguments.cpp/h: nativemem=<bytes> argument parsing
  • codeCache.cpp/h: im_posix_memalign / im_aligned_alloc import IDs
  • event.h: MallocEvent struct with weight field
  • vmEntry.h: BCI_NATIVE_MALLOC = -20 frame type constant
  • doc/architecture/NativeMemoryProfiling.md: architecture document
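Several of the items above (vmEntry.h, event.h, jfrMetadata.cpp/h, hotspotSupport.cpp) meet at one point: a sample tagged with the BCI_NATIVE_MALLOC frame-type constant has to be turned into a profiler.Malloc (T_MALLOC) event. The following is a hypothetical sketch of that mapping. Only BCI_NATIVE_MALLOC = -20, T_MALLOC, eventTypeFromBCI and the weight field come from the PR. T_OTHER, the EventType enum, and the MallocEvent address/size field names are assumptions, loosely following the size/weight/addr fields that the tests check.

```cpp
#include <cstdint>

// Hypothetical sketch. BCI_NATIVE_MALLOC = -20, T_MALLOC and the weight field
// come from the PR; T_OTHER and the other MallocEvent fields are assumptions.
constexpr int BCI_NATIVE_MALLOC = -20;

enum EventType { T_OTHER, T_MALLOC };

struct MallocEvent {
    std::uintptr_t _address;  // assumed: pointer returned by the allocator
    std::uint64_t _size;      // assumed: requested size in bytes
    double _weight;           // sampling weight carried into the JFR event
};

// Maps the frame-type constant (passed in the BCI slot) to a JFR event type.
EventType eventTypeFromBCI(int bci) {
    switch (bci) {
        case BCI_NATIVE_MALLOC:
            return T_MALLOC;
        default:
            return T_OTHER;  // all other frame types are elided in this sketch
    }
}
```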

Motivation:
Native heap allocations (malloc/free) are a significant source of memory pressure and latency in JVM applications that rely on JNI, off-heap buffers, or native libraries. This feature gives users visibility into native allocation patterns alongside existing JVM heap profiling.

Additional Notes:

Upstream source: mallocTracer.cpp and mallocTracer.h are a port of the equivalent files from async-profiler. The porting involved:

  • Replacing async-profiler's internal types and helpers with Datadog profiler equivalents
  • Routing events through the Datadog JFR pipeline (recordSample) instead of async-profiler's own serialisation
  • Adapting the patchLibraries loop to use Datadog's CodeCache / UnloadProtection API

Stack walking: Native malloc events have no signal context (ucontext == NULL). CSTACK_VM (HotSpot VMStructs + JavaFrameAnchor) is the only mode that can produce meaningful Java stack traces in this situation. CSTACK_DEFAULT is the initial default; at profiler start it is promoted to CSTACK_VM when VMStructs are available. On JVMs where VMStructs are unavailable, the profiler falls back to CSTACK_DWARF if DWARF unwinding is supported (DWARF_SUPPORTED) and to CSTACK_NO otherwise.
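The mode selection described above can be sketched as a pure function. For the case where VMStructs are unavailable, the sketch follows the fallback named in a later review-fix commit (DWARF_SUPPORTED ? CSTACK_DWARF : CSTACK_NO). The function name resolveCStack, the enum ordering, and the assumption that an explicitly requested mode (for example cstack=vmx) is left unchanged are illustrative. None of them is taken from profiler.cpp.

```cpp
// Hypothetical sketch. Enum names match the PR; their order/values, the
// function name, and keeping an explicit user choice are assumptions.
enum CStack { CSTACK_DEFAULT, CSTACK_NO, CSTACK_DWARF, CSTACK_VM };

CStack resolveCStack(CStack requested, bool vmStructsAvailable, bool dwarfSupported) {
    if (requested != CSTACK_DEFAULT) {
        return requested;  // assumed: explicit cstack=... settings are left alone
    }
    if (vmStructsAvailable) {
        // Malloc hooks run without a signal ucontext, so only the
        // VMStructs/JavaFrameAnchor walker can recover Java frames.
        return CSTACK_VM;
    }
    // Fallback named in the review-fix commit:
    // DWARF_SUPPORTED ? CSTACK_DWARF : CSTACK_NO
    return dwarfSupported ? CSTACK_DWARF : CSTACK_NO;
}
```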

Sampling: Uses Poisson-interval sampling (shouldSample()) with a lock-free CAS loop. A PID controller (updateConfiguration()) periodically adjusts the interval to maintain ~100 samples/second. Each sample carries a statistical weight reflecting the Poisson sampling probability.
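Here is a minimal sketch of byte-interval Poisson sampling with a lock-free CAS loop and a per-sample weight. It assumes the weight is the inverse of the probability that a size-byte allocation is hit when sampling points are spaced exponentially with mean `interval`. PoissonSampler, nextInterval and weight are hypothetical names, std::mt19937_64 stands in for whatever PRNG the tracer uses, and the PID controller (updateConfiguration()) that retunes the interval is left out. The PR's shouldSample() may differ in detail.

```cpp
#include <atomic>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical sketch of Poisson byte-interval sampling; not the PR's code.
class PoissonSampler {
public:
    explicit PoissonSampler(std::uint64_t interval)
        : _interval(interval), _bytes_until_sample(nextInterval(interval)) {}

    // Returns true if an allocation of `size` bytes should be recorded.
    bool shouldSample(std::uint64_t size) {
        if (_interval <= 1) {
            return true;  // interval of 0 or 1 byte: record every allocation
        }
        std::uint64_t prev = _bytes_until_sample.load(std::memory_order_relaxed);
        while (true) {
            if (size < prev) {
                // Sampling point not reached yet: consume `size` bytes of budget.
                if (_bytes_until_sample.compare_exchange_weak(
                        prev, prev - size, std::memory_order_relaxed)) {
                    return false;
                }
            } else {
                // This allocation crosses the sampling point: arm a fresh gap.
                if (_bytes_until_sample.compare_exchange_weak(
                        prev, nextInterval(_interval), std::memory_order_relaxed)) {
                    return true;
                }
            }
            // CAS failed (another thread won, or a spurious failure); `prev`
            // now holds the current value, so retry with it.
        }
    }

    // 1 / P(sampled), where P = 1 - exp(-size / interval).
    double weight(std::uint64_t size) const {
        if (_interval <= 1 || size == 0) {
            // Interval <= 1 samples everything; with a larger interval a
            // size-0 allocation never takes the sampled path.
            return 1.0;
        }
        double p = 1.0 - std::exp(-static_cast<double>(size) / static_cast<double>(_interval));
        return 1.0 / p;
    }

private:
    // Exponentially distributed gap (mean `mean` bytes) to the next sampling point.
    static std::uint64_t nextInterval(std::uint64_t mean) {
        if (mean <= 1) {
            return 1;
        }
        thread_local std::mt19937_64 rng{std::random_device{}()};
        std::exponential_distribution<double> gap(1.0 / static_cast<double>(mean));
        std::uint64_t v = static_cast<std::uint64_t>(gap(rng));
        return v == 0 ? 1 : v;
    }

    const std::uint64_t _interval;
    std::atomic<std::uint64_t> _bytes_until_sample;
};
```

With an interval of 1024 bytes, a 1024-byte allocation is sampled with probability 1 - e^-1 ≈ 0.632, so its weight is about 1.58. Allocations much larger than the interval approach a weight of 1, which is consistent with the weight >= 1.0 check added to the tests.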

No free event tracking: Free calls are hooked (to forward through the GOT correctly) but not recorded. With Poisson sampling on mallocs, most frees correspond to unsampled allocations and would produce meaningless events. The volume of free calls with no stack traces provides no actionable insight.

Reentrancy: Allocations made by the profiler itself during recording (stack walking, JFR buffer writes) will re-enter the hooks. This is a deliberate design trade-off (no TLS guard) documented in the source — it does not cause infinite recursion but may produce minor double-accounting.

How to test the change?:
Automated integration tests covering malloc sampling:

./utils/run-docker-tests.sh --mount --tests="*NativememProfilerTest*"

Tests pass for both cstack=vm and cstack=vmx variants of:

  • NativememProfilerTest#shouldRecordMallocSamples

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-13909

Unsure? Have a question? Request a review!

@jbachorik jbachorik added the AI label Feb 25, 2026
@dd-octo-sts
Contributor

dd-octo-sts Bot commented Feb 25, 2026

CI Test Results

Run: #24854441810 | Commit: 7e0a8ce | Duration: 24m 56s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-04-23 19:53:53 UTC

@jbachorik jbachorik changed the title from "Port native malloc/free allocation profiling from async-profiler" to "[WIP] Port native malloc/free allocation profiling from async-profiler" Feb 25, 2026
@jbachorik jbachorik force-pushed the jb/native_allocs branch 2 times, most recently from 9fc994d to 5141dd8 March 4, 2026 15:43
Comment thread ddprof-test/src/test/java/com/datadoghq/profiler/context/TagContextTest.java Outdated
Comment thread ddprof-lib/src/main/cpp/arguments.h Outdated
Comment thread ddprof-lib/src/main/cpp/event.h
Comment thread ddprof-lib/src/main/cpp/arguments.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated
@jbachorik jbachorik changed the title from "[WIP] Port native malloc/free allocation profiling from async-profiler" to "Port native malloc allocation profiling from async-profiler" Mar 4, 2026
@jbachorik jbachorik marked this pull request as ready for review March 4, 2026 18:29
@jbachorik jbachorik requested a review from a team as a code owner March 4, 2026 18:29
@pr-commenter

pr-commenter Bot commented Mar 5, 2026

Integration Tests

All 40 integration tests passed

📊 Dashboard · 👷 Pipeline · 📦 8d7dcd3c

@jbachorik
Collaborator Author

@copilot resolve the merge conflicts in this pull request

Contributor

Copilot AI commented Apr 20, 2026


Resolved: I merged origin/main into this PR branch, fixed all merge conflicts, and pushed the merge commit 9e1f33f.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • groovy.jfrog.io
    • Triggering command: /usr/lib/jvm/temurin-17-jdk-amd64/bin/java /usr/lib/jvm/temurin-17-jdk-amd64/bin/java --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.prefs/java.util.prefs=ALL-UNNAMED --add-opens=java.base/java.nio.charset=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.xml/javax.xml.namespace=ALL-UNNAMED --add-opens=java.base/java.time=ALL-UNNAMED -XX:MaxMetaspaceSize=384m -XX:+HeapDumpOnOutOfMemoryError -Xms256m -Xmx512m -Dfile.encoding=UTF-8 (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

@jbachorik jbachorik force-pushed the jb/native_allocs branch 4 times, most recently from 4303fcc to a12cc1e April 22, 2026 07:17
Comment thread ddprof-lib/src/main/cpp/flightRecorder.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/profiler.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.h
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Comment thread ddprof-lib/src/main/cpp/mallocTracer.cpp
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.



Comment thread ddprof-lib/src/main/cpp/profiler.cpp Outdated
jbachorik and others added 5 commits April 23, 2026 17:54
- mallocTracer: move _running=true before patchLibraries() (race fix)
- mallocTracer: use __atomic_fetch_sub; wrap detectNestedMalloc() in UnloadProtection
- mallocTracer: MallocHooker refactor; xoroshiro128+ PRNG; maybeRecord helper
- profiler: _cstack fallback from CSTACK_DEFAULT to DWARF_SUPPORTED?CSTACK_DWARF:CSTACK_NO
- tests: assertNotNull for size/weight/addr fields; weight>=1.0 on sampled path
- doc: update code snippet to reflect actual _cstack fallback

Co-Authored-By: muse <muse@noreply>
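The commit above switches to xoroshiro128+ as the PRNG. The sketch below is the published reference xoroshiro128+ step (2018 parameters a = 24, b = 16, c = 37), seeded via splitmix64. It adds the inverse-CDF transform that turns a uniform draw into an exponentially distributed sampling interval. The class and method names are hypothetical, and the commit does not show its seeding or parameters, so the PR's version may differ.

```cpp
#include <cmath>
#include <cstdint>

// splitmix64, used here only to expand one seed into a non-zero 128-bit state.
static std::uint64_t splitmix64(std::uint64_t& x) {
    std::uint64_t z = (x += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Hypothetical wrapper around the reference xoroshiro128+ step.
class Xoroshiro128Plus {
public:
    explicit Xoroshiro128Plus(std::uint64_t seed) {
        _s[0] = splitmix64(seed);
        _s[1] = splitmix64(seed);
    }

    std::uint64_t next() {
        const std::uint64_t s0 = _s[0];
        std::uint64_t s1 = _s[1];
        const std::uint64_t result = s0 + s1;
        s1 ^= s0;
        _s[0] = rotl(s0, 24) ^ s1 ^ (s1 << 16);  // a, b
        _s[1] = rotl(s1, 37);                    // c
        return result;
    }

    // Uniform double in [0, 1) built from the top 53 bits (the low bits of
    // xoroshiro128+ are the weakest).
    double nextDouble() {
        return static_cast<double>(next() >> 11) * (1.0 / 9007199254740992.0);
    }

    // Exponential draw with the given mean via the inverse CDF.
    double nextExponential(double mean) {
        return -mean * std::log(1.0 - nextDouble());
    }

private:
    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }

    std::uint64_t _s[2];
};
```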
…ting

ByteBuffer.allocateDirect() triggers malloc inside libjvm.so, which on CI
is built with -Bsymbolic-functions — malloc calls never go through the PLT,
so GOT patching is bypassed and no profiler.Malloc events are produced.

Replace with NativeAllocHelper.nativeMalloc() backed by a new JNI function
in libddproftest.so, whose PLT is reliably patched by patchLibraries().
Add a Java wrapper triggerAllocations() so the frame appears in stack
traces for all cstack modes including fp/dwarf (ASGCT does not return
native method frames themselves).

Also emit samplingInterval in JFR settings and use putVar64 for tid in
recordMallocSample for consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On glibc JVMs, System.loadLibrary may not route through the patched
dlopen GOT entry, so libddproftest.so can be loaded after patchLibraries()
runs and its malloc GOT is never hooked. Force class initialization (and
thus System.loadLibrary) in @BeforeAll so the library is in native_libs
when MallocTracer::start() calls patchLibraries().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik merged commit d6d85eb into main Apr 23, 2026
98 checks passed
@jbachorik jbachorik deleted the jb/native_allocs branch April 23, 2026 20:03
@github-actions github-actions Bot added this to the 1.40.1 milestone Apr 23, 2026


5 participants