Adds notebooks/disc_cooling_sim.ipynb with an Open-in-Colab badge, covering 2-D axisymmetric heat diffusion, Avrami crystallinity kinetics, warp-risk scoring, 2-D field visualisations, radial profile plots, and a mould-temperature parameter sweep. https://claude.ai/code/session_016zF8WWzQUxkQpC2hmiRkuB Signed-off-by: Claude <noreply@anthropic.com>
When a launcher runs a module via runpy.run_module(mod, run_name="__main__"), the module may already be imported under its qualified name. The previous approach used inspect.getmodule() first, which matched by filename and returned the pre-imported module's qualified name instead of "__main__". This caused set_module_options() to target a different module than @wp.kernel (which uses f.__module__ == "__main__"), silently ignoring the user's options.

Use frame.f_globals["__name__"] as the primary source for module name resolution, ensuring consistency with @wp.kernel's use of f.__module__. Fall back to inspect.getmodule() and filename matching only when __name__ is unavailable.

Also:
- Use sys._getframe() instead of inspect.stack() to avoid building FrameInfo objects for the entire call stack
- Use try/finally to clean up frame references promptly
- Use os.path.realpath() instead of os.path.abspath() to handle symlinks

Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix caller module detection for runpy-based execution [NVIDIAGH-1274] See merge request omniverse/warp!2101
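The resolution order described in this fix can be sketched as follows. This is a simplified, hypothetical stand-in for the real Warp helper; the function name and fallback chain are illustrative only.

```python
import inspect
import sys


def get_caller_module_name(depth=1):
    """Resolve the caller's module name the way a decorator such as
    @wp.kernel would see it via f.__module__: prefer the frame's own
    __name__ so runpy's run_name="__main__" is respected, even when the
    same file is also imported under its qualified name."""
    frame = sys._getframe(depth)  # cheaper than inspect.stack()
    try:
        name = frame.f_globals.get("__name__")
        if name is not None:
            return name
        # Fallback only when __name__ is unavailable: matching by module
        # object may return the pre-imported qualified name instead of
        # "__main__", which is exactly the bug described above.
        module = inspect.getmodule(frame)
        return module.__name__ if module is not None else None
    finally:
        del frame  # drop the frame reference promptly
```

Because `__name__` is read from the caller's frame globals, the result matches what `@wp.kernel` records, so module options target the same module object.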
When multiple processes compile CUDA kernels concurrently with a shared kernel cache, NVRTC's precompiled header files (.pch) are written to the shared `--pch-dir` directory without any synchronization. One process can read a partially-written `.pch` while another is still writing it, causing a segfault inside NVRTC's `nvrtcCompileProgram`. This was observed as intermittent CI failures on Newton's parallel test runner (8 processes, shared kernel cache, Blackwell sm_120 GPU). The crash always occurs in `build_cuda` during the first CUDA kernel compilation in whichever test process loses the race. The fix directs `--pch-dir` to the per-process build directory (already unique via `_p<pid>_t<tid>` suffix) instead of the shared kernel cache. PCH files are cleaned up together with the build directory after `safe_rename` moves the final outputs to the cache. `pch_dir` is a required keyword argument to `build_cuda()` so that future callers cannot silently revert to the racy shared-directory behavior.
Fix PCH race condition in concurrent CUDA compilation [NVIDIAGH-1284] See merge request omniverse/warp!2109
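The per-process directory scheme mentioned above (`_p<pid>_t<tid>` suffix) can be sketched as below. The function name and `cache_root` layout are illustrative, not Warp's actual API.

```python
import os
import threading


def per_process_build_dir(cache_root):
    """Derive a build directory unique to this process and thread so
    concurrent compilers never read each other's partially written
    .pch files. Final outputs are later moved into the shared cache
    (e.g. via an atomic rename), while PCH files stay here and are
    cleaned up with the build directory."""
    suffix = f"_p{os.getpid()}_t{threading.get_ident()}"
    return os.path.join(cache_root, f"build{suffix}")
```

Pointing `--pch-dir` at this directory removes the shared mutable state that caused the NVRTC segfault.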
Include cuBQL headers and build files under warp/native/cuBQL/, add Apache 2.0 license to licenses/, and exclude cuBQL from typos pre-commit checks. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add cppcheck suppressions for warp/native/cuBQL/ in both GitLab CI and GitHub Actions, mark the directory as linguist-vendored, and update the contribution guide to note cuBQL as third-party code. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add cuBQL as vendored third-party dependency See merge request omniverse/warp!2113
Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix HashGrid truncation for negative coordinates (NVIDIAGH-1256) See merge request omniverse/warp!2058
Exclude vendored cuBQL and NanoVDB from CodeRabbit reviews See merge request omniverse/warp!2114
Add an NDim TypeVar with a PEP 696 default (under TYPE_CHECKING) so that Array and its subclasses are parameterized by both DType and NDim. This lets static type checkers accept both array[dtype] and array[dtype, Literal[ndim]] without requiring a new runtime dependency. Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix mypy not recognizing wp.array[dtype] subscript syntax [NVIDIAGH-1278] See merge request omniverse/warp!2119
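A minimal sketch of the pattern (class and TypeVar names are simplified stand-ins, not Warp's actual definitions):

```python
from typing import TYPE_CHECKING, Any, Generic, TypeVar

if TYPE_CHECKING:
    # PEP 696 TypeVar defaults come from typing_extensions (or typing on
    # Python 3.13+); guarding the import keeps it a type-check-only
    # dependency, so no new runtime requirement is introduced.
    from typing_extensions import TypeVar as TypeVarWithDefault

    DType = TypeVarWithDefault("DType", default=Any)
    NDim = TypeVarWithDefault("NDim", default=int)
else:
    DType = TypeVar("DType")
    NDim = TypeVar("NDim")


class Array(Generic[DType, NDim]):
    """Stand-in for wp.array: with the defaults above, static checkers
    accept both Array[float] (NDim defaulted) and Array[float, Literal[2]]."""
```

At runtime the plain TypeVars are used, so subscripting with both parameters (e.g. `Array[float, int]`) continues to work unchanged.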
Add external texture support, refactor texture interop runtime (NVIDIAGH-1238) See merge request omniverse/warp!2112
Integrate the cuBQL library as an optional BVH backend for wp.Mesh, selectable via bvh_constructor="cubql". This backend supports ray queries (closest-hit, any-hit, count-all) on both CPU and GPU but does not support point queries, AABB queries, grouped meshes, or winding numbers.

Key changes:
- Add CuBQLBVH struct and cuBQL build/refit/rebuild/destroy for host and device in bvh.h, bvh.cpp, bvh.cu
- Add templated cubql_ray_traversal in mesh.h with ClosestHit, AnyHit, and CountAll modes
- Replace the bvh_constructor_values dict with a BvhConstructor IntEnum
- Block CUBQL on wp.Bvh (standalone BVH, no traversal support)
- Unsupported mesh queries silently return no results when cuBQL is active (documented in the Mesh docstring)
Add cuBQL BVH backend for wp.Bvh and wp.Mesh [NVIDIAGH-1286] See merge request omniverse/warp!2111
Fixing clang compile issue in cuBQL See merge request omniverse/warp!2121
Mark the 2x multi-GPU runner jobs as allow_failure since they are frequently crashing for non-actionable reasons. Remove allow_failure from the clang build-and-test pipeline now that it has stabilized. Signed-off-by: Eric Shi <ershi@nvidia.com>
Update CI allow_failure for multi-GPU and clang jobs See merge request omniverse/warp!2123
Fix inaccurate "GPU-based" docstring on BvhConstructor.CUBQL since cuBQL also has a CPU path. Add braces to cubql if/else branches in mesh.cpp and mesh.cu for consistent style. Add test_mesh_refit that verifies BVH refit correctness by moving the mesh and checking ray queries. Add ValueError test for invalid bvh_constructor strings. Clarify CuBQLNode comment about child pair storage. Rewrite changelog entry to state supported/unsupported query types. Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix cuBQL docstrings, brace style, and test coverage See merge request omniverse/warp!2122
Increase _SUITE_TIMEOUT from 2400s to 3600s to avoid premature timeouts on slower runners (e.g. Jetson Orin). Bump Windows test job timeouts to 75m to provide buffer over the new suite timeout. Signed-off-by: Eric Shi <ershi@nvidia.com>
Bump test suite timeout and CI job timeouts See merge request omniverse/warp!2127
Disable FEM example tests and remove multi-GPU allow_failure See merge request omniverse/warp!2126
The struct field setter extracted the raw Python value from Warp scalars for the ctypes backing store but then stored that unwrapped value as the Python attribute, causing e.g. wp.uint8 to decay to int after assignment. Re-wrap the value in the declared Warp type when the caller passed a Warp scalar. Plain Python values (int, float, bool) are stored as-is to avoid breaking downstream isinstance checks (e.g. wp.launch dim arguments). Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix struct field assignment unwrapping Warp scalar types [NVIDIAGH-1288] See merge request omniverse/warp!2120
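The rewrap behaviour described in this fix can be illustrated with a toy sketch. The `uint8` and `Field` classes here are hypothetical stand-ins for Warp's scalar wrappers and struct-field machinery, not the real implementation.

```python
class uint8:
    """Toy stand-in for wp.uint8: wraps a raw Python int."""

    def __init__(self, value):
        self.value = int(value) & 0xFF


class Field:
    """Toy stand-in for a struct field with a declared Warp scalar type."""

    def __init__(self, declared_type):
        self.declared_type = declared_type

    def coerce(self, value):
        # Warp scalar passed in: unwrap the raw value for the ctypes
        # backing store, then re-wrap for the Python attribute so the
        # wrapper type does not decay to a plain int after assignment.
        if isinstance(value, self.declared_type):
            return self.declared_type(value.value)
        # Plain Python int/float/bool: store as-is, preserving downstream
        # isinstance checks (e.g. wp.launch dim arguments).
        return value


field = Field(uint8)
wrapped = field.coerce(uint8(300))   # stays a uint8 wrapper
plain = field.coerce(7)              # stays a plain int
```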
Signed-off-by: Eric Shi <ershi@nvidia.com>
Update uv.lock (Pygments 2.20.0, requests 2.33.1) See merge request omniverse/warp!2182
Move the CUB include before cuBQL to avoid a CCCL bug where <stdexcept> (from cuBQL's math/common.h) makes __throw_out_of_range non-constexpr, breaking a static_assert in typeid.h. Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix bvh.cu compilation with CUDA 13.2 and GCC < 12 See merge request omniverse/warp!2183
Rename from wp.get_optimal_block_dim to wp.get_suggested_block_size to better reflect that the result is a suggestion based on per-SM occupancy, not a universally optimal choice. The function now returns both block_size and min_grid_size from cuOccupancyMaxPotentialBlockSize, letting callers check whether their launch is large enough to benefit from the suggested block size. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add wp.get_suggested_block_size for CUDA occupancy queries [NVIDIAGH-1270] See merge request omniverse/warp!2147
Document how to run ASV benchmarks and explain why --launch-method spawn should be used on Linux to avoid leaking NVRTC precompiled-header directories in /tmp. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add ASV benchmarking section to contribution guide See merge request omniverse/warp!2179
* upgrade cu13 libmathdx to latest Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2186
Upgrade cu13 libmathdx dependency to version 0.3.2 See merge request omniverse/warp!2186
* Address MR feedback for module_options validation

  Check isinstance before module="unique" so the most specific error fires first. Remove redundant mark_modified() on freshly constructed modules.

* Add `module_options` dict parameter to `@wp.kernel` for inline module options

  Allow per-kernel module compilation options (e.g. `fast_math`, `mode`) via a new `module_options` dict on the `@wp.kernel` decorator. Requires `module="unique"` for any non-None value; raises `ValueError` otherwise. Unknown keys are validated against the module's known options. Empty dicts are accepted as a no-op with unique modules.

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Alain Denzler <adenzler@nvidia.com>
Approved-by: Lukasz Wawrzyniak <lwawrzyniak@nvidia.com>

See merge request omniverse/warp!2067
Add `module_options` dict parameter to `@wp.kernel` for inline module options See merge request omniverse/warp!2067
* Introduce is_cpu local for readability in _compile()

  Replace bare output_arch checks with a named boolean so the intent (CPU vs CUDA target) is immediately obvious.

* Default CPU optimization level to -O2, keep -O3 for CUDA

  When optimization_level is None (the default), CPU kernels now compile with -O2 while CUDA kernels use -O3. The LLVM backend barely distinguishes O2 from O3, and the O3-only frontend passes have low relevance to Warp's generated code patterns. Users can still set optimization_level=3 explicitly for both targets. Add hash-consistency test and changelog entry.

* Make CPU optimization level configurable

  Thread config.optimization_level through the Clang frontend (-O flag) and the LLVM backend (CodeGenOptLevel passed to createTargetMachine), so the setting now controls the full CPU compilation pipeline. Previously the frontend was hardcoded to -O2 and the backend always used CodeGenOptLevel::Default regardless of the config value. Add ctypes argtypes for wp_compile_cpp and wp_compile_cuda.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nicolas Capens <ncapens@nvidia.com>
Approved-by: Lukasz Wawrzyniak <lwawrzyniak@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2137
Make CPU kernel optimization level configurable, default to -O2 [NVIDIAGH-1310] See merge request omniverse/warp!2137
* Last Greptile comment

* Add geometry-driven fp64 precision support to warp.fem (Layers 0-3)

  Introduce infrastructure for full fp64 FEM pipelines, propagating scalar precision from the geometry through the entire stack.

Approved-by: Eric Shi <ershi@nvidia.com>
Approved-by: Gilles Daviet <gdaviet@nvidia.com>

See merge request omniverse/warp!2172
* More Greptile comments Approved-by: Gilles Daviet <gdaviet@nvidia.com> See merge request omniverse/warp!2192
* Extend tile_fft/tile_ifft to support N-D tiles (NVIDIAGH-1317) Generalize wp.tile_fft() and wp.tile_ifft() from strictly 2-D tiles to arbitrary N-D tiles (N >= 2). The FFT is computed along the last dimension; all leading dimensions are treated as independent batches. Separate FFT tests into test_tile_fft.py. Signed-off-by: snidhan <snidhan@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2178
* Remove Python 3.9 support

  - Bump requires-python to >=3.10 and remove the 3.9 classifier
  - Remove deprecation warnings from build_lib.py and context.py
  - Remove the inspect.get_annotations() backport and ast.Index/ast.ExtSlice compat code from codegen.py
  - Remove 3.9 from the GitLab CI test matrix
  - Update docs (README, installation, compatibility, C++ examples)
  - Regenerate uv.lock
  - Apply ruff pyupgrade fixes for the Python 3.10+ target

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Nicolas Capens <ncapens@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2187
* Fix array annotation repr and matrix type_repr (NVIDIAGH-1341) Fix _ArrayAnnotationBase.__repr__() interpolating raw class objects into the format string, producing unreadable output like `wp.array(dtype=<class 'warp._src.types.uint32'>, ndim=4)`. The dtype is now resolved to a human-readable name: `wp.X` for types available in the warp namespace, struct keys for structs, and the descriptive type_repr form for exotic vector/matrix types. Also fix type_repr for small matrix types emitting a spurious pair of parentheses (e.g. `mat44f(f)` instead of `mat44ff` -> `mat44f`). Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2196
* Fix PCH code review feedback: handle None pch_dir in CUDA path, clean up partial PCH files

  - Make build_cuda() handle None pch_dir like build_cpu() already does, removing the misleading `or build_dir` fallback for CUDA >= 13.0
  - Remove partial .pch files on failed generation to avoid a wasted fallback-retry on the next compilation

* Address code review feedback for PCH diagnostic ownership

  - Fix use-after-free: scope the setClient ownership transfer to LLVM >= 21 only; the LLVM < 21 path correctly passes nullptr to createDiagnostics, which creates its own internal printer
  - Guard the get_clang_pch_dir() call with a use_precompiled_headers check to avoid unnecessary temp directory allocation when PCH is disabled

* Add CPU precompiled header support to reduce kernel compile times

  Extend precompiled header (PCH) support to the CPU compilation path (Clang/LLVM), matching the existing CUDA PCH support via NVRTC. On the first CPU kernel compilation, Clang generates a PCH from builtin.h. Subsequent compilations in the same process reuse the serialized AST, skipping redundant header parsing. For multi-module workloads like warp.fem, this reduces total CPU compile time by ~65% (e.g., FEM diffusion: 45s -> 16s, Stokes transfer: 84s -> 30s).

  Key details:
  - Controlled by warp.config.use_precompiled_headers (same as CUDA)
  - PCH files live in per-thread temp directories to avoid races
  - Fallback: if a PCH is corrupt, Clang retries without it and deletes the stale file
  - The PCH filename encodes block_dim and preprocessor flags so different configurations get separate files

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Nicolas Capens <ncapens@nvidia.com>

See merge request omniverse/warp!2170
* Reduce memory usage in array shape int-promotion tests Replace tests that allocated ~3.4 GB to verify numpy integer shape elements are promoted to Python int. The new test uses a small array and asserts the type of shape elements directly. Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2199
* Add three recent publications to PUBLICATIONS.md Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2200
* Add Quick Start example to README

  Show a complete 20-line N-body gravity simulation that demonstrates kernel definition, vec3 math, array creation, constant capture, and launch with one million particles.

* Fix stale notebook commit hash in basics.rst

  Update accelerated-computing-hub notebook links to match the newer commit hash already used in README.md.

* Clean up README examples section

  Remove unit test instructions (developer-facing, covered in the contribution guide), consolidate the USD viewing note with the example descriptions, and update example descriptions to match the docs.

* Streamline docs landing page and align with product messaging

  - Slim down index.rst to intro, quickstart, and example gallery
  - Move tutorial notebooks to basics.rst
  - Move the Omniverse section to installation.rst
  - Remove sections duplicated in sidebar pages (Learn More, Support, License, Contributing, Publications)
  - Replace "spatial computing" and "graphics code" with product-aligned language in both index.rst and README.md

* Fix conda installation docs to match available variants

  The previous example referenced cuda126 builds which no longer exist; conda-forge now publishes cuda129 and cuda130 variants. Show the default install command and build-string filters for specific variants.

* Add announcement banner linking to latest release notes

* Reduce TOC depth for changelog, publications, and API reference

  Prevent per-version changelog entries, per-year publication entries, and full API class/method hierarchies from cluttering the landing page TOC and sidebar navigation.

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2201
* Update docs, changelog, and benchmarks for v1.12.1 release Bump version references in docs announcement banner and installation URLs, add v1.12.1 to ASV benchmark tags, and clean up Unreleased changelog entries for clarity and consistency. Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2203
Merge pull request #1 from Tuesdaythe13th/claude/disc-config-struct-n…
Code Review
This pull request includes significant updates to the Warp documentation, including a new quick-start guide, updated installation instructions, and a new example notebook for disc cooling simulation. It also includes maintenance updates such as removing Kit extensions, updating the minimum Python version to 3.10, and adding support for cuBQL. The review feedback highlights a physical inaccuracy in the crystallisation kinetics model within the new notebook and suggests updating the kernel array type hints to the new subscript syntax for consistency.
```python
@wp.kernel
def update_crystallinity(
    T: wp.array(dtype=wp.float32),
    chi_in: wp.array(dtype=wp.float32),
    chi_out: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Simple Avrami-style crystallisation kinetics.

    Crystal growth is fastest mid-way between T_g and T_m and saturates
    at chi_max. Replace with a Nakamura model for production use.
    """
    tid = wp.tid()
    temp = T[tid]
    chi = chi_in[tid]

    if temp > config.T_g and temp < config.T_m:
        x = (temp - config.T_g) / (config.T_m - config.T_g)
        x = wp.max(0.0, wp.min(1.0, x))
        rate = params.avrami_k0 * x * (1.0 - chi / params.chi_max)
        chi = chi + params.dt * params.avrami_n * rate
        chi = wp.max(0.0, wp.min(params.chi_max, chi))

    chi_out[tid] = wp.float32(chi)
```
The crystallisation kinetics implementation in `update_crystallinity` appears to be physically incorrect and contradicts the docstring.

- Temperature dependence: the docstring states that growth is fastest mid-way between $T_g$ and $T_m$. However, the code uses `x = (temp - T_g) / (T_m - T_g)`, which makes the rate peak at $T_m$. In reality, the driving force for crystallisation is undercooling ($T_m - T$), so the rate should be zero at $T_m$. Consider using a term like `x * (1.0 - x)` or a more realistic Nakamura/Hoffman-Lauritzen model.

- Avrami exponent: `params.avrami_n` is used as a linear multiplier for the rate. In the Avrami model, $n$ is an exponent that characterizes the dimensionality of growth. A linear scaling does not capture the sigmoidal nature of the transformation for $n > 1$.
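The suggested `x * (1.0 - x)` correction can be sketched in plain Python. This is a toy rate function illustrating the reviewer's point, not the notebook's actual fix:

```python
def crystallisation_rate(temp, T_g, T_m, k0, chi, chi_max):
    """Rate that peaks mid-way between T_g and T_m and vanishes at both
    bounds: x * (1 - x) is zero at T_g and at T_m (zero undercooling)
    and maximal at x = 0.5, matching the docstring's claim. The
    (1 - chi/chi_max) factor still saturates growth at chi_max."""
    if not (T_g < temp < T_m):
        return 0.0
    x = (temp - T_g) / (T_m - T_g)
    return k0 * x * (1.0 - x) * (1.0 - chi / chi_max)
```

With this form, a material held just below $T_m$ crystallises slowly, as the physics requires, instead of at the maximum rate.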
```python
@wp.kernel
def init_temperature(
    T: wp.array(dtype=wp.float32),
    params: CoolingParams,
):
    tid = wp.tid()
    T[tid] = wp.float32(params.T_init)


@wp.kernel
def init_scalar(
    a: wp.array(dtype=wp.float32),
    value: float,
):
    tid = wp.tid()
    a[tid] = wp.float32(value)


@wp.kernel
def step_temperature(
    T_in: wp.array(dtype=wp.float32),
    T_out: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Explicit finite-difference heat diffusion in cylindrical coordinates.

    Solves ∂T/∂t = α (∂²T/∂r² + (1/r)∂T/∂r + ∂²T/∂z²)
    with Dirichlet mould-wall BCs on the top and bottom (z) faces and
    Neumann (zero-flux) BCs on the axis (r=0) and outer radius.
    """
    tid = wp.tid()
    i = tid // config.nz
    j = tid - i * config.nz

    alpha = config.k / (config.rho * config.cp)
    dr = config.dr
    dz = config.dz
    r = (float(i) + 0.5) * dr

    # Dirichlet BC on top/bottom mould walls.
    if j == 0 or j == config.nz - 1:
        T_out[tid] = wp.float32(params.T_mold)
        return

    # Neumann BC: mirror stencil at axis and outer edge.
    im = clamp_i(i - 1, 0, config.nx - 1)
    ip = clamp_i(i + 1, 0, config.nx - 1)
    if i == 0:
        im = 1
    if i == config.nx - 1:
        ip = config.nx - 2

    jm = j - 1
    jp = j + 1

    Tc = T_in[idx(i, j, config.nz)]
    Trm = T_in[idx(im, j, config.nz)]
    Trp = T_in[idx(ip, j, config.nz)]
    Tzm = T_in[idx(i, jm, config.nz)]
    Tzp = T_in[idx(i, jp, config.nz)]

    d2Tdr2 = (Trp - 2.0 * Tc + Trm) / (dr * dr)
    dTdr_over_r = 0.0
    if i > 0:
        dTdr_over_r = (Trp - Trm) / (2.0 * dr * r)
    d2Tdz2 = (Tzp - 2.0 * Tc + Tzm) / (dz * dz)

    lap = d2Tdr2 + dTdr_over_r + d2Tdz2
    Tnew = Tc + params.dt * alpha * lap
    T_out[tid] = wp.float32(Tnew)


@wp.kernel
def update_crystallinity(
    T: wp.array(dtype=wp.float32),
    chi_in: wp.array(dtype=wp.float32),
    chi_out: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Simple Avrami-style crystallisation kinetics.

    Crystal growth is fastest mid-way between T_g and T_m and saturates
    at chi_max. Replace with a Nakamura model for production use.
    """
    tid = wp.tid()
    temp = T[tid]
    chi = chi_in[tid]

    if temp > config.T_g and temp < config.T_m:
        x = (temp - config.T_g) / (config.T_m - config.T_g)
        x = wp.max(0.0, wp.min(1.0, x))
        rate = params.avrami_k0 * x * (1.0 - chi / params.chi_max)
        chi = chi + params.dt * params.avrami_n * rate
        chi = wp.max(0.0, wp.min(params.chi_max, chi))

    chi_out[tid] = wp.float32(chi)


@wp.kernel
def compute_warp_risk(
    T: wp.array(dtype=wp.float32),
    chi: wp.array(dtype=wp.float32),
    warp_risk: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Score each radial position by thermal gradient and crystallinity asymmetry.

    Only threads at the mid-plane (j == nz/2) write to warp_risk[i].
    """
    tid = wp.tid()
    i = tid // config.nz
    j = tid - i * config.nz

    if j != config.nz // 2:
        return

    top = T[idx(i, 0, config.nz)]
    bot = T[idx(i, config.nz - 1, config.nz)]
    mid = T[idx(i, j, config.nz)]

    chi_top = chi[idx(i, 1, config.nz)]
    chi_bot = chi[idx(i, config.nz - 2, config.nz)]

    dT_thickness = wp.abs(top - bot)
    dT_mid = wp.abs(mid - 0.5 * (top + bot))
    dchi = wp.abs(chi_top - chi_bot)

    risk = (
        params.warp_temp_coeff * (dT_thickness + dT_mid)
        + params.warp_chi_coeff * dchi
    )
    warp_risk[i] = wp.float32(risk)
```
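Because `step_temperature` is an explicit scheme, the time step in `params.dt` must respect the usual von Neumann stability bound for 2-D diffusion. A quick helper to check this (the bound and sample values are illustrative; the 1/r term tightens the limit slightly near the axis, so treat it as a guideline):

```python
def max_stable_dt(k, rho, cp, dr, dz):
    """Explicit-scheme stability bound for the diffusion stencil:
    dt <= 1 / (2 * alpha * (1/dr^2 + 1/dz^2)), with alpha = k/(rho*cp)."""
    alpha = k / (rho * cp)
    return 1.0 / (2.0 * alpha * (1.0 / dr**2 + 1.0 / dz**2))
```

For example, k = 0.2 W/m·K, rho = 1000 kg/m³, cp = 2000 J/kg·K, and a 0.1 mm grid spacing in both directions give a maximum stable step of 0.025 s; halving the grid spacing quarters the allowable step.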