Adds notebooks/disc_cooling_sim.ipynb with an Open-in-Colab badge, covering 2-D axisymmetric heat diffusion, Avrami crystallinity kinetics, warp-risk scoring, 2-D field visualisations, radial profile plots, and a mould-temperature parameter sweep. https://claude.ai/code/session_016zF8WWzQUxkQpC2hmiRkuB Signed-off-by: Claude <noreply@anthropic.com>
When a launcher runs a module via runpy.run_module(mod, run_name="__main__"), the module may already be imported under its qualified name. The previous approach used inspect.getmodule() first, which matched by filename and returned the pre-imported module's qualified name instead of "__main__". This caused set_module_options() to target a different module than @wp.kernel (which uses f.__module__ == "__main__"), silently ignoring the user's options.

Use frame.f_globals["__name__"] as the primary source for module name resolution, ensuring consistency with @wp.kernel's use of f.__module__. Fall back to inspect.getmodule() and filename matching only when __name__ is unavailable.

Also:
- Use sys._getframe() instead of inspect.stack() to avoid building FrameInfo objects for the entire call stack
- Use try/finally to clean up frame references promptly
- Use os.path.realpath() instead of os.path.abspath() to handle symlinks

Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix caller module detection for runpy-based execution [NVIDIAGH-1274] See merge request omniverse/warp!2101
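The resolution order described in this fix can be sketched as follows. This is a simplified, hypothetical stand-in for the real Warp helper; the function name and fallback chain are illustrative only.

```python
import inspect
import sys


def get_caller_module_name(depth=1):
    """Resolve the caller's module name the way a decorator such as
    @wp.kernel would see it via f.__module__: prefer the frame's own
    __name__ so runpy's run_name="__main__" is respected, even when the
    same file is also imported under its qualified name."""
    frame = sys._getframe(depth)  # cheaper than inspect.stack()
    try:
        name = frame.f_globals.get("__name__")
        if name is not None:
            return name
        # Fallback only when __name__ is unavailable: matching by module
        # object may return the pre-imported qualified name instead of
        # "__main__", which is exactly the bug described above.
        module = inspect.getmodule(frame)
        return module.__name__ if module is not None else None
    finally:
        del frame  # drop the frame reference promptly
```

Because `__name__` is read from the caller's frame globals, the result matches what `@wp.kernel` records, so module options target the same module object.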
When multiple processes compile CUDA kernels concurrently with a shared kernel cache, NVRTC's precompiled header files (.pch) are written to the shared `--pch-dir` directory without any synchronization. One process can read a partially-written `.pch` while another is still writing it, causing a segfault inside NVRTC's `nvrtcCompileProgram`. This was observed as intermittent CI failures on Newton's parallel test runner (8 processes, shared kernel cache, Blackwell sm_120 GPU). The crash always occurs in `build_cuda` during the first CUDA kernel compilation in whichever test process loses the race. The fix directs `--pch-dir` to the per-process build directory (already unique via `_p<pid>_t<tid>` suffix) instead of the shared kernel cache. PCH files are cleaned up together with the build directory after `safe_rename` moves the final outputs to the cache. `pch_dir` is a required keyword argument to `build_cuda()` so that future callers cannot silently revert to the racy shared-directory behavior.
Fix PCH race condition in concurrent CUDA compilation [NVIDIAGH-1284] See merge request omniverse/warp!2109
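The per-process directory scheme mentioned above (`_p<pid>_t<tid>` suffix) can be sketched as below. The function name and `cache_root` layout are illustrative, not Warp's actual API.

```python
import os
import threading


def per_process_build_dir(cache_root):
    """Derive a build directory unique to this process and thread so
    concurrent compilers never read each other's partially written
    .pch files. Final outputs are later moved into the shared cache
    (e.g. via an atomic rename), while PCH files stay here and are
    cleaned up with the build directory."""
    suffix = f"_p{os.getpid()}_t{threading.get_ident()}"
    return os.path.join(cache_root, f"build{suffix}")
```

Pointing `--pch-dir` at this directory removes the shared mutable state that caused the NVRTC segfault.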
Include cuBQL headers and build files under warp/native/cuBQL/, add Apache 2.0 license to licenses/, and exclude cuBQL from typos pre-commit checks. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add cppcheck suppressions for warp/native/cuBQL/ in both GitLab CI and GitHub Actions, mark the directory as linguist-vendored, and update the contribution guide to note cuBQL as third-party code. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add cuBQL as vendored third-party dependency See merge request omniverse/warp!2113
Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix HashGrid truncation for negative coordinates (NVIDIAGH-1256) See merge request omniverse/warp!2058
Exclude vendored cuBQL and NanoVDB from CodeRabbit reviews See merge request omniverse/warp!2114
Add an NDim TypeVar with a PEP 696 default (under TYPE_CHECKING) so that Array and its subclasses are parameterized by both DType and NDim. This lets static type checkers accept both array[dtype] and array[dtype, Literal[ndim]] without requiring a new runtime dependency. Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix mypy not recognizing wp.array[dtype] subscript syntax [NVIDIAGH-1278] See merge request omniverse/warp!2119
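A minimal sketch of the pattern (class and TypeVar names are simplified stand-ins, not Warp's actual definitions):

```python
from typing import TYPE_CHECKING, Any, Generic, TypeVar

if TYPE_CHECKING:
    # PEP 696 TypeVar defaults come from typing_extensions (or typing on
    # Python 3.13+); guarding the import keeps it a type-check-only
    # dependency, so no new runtime requirement is introduced.
    from typing_extensions import TypeVar as TypeVarWithDefault

    DType = TypeVarWithDefault("DType", default=Any)
    NDim = TypeVarWithDefault("NDim", default=int)
else:
    DType = TypeVar("DType")
    NDim = TypeVar("NDim")


class Array(Generic[DType, NDim]):
    """Stand-in for wp.array: with the defaults above, static checkers
    accept both Array[float] (NDim defaulted) and Array[float, Literal[2]]."""
```

At runtime the plain TypeVars are used, so subscripting with both parameters (e.g. `Array[float, int]`) continues to work unchanged.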
Add external texture support, refactor texture interop runtime (NVIDIAGH-1238) See merge request omniverse/warp!2112
Integrate the cuBQL library as an optional BVH backend for wp.Mesh, selectable via bvh_constructor="cubql". This backend supports ray queries (closest-hit, any-hit, count-all) on both CPU and GPU but does not support point queries, AABB queries, grouped meshes, or winding numbers.

Key changes:
- Add CuBQLBVH struct and cuBQL build/refit/rebuild/destroy for host and device in bvh.h, bvh.cpp, bvh.cu
- Add templated cubql_ray_traversal in mesh.h with ClosestHit, AnyHit, and CountAll modes
- Replace the bvh_constructor_values dict with a BvhConstructor IntEnum
- Block CUBQL on wp.Bvh (standalone BVH, no traversal support)
- Unsupported mesh queries silently return no results when cuBQL is active (documented in the Mesh docstring)
Add cuBQL BVH backend for wp.Bvh and wp.Mesh [NVIDIAGH-1286] See merge request omniverse/warp!2111
Fixing clang compile issue in cuBQL See merge request omniverse/warp!2121
Mark the 2x multi-GPU runner jobs as allow_failure since they are frequently crashing for non-actionable reasons. Remove allow_failure from the clang build-and-test pipeline now that it has stabilized. Signed-off-by: Eric Shi <ershi@nvidia.com>
Update CI allow_failure for multi-GPU and clang jobs See merge request omniverse/warp!2123
Fix inaccurate "GPU-based" docstring on BvhConstructor.CUBQL since cuBQL also has a CPU path. Add braces to cubql if/else branches in mesh.cpp and mesh.cu for consistent style. Add test_mesh_refit that verifies BVH refit correctness by moving the mesh and checking ray queries. Add ValueError test for invalid bvh_constructor strings. Clarify CuBQLNode comment about child pair storage. Rewrite changelog entry to state supported/unsupported query types. Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix cuBQL docstrings, brace style, and test coverage See merge request omniverse/warp!2122
Increase _SUITE_TIMEOUT from 2400s to 3600s to avoid premature timeouts on slower runners (e.g. Jetson Orin). Bump Windows test job timeouts to 75m to provide buffer over the new suite timeout. Signed-off-by: Eric Shi <ershi@nvidia.com>
Bump test suite timeout and CI job timeouts See merge request omniverse/warp!2127
Disable FEM example tests and remove multi-GPU allow_failure See merge request omniverse/warp!2126
The struct field setter extracted the raw Python value from Warp scalars for the ctypes backing store but then stored that unwrapped value as the Python attribute, causing e.g. wp.uint8 to decay to int after assignment. Re-wrap the value in the declared Warp type when the caller passed a Warp scalar. Plain Python values (int, float, bool) are stored as-is to avoid breaking downstream isinstance checks (e.g. wp.launch dim arguments). Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix struct field assignment unwrapping Warp scalar types [NVIDIAGH-1288] See merge request omniverse/warp!2120
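The rewrap behaviour described in this fix can be illustrated with a toy sketch. The `uint8` and `Field` classes here are hypothetical stand-ins for Warp's scalar wrappers and struct-field machinery, not the real implementation.

```python
class uint8:
    """Toy stand-in for wp.uint8: wraps a raw Python int."""

    def __init__(self, value):
        self.value = int(value) & 0xFF


class Field:
    """Toy stand-in for a struct field with a declared Warp scalar type."""

    def __init__(self, declared_type):
        self.declared_type = declared_type

    def coerce(self, value):
        # Warp scalar passed in: unwrap the raw value for the ctypes
        # backing store, then re-wrap for the Python attribute so the
        # wrapper type does not decay to a plain int after assignment.
        if isinstance(value, self.declared_type):
            return self.declared_type(value.value)
        # Plain Python int/float/bool: store as-is, preserving downstream
        # isinstance checks (e.g. wp.launch dim arguments).
        return value


field = Field(uint8)
wrapped = field.coerce(uint8(300))   # stays a uint8 wrapper
plain = field.coerce(7)              # stays a plain int
```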
Signed-off-by: Eric Shi <ershi@nvidia.com>
Update uv.lock (Pygments 2.20.0, requests 2.33.1) See merge request omniverse/warp!2182
Move the CUB include before cuBQL to avoid a CCCL bug where <stdexcept> (from cuBQL's math/common.h) makes __throw_out_of_range non-constexpr, breaking a static_assert in typeid.h. Signed-off-by: Eric Shi <ershi@nvidia.com>
Fix bvh.cu compilation with CUDA 13.2 and GCC < 12 See merge request omniverse/warp!2183
Rename from wp.get_optimal_block_dim to wp.get_suggested_block_size to better reflect that the result is a suggestion based on per-SM occupancy, not a universally optimal choice. The function now returns both block_size and min_grid_size from cuOccupancyMaxPotentialBlockSize, letting callers check whether their launch is large enough to benefit from the suggested block size. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add wp.get_suggested_block_size for CUDA occupancy queries [NVIDIAGH-1270] See merge request omniverse/warp!2147
Document how to run ASV benchmarks and explain why --launch-method spawn should be used on Linux to avoid leaking NVRTC precompiled-header directories in /tmp. Signed-off-by: Eric Shi <ershi@nvidia.com>
Add ASV benchmarking section to contribution guide See merge request omniverse/warp!2179
* upgrade cu13 libmathdx to latest Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2186
Upgrade cu13 libmathdx dependency to version 0.3.2 See merge request omniverse/warp!2186
* Address MR feedback for module_options validation

  Check isinstance before module="unique" so the most specific error fires first. Remove redundant mark_modified() on freshly constructed modules.

* Add `module_options` dict parameter to `@wp.kernel` for inline module options

  Allow per-kernel module compilation options (e.g. `fast_math`, `mode`) via a new `module_options` dict on the `@wp.kernel` decorator. Requires `module="unique"` for any non-None value; raises `ValueError` otherwise. Unknown keys are validated against the module's known options. Empty dicts are accepted as a no-op with unique modules.

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Alain Denzler <adenzler@nvidia.com>
Approved-by: Lukasz Wawrzyniak <lwawrzyniak@nvidia.com>

See merge request omniverse/warp!2067
Add `module_options` dict parameter to `@wp.kernel` for inline module options See merge request omniverse/warp!2067
* Introduce is_cpu local for readability in _compile()

  Replace bare output_arch checks with a named boolean so the intent (CPU vs CUDA target) is immediately obvious.

* Default CPU optimization level to -O2, keep -O3 for CUDA

  When optimization_level is None (the default), CPU kernels now compile with -O2 while CUDA kernels use -O3. The LLVM backend barely distinguishes O2 from O3, and the O3-only frontend passes have low relevance to Warp's generated code patterns. Users can still set optimization_level=3 explicitly for both targets. Add hash-consistency test and changelog entry.

* Make CPU optimization level configurable

  Thread config.optimization_level through the Clang frontend (-O flag) and the LLVM backend (CodeGenOptLevel passed to createTargetMachine), so the setting now controls the full CPU compilation pipeline. Previously the frontend was hardcoded to -O2 and the backend always used CodeGenOptLevel::Default regardless of the config value. Add ctypes argtypes for wp_compile_cpp and wp_compile_cuda.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nicolas Capens <ncapens@nvidia.com>
Approved-by: Lukasz Wawrzyniak <lwawrzyniak@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2137
Make CPU kernel optimization level configurable, default to -O2 [NVIDIAGH-1310] See merge request omniverse/warp!2137
* Last Greptile comment

* Add geometry-driven fp64 precision support to warp.fem (Layers 0-3)

  Introduce infrastructure for full fp64 FEM pipelines, propagating scalar precision from the geometry through the entire stack.

Approved-by: Eric Shi <ershi@nvidia.com>
Approved-by: Gilles Daviet <gdaviet@nvidia.com>

See merge request omniverse/warp!2172
* More Greptile comments Approved-by: Gilles Daviet <gdaviet@nvidia.com> See merge request omniverse/warp!2192
* Extend tile_fft/tile_ifft to support N-D tiles (NVIDIAGH-1317) Generalize wp.tile_fft() and wp.tile_ifft() from strictly 2-D tiles to arbitrary N-D tiles (N >= 2). The FFT is computed along the last dimension; all leading dimensions are treated as independent batches. Separate FFT tests into test_tile_fft.py. Signed-off-by: snidhan <snidhan@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2178
* Remove Python 3.9 support

  - Bump requires-python to >=3.10 and remove the 3.9 classifier
  - Remove deprecation warnings from build_lib.py and context.py
  - Remove the inspect.get_annotations() backport and ast.Index/ast.ExtSlice compat code from codegen.py
  - Remove 3.9 from the GitLab CI test matrix
  - Update docs (README, installation, compatibility, C++ examples)
  - Regenerate uv.lock
  - Apply ruff pyupgrade fixes for the Python 3.10+ target

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Nicolas Capens <ncapens@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2187
* Fix array annotation repr and matrix type_repr (NVIDIAGH-1341) Fix _ArrayAnnotationBase.__repr__() interpolating raw class objects into the format string, producing unreadable output like `wp.array(dtype=<class 'warp._src.types.uint32'>, ndim=4)`. The dtype is now resolved to a human-readable name: `wp.X` for types available in the warp namespace, struct keys for structs, and the descriptive type_repr form for exotic vector/matrix types. Also fix type_repr for small matrix types emitting a spurious pair of parentheses (e.g. `mat44f(f)` instead of `mat44ff` -> `mat44f`). Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2196
* Fix PCH code review feedback: handle None pch_dir in CUDA path, clean up partial PCH files

  - Make build_cuda() handle None pch_dir like build_cpu() already does, removing the misleading `or build_dir` fallback for CUDA >= 13.0
  - Remove partial .pch files on failed generation to avoid a wasted fallback-retry on the next compilation

* Address code review feedback for PCH diagnostic ownership

  - Fix use-after-free: scope the setClient ownership transfer to LLVM >= 21 only; the LLVM < 21 path correctly passes nullptr to createDiagnostics, which creates its own internal printer
  - Guard the get_clang_pch_dir() call with a use_precompiled_headers check to avoid unnecessary temp directory allocation when PCH is disabled

* Add CPU precompiled header support to reduce kernel compile times

  Extend precompiled header (PCH) support to the CPU compilation path (Clang/LLVM), matching the existing CUDA PCH support via NVRTC. On the first CPU kernel compilation, Clang generates a PCH from builtin.h. Subsequent compilations in the same process reuse the serialized AST, skipping redundant header parsing. For multi-module workloads like warp.fem, this reduces total CPU compile time by ~65% (e.g., FEM diffusion: 45s -> 16s, Stokes transfer: 84s -> 30s).

  Key details:
  - Controlled by warp.config.use_precompiled_headers (same as CUDA)
  - PCH files live in per-thread temp directories to avoid races
  - Fallback: if a PCH is corrupt, Clang retries without it and deletes the stale file
  - The PCH filename encodes block_dim and preprocessor flags so different configurations get separate files

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Nicolas Capens <ncapens@nvidia.com>

See merge request omniverse/warp!2170
* Reduce memory usage in array shape int-promotion tests Replace tests that allocated ~3.4 GB to verify numpy integer shape elements are promoted to Python int. The new test uses a small array and asserts the type of shape elements directly. Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2199
* Add three recent publications to PUBLICATIONS.md Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2200
* Add Quick Start example to README

  Show a complete 20-line N-body gravity simulation that demonstrates kernel definition, vec3 math, array creation, constant capture, and launch with one million particles.

* Fix stale notebook commit hash in basics.rst

  Update accelerated-computing-hub notebook links to match the newer commit hash already used in README.md.

* Clean up README examples section

  Remove unit test instructions (developer-facing, covered in the contribution guide), consolidate the USD viewing note with the example descriptions, and update example descriptions to match the docs.

* Streamline docs landing page and align with product messaging

  - Slim down index.rst to intro, quickstart, and example gallery
  - Move tutorial notebooks to basics.rst
  - Move the Omniverse section to installation.rst
  - Remove sections duplicated in sidebar pages (Learn More, Support, License, Contributing, Publications)
  - Replace "spatial computing" and "graphics code" with product-aligned language in both index.rst and README.md

* Fix conda installation docs to match available variants

  The previous example referenced cuda126 builds which no longer exist; conda-forge now publishes cuda129 and cuda130 variants. Show the default install command and build-string filters for specific variants.

* Add announcement banner linking to latest release notes

* Reduce TOC depth for changelog, publications, and API reference

  Prevent per-version changelog entries, per-year publication entries, and full API class/method hierarchies from cluttering the landing page TOC and sidebar navigation.

Signed-off-by: Eric Shi <ershi@nvidia.com>
Approved-by: Eric Shi <ershi@nvidia.com>

See merge request omniverse/warp!2201
* Update docs, changelog, and benchmarks for v1.12.1 release Bump version references in docs announcement banner and installation URLs, add v1.12.1 to ASV benchmark tags, and clean up Unreleased changelog entries for clarity and consistency. Signed-off-by: Eric Shi <ershi@nvidia.com> Approved-by: Eric Shi <ershi@nvidia.com> See merge request omniverse/warp!2203
Merge pull request #1 from Tuesdaythe13th/claude/disc-config-struct-n…
Code Review
This pull request includes significant updates to the Warp documentation, including a new quick-start guide, updated installation instructions, and a new example notebook for disc cooling simulation. It also includes maintenance updates such as removing Kit extensions, updating the minimum Python version to 3.10, and adding support for cuBQL. The review feedback highlights a physical inaccuracy in the crystallisation kinetics model within the new notebook and suggests updating the kernel array type hints to the new subscript syntax for consistency.
```python
@wp.kernel
def update_crystallinity(
    T: wp.array(dtype=wp.float32),
    chi_in: wp.array(dtype=wp.float32),
    chi_out: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Simple Avrami-style crystallisation kinetics.

    Crystal growth is fastest mid-way between T_g and T_m and saturates
    at chi_max. Replace with a Nakamura model for production use.
    """
    tid = wp.tid()
    temp = T[tid]
    chi = chi_in[tid]

    if temp > config.T_g and temp < config.T_m:
        x = (temp - config.T_g) / (config.T_m - config.T_g)
        x = wp.max(0.0, wp.min(1.0, x))
        rate = params.avrami_k0 * x * (1.0 - chi / params.chi_max)
        chi = chi + params.dt * params.avrami_n * rate
        chi = wp.max(0.0, wp.min(params.chi_max, chi))

    chi_out[tid] = wp.float32(chi)
```
The crystallisation kinetics implementation in `update_crystallinity` appears to be physically incorrect and contradicts the docstring.

- Temperature dependence: the docstring states that growth is fastest mid-way between $T_g$ and $T_m$. However, the code uses `x = (temp - T_g) / (T_m - T_g)`, which makes the rate peak at $T_m$. In reality, the driving force for crystallisation is undercooling ($T_m - T$), so the rate should be zero at $T_m$. Consider using a term like `x * (1.0 - x)` or a more realistic Nakamura/Hoffman-Lauritzen model.

- Avrami exponent: `params.avrami_n` is used as a linear multiplier for the rate. In the Avrami model, $n$ is an exponent that characterizes the dimensionality of growth. A linear scaling does not capture the sigmoidal nature of the transformation for $n > 1$.
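The suggested `x * (1.0 - x)` correction can be sketched in plain Python. This is a toy rate function illustrating the reviewer's point, not the notebook's actual fix:

```python
def crystallisation_rate(temp, T_g, T_m, k0, chi, chi_max):
    """Rate that peaks mid-way between T_g and T_m and vanishes at both
    bounds: x * (1 - x) is zero at T_g and at T_m (zero undercooling)
    and maximal at x = 0.5, matching the docstring's claim. The
    (1 - chi/chi_max) factor still saturates growth at chi_max."""
    if not (T_g < temp < T_m):
        return 0.0
    x = (temp - T_g) / (T_m - T_g)
    return k0 * x * (1.0 - x) * (1.0 - chi / chi_max)
```

With this form, a material held just below $T_m$ crystallises slowly, as the physics requires, instead of at the maximum rate.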
```python
@wp.kernel
def init_temperature(
    T: wp.array(dtype=wp.float32),
    params: CoolingParams,
):
    tid = wp.tid()
    T[tid] = wp.float32(params.T_init)


@wp.kernel
def init_scalar(
    a: wp.array(dtype=wp.float32),
    value: float,
):
    tid = wp.tid()
    a[tid] = wp.float32(value)


@wp.kernel
def step_temperature(
    T_in: wp.array(dtype=wp.float32),
    T_out: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Explicit finite-difference heat diffusion in cylindrical coordinates.

    Solves ∂T/∂t = α (∂²T/∂r² + (1/r)∂T/∂r + ∂²T/∂z²)
    with Dirichlet mould-wall BCs on the top and bottom (z) faces and
    Neumann (zero-flux) BCs on the axis (r=0) and outer radius.
    """
    tid = wp.tid()
    i = tid // config.nz
    j = tid - i * config.nz

    alpha = config.k / (config.rho * config.cp)
    dr = config.dr
    dz = config.dz
    r = (float(i) + 0.5) * dr

    # Dirichlet BC on top/bottom mould walls.
    if j == 0 or j == config.nz - 1:
        T_out[tid] = wp.float32(params.T_mold)
        return

    # Neumann BC: mirror stencil at axis and outer edge.
    im = clamp_i(i - 1, 0, config.nx - 1)
    ip = clamp_i(i + 1, 0, config.nx - 1)
    if i == 0:
        im = 1
    if i == config.nx - 1:
        ip = config.nx - 2

    jm = j - 1
    jp = j + 1

    Tc = T_in[idx(i, j, config.nz)]
    Trm = T_in[idx(im, j, config.nz)]
    Trp = T_in[idx(ip, j, config.nz)]
    Tzm = T_in[idx(i, jm, config.nz)]
    Tzp = T_in[idx(i, jp, config.nz)]

    d2Tdr2 = (Trp - 2.0 * Tc + Trm) / (dr * dr)
    dTdr_over_r = 0.0
    if i > 0:
        dTdr_over_r = (Trp - Trm) / (2.0 * dr * r)
    d2Tdz2 = (Tzp - 2.0 * Tc + Tzm) / (dz * dz)

    lap = d2Tdr2 + dTdr_over_r + d2Tdz2
    Tnew = Tc + params.dt * alpha * lap
    T_out[tid] = wp.float32(Tnew)


@wp.kernel
def update_crystallinity(
    T: wp.array(dtype=wp.float32),
    chi_in: wp.array(dtype=wp.float32),
    chi_out: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Simple Avrami-style crystallisation kinetics.

    Crystal growth is fastest mid-way between T_g and T_m and saturates
    at chi_max. Replace with a Nakamura model for production use.
    """
    tid = wp.tid()
    temp = T[tid]
    chi = chi_in[tid]

    if temp > config.T_g and temp < config.T_m:
        x = (temp - config.T_g) / (config.T_m - config.T_g)
        x = wp.max(0.0, wp.min(1.0, x))
        rate = params.avrami_k0 * x * (1.0 - chi / params.chi_max)
        chi = chi + params.dt * params.avrami_n * rate
        chi = wp.max(0.0, wp.min(params.chi_max, chi))

    chi_out[tid] = wp.float32(chi)


@wp.kernel
def compute_warp_risk(
    T: wp.array(dtype=wp.float32),
    chi: wp.array(dtype=wp.float32),
    warp_risk: wp.array(dtype=wp.float32),
    config: DiscConfig,
    params: CoolingParams,
):
    """Score each radial position by thermal gradient and crystallinity asymmetry.

    Only threads at the mid-plane (j == nz/2) write to warp_risk[i].
    """
    tid = wp.tid()
    i = tid // config.nz
    j = tid - i * config.nz

    if j != config.nz // 2:
        return

    top = T[idx(i, 0, config.nz)]
    bot = T[idx(i, config.nz - 1, config.nz)]
    mid = T[idx(i, j, config.nz)]

    chi_top = chi[idx(i, 1, config.nz)]
    chi_bot = chi[idx(i, config.nz - 2, config.nz)]

    dT_thickness = wp.abs(top - bot)
    dT_mid = wp.abs(mid - 0.5 * (top + bot))
    dchi = wp.abs(chi_top - chi_bot)

    risk = (
        params.warp_temp_coeff * (dT_thickness + dT_mid)
        + params.warp_chi_coeff * dchi
    )
    warp_risk[i] = wp.float32(risk)
```
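Because `step_temperature` is an explicit scheme, the time step in `params.dt` must respect the usual von Neumann stability bound for 2-D diffusion. A quick helper to check this (the bound and sample values are illustrative; the 1/r term tightens the limit slightly near the axis, so treat it as a guideline):

```python
def max_stable_dt(k, rho, cp, dr, dz):
    """Explicit-scheme stability bound for the diffusion stencil:
    dt <= 1 / (2 * alpha * (1/dr^2 + 1/dz^2)), with alpha = k/(rho*cp)."""
    alpha = k / (rho * cp)
    return 1.0 / (2.0 * alpha * (1.0 / dr**2 + 1.0 / dz**2))
```

For example, k = 0.2 W/m·K, rho = 1000 kg/m³, cp = 2000 J/kg·K, and a 0.1 mm grid spacing in both directions give a maximum stable step of 0.025 s; halving the grid spacing quarters the allowable step.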