global_evaluate: shape mismatch writing results back in parallel (DDt semi-Lagrangian) #113

@lmoresi

Description

Summary

global_evaluate returns an array with fewer rows than the local partition expects when called from SemiLagrangian_DDt.update_pre_solve() in parallel. The upstream points migrate across partitions during the backward-advection step, but the result array doesn't match the local swarm variable size.

Error

File "underworld3/swarm.py", line 639, in __setitem__
    modified_data[key] = value
ValueError: could not broadcast input array from shape (1544,1,1) into shape (1747,1,1)

Stack trace

_run_w2_w3_w4b.py → adv.solve(timestep=dt)
  → solvers.py:2293 → self.DuDt.update_pre_solve()
    → ddt.py:1452 → uw.function.global_evaluate(self.V_fn, mid_pt_coords)
      → functions_unit_system.py:530 → _global_evaluate_nd()
        → _function.pyx:496 → swarm.__setitem__
          → nd_array_callback.py:539 → ValueError: shape (1544,1,1) into (1747,1,1)
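
The broadcast failure at the bottom of the trace is plain NumPy behaviour. A minimal stand-in (array sizes taken from the traceback; the variable names are illustrative, not underworld3 internals) reproduces the same message:

```python
import numpy as np

# Sizes from the traceback above; names are hypothetical.
local_var = np.zeros((1747, 1, 1))   # rows the local swarm variable owns
returned = np.zeros((1544, 1, 1))    # rows global_evaluate actually returned

try:
    local_var[:] = returned          # same kind of assignment swarm.__setitem__ performs
except ValueError as err:
    print(err)                       # "could not broadcast input array from shape ..."
```

NumPy only broadcasts when trailing dimensions match or are 1, so a 1544-row result can never be written into a 1747-row slot; the row counts must agree exactly.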

Reproduction

import underworld3 as uw
import numpy as np
import sympy
from pathlib import Path

# Requires a checkpointed adapted mesh — or create one:
uw.reset_default_model()
model = uw.Model()
model.set_reference_quantities(
    length=uw.quantity(1000, "km"),
    viscosity=uw.quantity(1e21, "Pa.s"),
    diffusivity=uw.quantity(1e-6, "m**2/s"),
)

mesh = uw.meshing.RegionalGeographicBox(
    lon_range=(135.5, 137.5), lat_range=(-34.5, -33.0),
    depth_range=(uw.quantity(0, "km"), uw.quantity(50, "km")),
    ellipsoid="WGS84", numElements=(28, 28, 14), simplex=True,
)

# Needs a non-zero velocity field
v = uw.discretisation.MeshVariable("v", mesh, mesh.dim, degree=1,
                                    continuous=True, units="m/s")
v.data[:] = 0.1  # uniform for testing

C = uw.discretisation.MeshVariable("C", mesh, 1, degree=1)
adv = uw.systems.AdvDiffusion(mesh, u_Field=C, V_fn=v)
adv.constitutive_model = uw.constitutive_models.DiffusionModel
adv.constitutive_model.Parameters.diffusivity = uw.quantity(1e-10, "m**2/s")
adv.add_essential_bc(1.0, "Bottom")
adv.add_essential_bc(0.0, "Surface")

# This fails on np > 1:
adv.solve(timestep=uw.quantity(200, "Myr"), zero_init_guess=True)

Run with: mpirun -np 8 python repro.py

Works fine serially. Fails in parallel on any mesh with enough nodes that the semi-Lagrangian upstream points cross partition boundaries.

Analysis

The semi-Lagrangian DDt scheme backward-advects the psi_star sample points by v * dt. Some of those upstream points land in a partition owned by another rank. global_evaluate is responsible for communicating the values across ranks, but the array it returns has the wrong number of rows: it covers fewer points than the local partition owns.
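
The migration mechanism can be sketched in plain NumPy (the slab-shaped partition, the ownership test, and the numbers are hypothetical; real ownership is decided by the mesh's domain decomposition):

```python
import numpy as np

# Hypothetical sketch: a rank owning the slab x in [0, 1) backward-advects
# its sample points by v*dt; points pushed below x = 0 now belong to a
# neighbouring rank and must be evaluated off-rank.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=(1747, 3))   # local sample points
v_dt = np.array([0.15, 0.0, 0.0])                # advection offset v * dt
upstream = points - v_dt                         # backward-advected positions

stays_local = upstream[:, 0] >= 0.0              # crude ownership test
print(f"{stays_local.sum()} of {len(points)} upstream points stay on-rank")
```

The fraction that migrates grows with |v|*dt relative to the partition width, which is consistent with the bug only appearing once the mesh is large enough (or dt long enough) for upstream points to cross partition boundaries.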

The mismatch (1544 vs 1747) suggests ~200 points migrated out of rank 4's partition, and the result array only contains values for points that stayed local.
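
If the shorter result does come back together with an index map identifying which local points it covers, the write-back could scatter by index instead of broadcasting over the whole variable. A hedged sketch of that fix direction (the index map is an assumption for illustration, not the current underworld3 return signature):

```python
import numpy as np

n_local = 1747                               # points the local partition owns
values = np.full((n_local, 1, 1), np.nan)    # NaN marks points not yet filled

# Hypothetical: evaluation returns values plus the local indices it covers.
covered_idx = np.arange(1544)
covered_vals = np.ones((1544, 1, 1))

values[covered_idx] = covered_vals           # scatter-back: no shape mismatch
missing = int(np.isnan(values[:, 0, 0]).sum())
print(f"{missing} points still need off-rank values")  # 203 here
```

The leftover NaN rows then identify exactly the migrated points whose values still have to arrive from other ranks.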

Impact

AdvDiffusion is unusable in parallel. This blocks all time-dependent scalar transport on multi-core runs.

Context

Discovered in the H2Ex fault-flow workflow (feature/fault-system-workflow branch). The Stokes and Darcy solvers work correctly in parallel — only the AdvDiffusion DDt step fails.

Metadata

Labels: bug