Segmentation fault in json_decode on Python 3.14 — Struct_alloc memset clobbers inline values #989

@alldefector

Description

We're seeing intermittent SIGSEGV crashes in json_decode on Python 3.14.0rc3 with msgspec 0.20.0, running under a threaded ASGI server (FastAPI + gunicorn).

Environment

  • msgspec: 0.20.0
  • Python: 3.14.0rc3 (CPython, GIL enabled, not free-threaded)
  • OS: Linux x86_64 (GCP)
  • Server: FastAPI with gunicorn workers, concurrent JSON decoding

Crash details

Every crash hits the same offset in _core.cpython-314-x86_64-linux-gnu.so:

dmesg:
python3[PID]: segfault at 20 ip 00007f...3a3e2 sp 00007f... error 4 in _core.cpython-314-x86_64-linux-gnu.so

Using nm -n, the faulting address (0x3a3e2) is inside json_decode (0x39690 to 0x3c730).

The segfault at 20 signature means dereferencing NULL + 0x20 — reading a field at offset 32 from a NULL pointer.
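
As a sanity check on those numbers, here is a small Python sketch. It is not part of msgspec, and the constant and function names are made up for illustration. It assumes the json_decode symbol range from nm -n reads as 0x39690 to 0x3c730, and it decodes the `error 4` field using the standard x86 page-fault error-code bits.

```python
# Values copied from the dmesg line and the nm -n output in this report.
# The range bounds are an assumption about how the nm output should be read.
JSON_DECODE_START = 0x39690
JSON_DECODE_END = 0x3C730   # start of the next symbol
FAULT_OFFSET = 0x3A3E2
FAULT_ADDRESS = 0x20        # "segfault at 20" is hexadecimal


def in_symbol(offset, start, end):
    """Return True if offset falls inside the half-open range [start, end)."""
    return start <= offset < end


def decode_page_fault_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not mapped
        "write_access": bool(code & 0x2),  # 0 means the access was a read
        "user_mode": bool(code & 0x4),     # 1 means the fault came from user space
    }


if __name__ == "__main__":
    print(in_symbol(FAULT_OFFSET, JSON_DECODE_START, JSON_DECODE_END))
    print(hex(FAULT_OFFSET - JSON_DECODE_START))  # offset into json_decode
    print(FAULT_ADDRESS)                          # field offset in decimal
    print(decode_page_fault_error(4))
```

With these inputs, `error 4` decodes to a user-mode read of an unmapped page, which is what a read through a NULL base pointer would look like.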

Root cause analysis

The crash appears to be caused by Struct_alloc in _core.c:

static PyObject *
Struct_alloc(PyTypeObject *type) {
    // ...
    obj = PyObject_GC_New(PyObject, type);
    memset((char *)obj + sizeof(PyObject), '\0', type->tp_basicsize - sizeof(PyObject));
    return obj;
}

On Python 3.14, PyObject_GC_New initializes managed dict / inline values pointers (see cpython@gc.c#L2377-L2379). The subsequent memset zeroes out that initialization. When CPython later tries to materialize the managed dict (e.g., during attribute access, hasattr, or GC traversal), it reads the zeroed pointer and dereferences NULL + 0x20.
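
To make the suspected pattern concrete, here is a toy model using ctypes. The layout is hypothetical and much simpler than CPython's real object layout, and `MockObject`, `mock_gc_new`, and `clobbering_alloc` are invented names. It only illustrates the general hazard described above: an allocator sets a pointer, and a later memset over everything past the header wipes it.

```python
import ctypes


class MockObject(ctypes.Structure):
    # Hypothetical layout, NOT CPython's: a two-word header, one pointer
    # that the allocator initializes, and one ordinary struct field.
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("managed_ptr", ctypes.c_void_p),
        ("field0", ctypes.c_void_p),
    ]


# Everything before managed_ptr plays the role of sizeof(PyObject).
HEADER_SIZE = MockObject.managed_ptr.offset


def mock_gc_new():
    """Stand-in for an allocator that initializes managed_ptr itself."""
    obj = MockObject()
    obj.ob_refcnt = 1
    obj.managed_ptr = 0xDEADBEEF  # placeholder for a valid pointer
    return obj


def clobbering_alloc():
    """Mirror the alloc-then-memset pattern: zero everything after the header."""
    obj = mock_gc_new()
    ctypes.memset(
        ctypes.addressof(obj) + HEADER_SIZE,
        0,
        ctypes.sizeof(obj) - HEADER_SIZE,
    )
    return obj


if __name__ == "__main__":
    print(mock_gc_new().managed_ptr)       # still set by the allocator
    print(clobbering_alloc().managed_ptr)  # None: ctypes reports a NULL pointer
```

In the toy model the header survives but the allocator-owned pointer comes back NULL, so any later code that trusts it would read from a NULL-based address.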

This is related to #910 and #868 (segfaults on Python 3.13 from the same Struct_alloc code), though the mechanism differs slightly: on 3.13, PyObject_GC_New didn't initialize inline values at all; on 3.14, it does initialize them but memset clobbers the result.

Likely fix

PR #960 by @shadchin changes Struct_alloc to call type->tp_alloc(type, type->tp_itemsize) instead of PyObject_GC_New + memset, which should fix this on both 3.13 and 3.14.

Reproducing

The crash is intermittent — it depends on which struct types are decoded and whether the code path triggers managed dict materialization. It's more likely under concurrent load with diverse struct types. We have not been able to produce a minimal reproducer outside of production, but the crash signature is 100% consistent (same offset, same segfault at 20).
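
For anyone who wants to try for a reproducer, below is a sketch of a stress harness along the lines described above. It is untested against this bug and is not known to trigger the crash. The `stress` function and its parameters are invented for illustration, and the decoder is passed in so that the block runs without msgspec installed (json.loads stands in as a placeholder).

```python
import gc
import json
from concurrent.futures import ThreadPoolExecutor


def stress(decode, payloads, workers=8, rounds=1000):
    """Decode every payload repeatedly from several threads.

    Returns the total number of decode calls that completed.
    """
    def worker(_):
        count = 0
        for i in range(rounds):
            for raw in payloads:
                obj = decode(raw)
                # Attribute probing and GC passes are the operations the
                # report suspects of touching the zeroed pointer.
                hasattr(obj, "__dict__")
                count += 1
            if i % 100 == 0:
                gc.collect()
        return count

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(worker, range(workers)))


if __name__ == "__main__":
    # Placeholder decoder; swap in a msgspec-based callable to test msgspec.
    print(stress(json.loads, [b'{"a": 1}', b"[1, 2]"], workers=2, rounds=10))
```

To point it at msgspec, pass a callable such as `lambda raw: msgspec.json.decode(raw, type=SomeStruct)` for several different Struct types, where `SomeStruct` is whatever struct the application defines.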
