Segmentation fault in json_decode on Python 3.14 — Struct_alloc memset clobbers inline values #989
Description
We're seeing intermittent SIGSEGV crashes in json_decode on Python 3.14.0rc3 with msgspec 0.20.0, running under a threaded ASGI server (FastAPI + gunicorn).
Environment
- msgspec: 0.20.0
- Python: 3.14.0rc3 (CPython, GIL enabled, not free-threaded)
- OS: Linux x86_64 (GCP)
- Server: FastAPI with gunicorn workers, concurrent JSON decoding
Crash details
Every crash hits the same offset in _core.cpython-314-x86_64-linux-gnu.so:
dmesg:
python3[PID]: segfault at 20 ip 00007f...3a3e2 sp 00007f... error 4 in _core.cpython-314-x86_64-linux-gnu.so
Using nm -n, the offset of the faulting instruction (0x3a3e2) falls inside json_decode (0x39690–0x3c730).
The segfault at 20 signature means dereferencing NULL + 0x20 — reading a field at offset 32 from a NULL pointer.
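The arithmetic behind that reading can be sketched in C. The struct layout below is hypothetical (it is not a real CPython or msgspec type); it only illustrates that reading a field at byte offset 32 through a NULL base touches address 0x20, and that the instruction offset from dmesg lies inside the symbol range reported by nm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT a real CPython or msgspec struct:
 * four 8-byte fields followed by the field being read. */
struct example_layout {
    uint64_t pad[4];   /* bytes 0..31 */
    uint64_t field;    /* byte 32 == 0x20 */
};

/* Address the CPU would touch when reading `field` through `base`.
 * Pure integer arithmetic; nothing is dereferenced. */
uintptr_t fault_address(uintptr_t base) {
    return base + offsetof(struct example_layout, field);
}

/* Nonzero if `offset` lies in the half-open symbol range [start, end). */
int offset_in_symbol(uintptr_t offset, uintptr_t start, uintptr_t end) {
    return offset >= start && offset < end;
}
```

Calling `fault_address(0)` models the NULL base, and `offset_in_symbol(0x3a3e2, 0x39690, 0x3c730)` repeats the nm check with the values quoted above.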
Root cause analysis
The crash appears to be caused by Struct_alloc in _core.c:
static PyObject *
Struct_alloc(PyTypeObject *type) {
    // ...
    obj = PyObject_GC_New(PyObject, type);
    memset((char *)obj + sizeof(PyObject), '\0', type->tp_basicsize - sizeof(PyObject));
    return obj;
}
On Python 3.14, PyObject_GC_New initializes managed dict / inline values pointers (see cpython@gc.c#L2377-L2379). The subsequent memset zeroes out that initialization. When CPython later tries to materialize the managed dict (e.g., during attribute access, hasattr, or GC traversal), it reads the zeroed pointer and dereferences NULL + 0x20.
This is related to #910 and #868 (segfaults on Python 3.13 from the same Struct_alloc code), though the mechanism differs slightly: on 3.13, PyObject_GC_New didn't initialize inline values at all; on 3.14, it does initialize them but memset clobbers the result.
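If this analysis is right, the failure has the shape shown in the toy model below. It is a simplified sketch with hypothetical names (`header`, `instance`, `allocator_slot`); it does not reproduce CPython's real object layout or call any CPython API. It only shows how a memset over everything past the header erases a pointer the allocator had already set.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the object header (the role PyObject plays). */
struct header {
    size_t refcount;
    const void *type;
};

/* Hypothetical instance layout: header, a slot the allocator sets
 * itself, then a user-visible field. */
struct instance {
    struct header head;
    void *allocator_slot;   /* initialized by the allocator */
    void *user_field;
};

static int sentinel;  /* any valid address for the slot to point at */

/* Mimics an allocator that hands back an object with the slot set. */
struct instance *alloc_initialized(void) {
    struct instance *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->head.refcount = 1;
    obj->head.type = NULL;
    obj->allocator_slot = &sentinel;
    obj->user_field = NULL;
    return obj;
}

/* Same pattern as the snippet above: allocate, then zero everything
 * after the header. The allocator's slot is wiped along with it. */
struct instance *alloc_then_memset(void) {
    struct instance *obj = alloc_initialized();
    if (obj == NULL) return NULL;
    memset((char *)obj + sizeof(struct header), 0,
           sizeof *obj - sizeof(struct header));
    return obj;
}
```

In this model the header survives the memset, but `allocator_slot` comes back as NULL, so any later code that trusts the slot and reads a field through it would fault at a small offset from address zero.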
Likely fix
PR #960 by @shadchin changes Struct_alloc to call type->tp_alloc(type, type->tp_itemsize) instead of PyObject_GC_New followed by memset, which should fix this on both 3.13 and 3.14.
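Why an allocator-driven path would avoid the problem can be sketched with another toy model. This is not the code from PR #960, and the names (`toy_header`, `toy_object`, `alloc_zero_then_init`) are hypothetical. It assumes the type's allocator zero-fills the memory first and only afterwards does its own initialization; PyType_GenericAlloc is documented to zero the new instance's contents, but the rest is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the object header. */
struct toy_header {
    size_t refcount;
    const void *type;
};

/* Hypothetical instance layout, not CPython's real one. */
struct toy_object {
    struct toy_header head;
    void *allocator_slot;   /* owned and initialized by the allocator */
    void *user_field;       /* expected to start out as NULL */
};

static int toy_sentinel;  /* any valid address for the slot to point at */

/* Zero first, initialize second: the caller never needs a memset,
 * so nothing can overwrite the allocator's own setup. */
struct toy_object *alloc_zero_then_init(void) {
    struct toy_object *obj = calloc(1, sizeof *obj);
    if (obj == NULL) return NULL;
    obj->head.refcount = 1;
    obj->allocator_slot = &toy_sentinel;
    return obj;
}
```

With this ordering the user-visible fields still start out zeroed, and the allocator's slot stays intact because all zeroing happens before it is set.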
Reproducing
The crash is intermittent: it depends on which struct types are decoded and whether the code path triggers managed dict materialization. It is more likely under concurrent load with diverse struct types. We have not been able to produce a minimal reproducer outside of production, but the crash signature is identical every time (same offset, same segfault at 20).