Summary
topobench/data/utils/split_utils.py has two concurrency bugs in the shared code path used by random_splitting and k_fold_split. Both can surface when multiple workers (e.g. parallel Optuna trials) point at the same data_split_dir and reach the split-generation branch simultaneously, typically on the first run against an uncached dataset.
Bug 1. Directory-creation TOCTOU race
```python
if not os.path.isdir(split_dir):
    os.makedirs(split_dir)
```
If two workers both observe isdir == False before either has created the directory, both call os.makedirs, and the second call raises FileExistsError: [Errno 17]. The affected sweep trial crashes during setup and is recorded as failed.
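A minimal sketch of one possible fix, assuming only the standard library: drop the separate isdir() check and let os.makedirs tolerate an existing directory via exist_ok=True. The helper name ensure_split_dir is hypothetical and is not taken from split_utils.py or the upcoming PR. The demo replays the failing interleaving deterministically: it creates the directory first (standing in for the winning worker), then makes the original second call.

```python
import os
import tempfile


def ensure_split_dir(split_dir: str) -> None:
    """Create split_dir if missing; safe when several workers race to create it."""
    # exist_ok=True makes makedirs succeed when the directory already exists,
    # whether it pre-existed or another worker created it a moment ago, so
    # there is no separate isdir() check left to race against.
    os.makedirs(split_dir, exist_ok=True)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    target = os.path.join(root, "data_splits", "fold_dir")
    # Simulate the bug: both workers saw isdir == False and the first one
    # has already created the directory.
    os.makedirs(target)
    try:
        os.makedirs(target)  # the second worker's original call
    except FileExistsError as exc:
        print("original pattern:", type(exc).__name__)
    ensure_split_dir(target)  # the patched call is a no-op here
    print("patched pattern: ok")
```

Note that exist_ok=True still raises FileExistsError if split_dir exists as a regular file, which is the desired behaviour for a genuinely misconfigured path.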
Bug 2. Non-atomic fold-file writes
```python
np.savez(os.path.join(split_dir, f"{fold_n}.npz"), **split_idx)
```
If the process dies mid-write (SIGKILL, OOM kill, preemption), the canonical {fold_n}.npz is left partially written. Every subsequent run then loads the corrupt file and either fails with cryptic BadZipFile / EOFError errors or gets back an NpzFile with missing keys that downstream code misinterprets as valid splits. A concurrent worker that loads the file while another worker is still writing it can hit the same errors. The corruption persists until the directory is removed manually.
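A minimal sketch of the usual write-to-temp-then-rename pattern, assuming a POSIX-like filesystem where os.replace within one directory is atomic. The helper name save_fold_atomic and the fold keys in the demo (train / valid / test) are hypothetical; they are not taken from split_utils.py or the upcoming PR. Readers only ever see the old {fold_n}.npz or the complete new one, never a half-written file.

```python
import os
import tempfile

import numpy as np


def save_fold_atomic(split_dir: str, fold_n: int, **split_idx: np.ndarray) -> str:
    """Write {fold_n}.npz so readers see either no file or a complete one."""
    final_path = os.path.join(split_dir, f"{fold_n}.npz")
    # Create the temp file in the same directory so os.replace stays on one
    # filesystem, where the rename is atomic. The ".tmp" suffix keeps it from
    # matching the canonical "{fold_n}.npz" name.
    fd, tmp_path = tempfile.mkstemp(dir=split_dir, prefix=f".{fold_n}.", suffix=".npz.tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            # Passing a file object stops np.savez from appending ".npz".
            np.savez(f, **split_idx)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before the rename
        os.replace(tmp_path, final_path)  # atomic swap into the canonical name
    except BaseException:
        # Clean up on exceptions (including KeyboardInterrupt). A SIGKILL can
        # still leave a stray temp file, but never a corrupt {fold_n}.npz.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    path = save_fold_atomic(d, 0, train=np.arange(8), valid=np.arange(8, 10), test=np.arange(10, 12))
    with np.load(path) as data:
        print(sorted(data.files))  # ['test', 'train', 'valid']
```

If two workers race on the same fold, the last os.replace wins, which is harmless as long as both generate identical splits from the same seed. tempfile.mkstemp creates files with mode 0600, so a data_split_dir shared between users may also need an os.chmod before the rename.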
PR to follow.