
gossip_signatures grows unbounded during finalization stall #263

@MegaRedHand

Summary

The gossip_signatures HashMap in crates/storage/src/store.rs:265 has no size cap. It is only pruned when finalization advances (prune_gossip_signatures at line 607, called from update_checkpoints at line 508). During a finalization stall, every gossip attestation from every validator at every slot is inserted and never removed.

Observed Impact

In the devnet4 test_1 run (2026-04-07/08, ~18.5h), finalization stalled at slot 10733 for 6+ hours. During this period:

  • Node 0 (aggregator) accumulated attestations indefinitely, with attestation_count in produced blocks growing monotonically from ~5 to 797
  • gossip_signatures entries piled up across all unfinalized slots
  • This contributed to the RocksDB file descriptor exhaustion crash on node 0

Root Cause

gossip_signatures relies entirely on finalization-based pruning:

pub fn prune_gossip_signatures(&mut self, finalized_slot: u64) -> usize {
    let mut gossip = self.gossip_signatures.lock().unwrap();
    gossip.retain(|_, entry| entry.data.slot > finalized_slot);
    // ...
}

When finalization stops, nothing is ever evicted. With 4-6 validators each attesting once per 4-second slot, this is ~3,600-5,400 entries/hour (900 slots/hour × 4-6 validators), each carrying a ValidatorSignature (~3 KB XMSS signature).
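To put that rate in perspective, here is a back-of-the-envelope estimate. It assumes one attestation per validator per slot, 4-second slots, and exactly 3 KiB per signature, and it ignores HashMap key and bucket overhead. The helper names are illustrative only and do not exist in the codebase.

```rust
// Assumption: 4-second slots, one attestation per validator per slot.
const SLOT_SECONDS: u64 = 4;
// Assumption: ~3 KB per XMSS signature, taken here as exactly 3 KiB.
const SIG_BYTES: u64 = 3 * 1024;

/// Hypothetical helper: new gossip_signatures entries per hour while finalization is stalled.
fn entries_per_hour(validators: u64) -> u64 {
    (3600 / SLOT_SECONDS) * validators
}

/// Hypothetical helper: signature bytes retained after `stall_hours` with no pruning.
/// Ignores map/key overhead, so the real footprint is larger.
fn signature_bytes_after(validators: u64, stall_hours: u64) -> u64 {
    entries_per_hour(validators) * stall_hours * SIG_BYTES
}
```

For the 6-hour stall observed on devnet4, 6 validators give 32,400 entries and about 95 MiB of raw signature bytes; 4 validators give 21,600 entries and about 63 MiB.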

By contrast, known_payloads (cap=512) and new_payloads (cap=64) are correctly bounded with FIFO eviction via PayloadBuffer.
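For reference, a FIFO-bounded map of the kind PayloadBuffer provides can be sketched as below. This is a hypothetical reimplementation of the idea, not PayloadBuffer's actual code; `FifoMap` and its methods are illustrative names.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Hypothetical FIFO-bounded map: once `capacity` entries are stored,
/// inserting a new key evicts the oldest inserted key.
struct FifoMap<K, V> {
    capacity: usize,
    entries: HashMap<K, V>,
    order: VecDeque<K>, // insertion order, oldest at the front
}

impl<K: Eq + Hash + Clone, V> FifoMap<K, V> {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Inserts `value`; returns the evicted key if the map was full.
    fn insert(&mut self, key: K, value: V) -> Option<K> {
        if let Some(existing) = self.entries.get_mut(&key) {
            // Overwrite in place; the key keeps its original position in the queue.
            *existing = value;
            return None;
        }
        let evicted = if self.entries.len() >= self.capacity {
            let oldest = self.order.pop_front();
            if let Some(k) = &oldest {
                self.entries.remove(k);
            }
            oldest
        } else {
            None
        };
        self.order.push_back(key.clone());
        self.entries.insert(key, value);
        evicted
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    fn contains_key(&self, key: &K) -> bool {
        self.entries.contains_key(key)
    }
}
```

Memory is bounded by `capacity` no matter how long finalization stalls, at the cost of possibly dropping entries that are still useful.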

Suggested Fix

Add a hard cap to gossip_signatures with FIFO or slot-based eviction, similar to the PayloadBuffer pattern used for aggregated payloads. This ensures bounded memory usage regardless of finalization progress.
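A slot-based variant could look like the sketch below: signatures are indexed by slot in a BTreeMap, so the oldest slot is always the first key. It keeps the existing finalization pruning and adds a hard cap that evicts from the oldest slot first. `BoundedGossipSignatures`, the `u64` slot/validator keys, and the generic `Sig` (standing in for ValidatorSignature) are all assumptions. The real store would keep this behind its existing Mutex.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical capped store: slot -> (validator_id -> signature).
struct BoundedGossipSignatures<Sig> {
    max_entries: usize,
    len: usize,
    by_slot: BTreeMap<u64, HashMap<u64, Sig>>, // invariant: no empty per-slot maps
}

impl<Sig> BoundedGossipSignatures<Sig> {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, len: 0, by_slot: BTreeMap::new() }
    }

    /// Inserts a signature, then evicts entries from the oldest slot until the
    /// cap holds. Returns how many entries were evicted.
    fn insert(&mut self, slot: u64, validator_id: u64, sig: Sig) -> usize {
        let per_slot = self.by_slot.entry(slot).or_default();
        if per_slot.insert(validator_id, sig).is_none() {
            self.len += 1;
        }
        let mut evicted = 0;
        while self.len > self.max_entries {
            // BTreeMap is ordered by key, so the first entry is the lowest slot.
            let Some(mut oldest) = self.by_slot.first_entry() else { break };
            // Drop an arbitrary validator's signature from that slot.
            let victim = *oldest.get().keys().next().expect("no empty slot maps");
            oldest.get_mut().remove(&victim);
            if oldest.get().is_empty() {
                oldest.remove();
            }
            self.len -= 1;
            evicted += 1;
        }
        evicted
    }

    /// Finalization-based pruning, same rule as the current
    /// `retain(|_, entry| entry.data.slot > finalized_slot)`.
    fn prune(&mut self, finalized_slot: u64) -> usize {
        let mut removed = 0;
        self.by_slot.retain(|&slot, sigs| {
            let keep = slot > finalized_slot;
            if !keep {
                removed += sigs.len();
            }
            keep
        });
        self.len -= removed;
        removed
    }

    fn len(&self) -> usize {
        self.len
    }
}
```

Evicting by lowest slot rather than by insertion order means a late-arriving attestation for an old slot is the first to go, which is usually the least useful one for aggregation.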
