Rework call cache eviction to use ctid-based join deletes #6477

Open
lutter wants to merge 4 commits into master from lutter/call-cache

lutter (Collaborator) commented Apr 3, 2026

The old eviction approach repeatedly ran WHERE contract_address = ANY(...) against a multi-billion-row call_cache table, requiring expensive index scans for each batch of contracts. The new ctid-based join lets Postgres scan the small filtered call_meta result, join it against call_cache using the contract_address index, and delete by physical row address (ctid), which is the fastest possible delete path.

The old --ttl-max-contracts parameter limited how many contracts were fetched and iterated per invocation, which no longer maps to anything meaningful with the join-based approach. The new --max-contracts instead adjusts the TTL cutoff date so that at most that many contracts fall below it, giving operators a way to cap the scope of a single eviction run.

Changes

  • Replace the old two-loop eviction strategy (fetch stale contracts, then delete their cache rows one-by-one) with a single-loop approach that joins call_cache to call_meta and deletes by ctid
  • Reintroduce --max-contracts with different semantics: instead of limiting how many contracts are processed per batch, it now computes an effective TTL so that at most N contracts are evicted total
  • Remove the GRAPH_STORE_STALE_CALL_CACHE_CONTRACTS_BATCH_SIZE env var since we do not batch by contract any more

lutter added 4 commits April 2, 2026 18:30
The ttl_max_contracts parameter limited how many contracts got processed
per invocation. With the upcoming switch to a more efficient delete
strategy that doesn't iterate by contract, this parameter no longer
maps to anything meaningful. Remove it from the trait, CLI, and all
implementations.

Replace the two-loop structure (fetch stale contracts, then delete
their call_cache rows in tiny batches) with a single adaptive loop
that joins call_cache to call_meta directly and deletes by ctid.

The old approach did `WHERE contract_address = ANY(N addresses)` on
the 6.9B row call_cache table repeatedly, which required expensive
index scans. The new approach lets Postgres join the small filtered
call_meta result against call_cache using the contract_address index,
then deletes by physical row address (ctid), which is the fastest
possible delete path.

Also removes the GRAPH_STORE_STALE_CALL_CACHE_CONTRACTS_BATCH_SIZE
env var since batch size is now fully controlled by AdaptiveBatchSize.
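To make the shape of such a delete concrete, here is a minimal sketch of what a ctid-based join delete could look like. It is not the query from this PR: the helper name `ctid_delete_sql`, the unqualified table names, the `$1` cutoff parameter, and the `LIMIT`-based batching are all assumptions made for illustration. Only the table and column names (`call_cache`, `call_meta`, `contract_address`, `accessed_at`) come from the description above.

```rust
/// Builds a hypothetical batch delete statement. The inner query joins the
/// small, filtered `call_meta` set to `call_cache` and yields physical row
/// addresses (ctid); the outer DELETE then removes exactly those rows.
/// `$1` is assumed to be bound to the TTL cutoff date by the caller.
fn ctid_delete_sql(batch_size: usize) -> String {
    format!(
        "DELETE FROM call_cache \
         WHERE ctid IN ( \
         SELECT cc.ctid \
         FROM call_meta cm \
         JOIN call_cache cc ON cc.contract_address = cm.contract_address \
         WHERE cm.accessed_at < $1 \
         LIMIT {batch_size})"
    )
}
```

A caller would run this statement in a loop until it reports zero deleted rows, adjusting `batch_size` between iterations based on how long each batch took.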
…ache

When --max-contracts is passed alongside --ttl-days, compute an effective
TTL that limits eviction to at most that many contracts. The effective TTL
is determined by looking at the (max_contracts+1)th oldest stale entry in
call_meta; if it exists, the cutoff is set to its accessed_at date so that
contracts at that date are excluded. When multiple contracts share the
boundary date, fewer than max_contracts may be deleted since the cutoff
operates at date granularity.
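The cutoff rule described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the implementation in this PR: dates are modeled as integer day numbers, "stale" is taken to mean `accessed_at` strictly before `today - ttl_days`, and the function names are hypothetical.

```rust
/// Days since an arbitrary epoch; a stand-in for a SQL `date` (assumption).
type Day = i64;

/// Returns the cutoff day. Contracts with `accessed_at < cutoff` get evicted.
/// If more than `max_contracts` entries are stale, the cutoff is moved back
/// to the date of the (max_contracts + 1)th oldest stale entry, which
/// excludes every contract accessed on that date.
fn effective_cutoff(accessed_at: &[Day], today: Day, ttl_days: i64, max_contracts: usize) -> Day {
    let ttl_cutoff = today - ttl_days;
    let mut stale: Vec<Day> = accessed_at
        .iter()
        .copied()
        .filter(|&d| d < ttl_cutoff)
        .collect();
    stale.sort_unstable();
    // Index `max_contracts` is the (max_contracts + 1)th oldest entry.
    match stale.get(max_contracts) {
        Some(&boundary) => boundary,
        None => ttl_cutoff,
    }
}

/// The TTL in days that corresponds to a given cutoff.
fn effective_ttl_days(today: Day, cutoff: Day) -> i64 {
    today - cutoff
}

/// Number of contracts that fall below the cutoff.
fn evicted_count(accessed_at: &[Day], cutoff: Day) -> usize {
    accessed_at.iter().filter(|&&d| d < cutoff).count()
}
```

With access days `[10, 20, 20, 30, 95]`, `today = 100`, and `ttl_days = 30`, four contracts are stale. Passing `max_contracts = 2` puts the boundary at day 20, so only the contract from day 10 is evicted: the two contracts sharing day 20 are both excluded, which is the "fewer than max_contracts" case.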
Return a StaleCallCacheResult struct with effective_ttl_days,
cache_entries_deleted, and contracts_deleted so the graphman command
can report what actually happened to the user.
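As a rough picture of the result type, the sketch below uses the three field names from the commit message. The field types, the derives, and the `report` helper with its message wording are assumptions for illustration; the actual definition and the graphman output may differ.

```rust
/// Hypothetical shape of the eviction result; only the field names are
/// taken from the commit message, the types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
struct StaleCallCacheResult {
    /// TTL actually applied, after any adjustment for --max-contracts.
    effective_ttl_days: usize,
    /// Number of rows removed from call_cache.
    cache_entries_deleted: usize,
    /// Number of contracts whose entries were removed.
    contracts_deleted: usize,
}

/// Formats a one-line summary a CLI command could print (wording is made up).
fn report(r: &StaleCallCacheResult) -> String {
    format!(
        "Removed {} cache entries for {} contracts (effective TTL: {} days)",
        r.cache_entries_deleted, r.contracts_deleted, r.effective_ttl_days
    )
}
```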
lutter requested a review from dimitrovmaksim April 3, 2026 02:28

dimitrovmaksim (Member) left a comment

LGTM