Skip to content

docs: add how to test network instability to devnet-runner skill#276

Open
MegaRedHand wants to merge 1 commit intomainfrom
docs/instability-testing
Open

docs: add how to test network instability to devnet-runner skill#276
MegaRedHand wants to merge 1 commit intomainfrom
docs/instability-testing

Conversation

@MegaRedHand
Copy link
Copy Markdown
Collaborator

This PR updates the devnet-runner skill with how to simulate network instability through pausing containers.

@github-actions
Copy link
Copy Markdown

🤖 Kimi Code Review

This PR adds comprehensive documentation for testing network instability using docker pause. No Rust code changes are present. The documentation is technically accurate regarding the 3/4 supermajority consensus behavior.

Minor suggestions for robustness:

File: .claude/skills/devnet-runner/references/instability-testing.md

  1. Line ~162 (Quick Stats): The awk command divides by zero if no post-pause data exists. Add a guard:

    END{if(n>0) print "avg=" sum/n "ms"; else print "no data"}
  2. Line ~82 (Extraction script): The match() function sets RSTART=0 on failure. Subsequent substr() with RLENGTH-5 yields negative length (undefined behavior in POSIX awk). Consider validating matches:

    match($0, /slot=[0-9]+/); 
    if(RSTART > 0) { s=substr($0, RSTART+5, RLENGTH-5) } else { next }
  3. Line ~95 (to_ms function): The final gsub(/s/, "", raw) removes all 's' characters, not just the unit suffix. While unlikely to cause issues with standard durations, gsub(/s$/, "", raw) is more precise.

Nitpick:

  • Line ~78: The two-pass awk approach reads the full docker logs twice via process substitution. This doubles I/O but is acceptable for devnet debugging. Consider noting this inefficiency or using a single-pass approach with getline if processing large logs.

The supermajority thresholds table (lines 176-180) correctly reflects the 3/4 requirement and distinguishes between pausing aggregators vs. non-aggregators, which is critical for accurate test results.

Verdict: Documentation is clear, accurate, and operationally useful. The shell/awk scripts are appropriate for their debugging context. Addressing the division-by-zero edge case would be ideal but is not blocking.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions
Copy link
Copy Markdown

🤖 Codex Code Review

Findings

  1. The new “Supermajority thresholds” table is inconsistent with the client’s actual justification rule. .claude/skills/devnet-runner/references/instability-testing.md#L186 labels the threshold as 3/4 and gives 6 -> 5, 12 -> 9, but the state transition uses 3 * vote_count >= 2 * validator_count, i.e. ceil(2N/3). See crates/blockchain/state_transition/src/lib.rs#L305. As written, the guide will tell operators to expect non-finality in cases where the client should still finalize.

  2. The instability guide hard-codes the aggregator as ethlambda_3 and tells users to pause ethlambda_0 ethlambda_1, but that is not safe for all devnet setups. .claude/skills/devnet-runner/references/instability-testing.md#L12 and .claude/skills/devnet-runner/references/instability-testing.md#L22 assume the long-lived-devnet layout, where ethlambda_3 is the aggregator (.claude/skills/devnet-runner/references/long-lived-devnet.md#L77), but the repo’s preview config uses ethlambda_0 as the aggregator (preview-config.nix#L71). Since the doc says “local or long-lived”, a local user can follow the quick start and accidentally pause the aggregator, turning the test into aggregator-failure instead of the intended supermajority-loss scenario.

  3. The guide presents 36 attestations, ~1400ms, ~290ms, and ~4.8x as expected steady-state behavior, but those are not protocol invariants. .claude/skills/devnet-runner/references/instability-testing.md#L48 and .claude/skills/devnet-runner/references/instability-testing.md#L131 read as deterministic, while block packing is driven by a 9 MiB proof-byte budget and whatever proof sizes happen to be in the pool, not a fixed attestation cap; see crates/blockchain/src/store.rs#L36 and crates/blockchain/src/store.rs#L1062. This should be framed as one measured run on a specific machine/config, otherwise the doc will create false positives during perf triage.

No Rust correctness or security issues are introduced by the PR itself; the problems are documentation accuracy and the risk of steering consensus testing into the wrong scenario.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

Adds documentation for simulating network instability in devnet testing — a new Testing Network Instability section in SKILL.md and a detailed references/instability-testing.md covering three test scenarios, observable metrics, an awk data-extraction script, and a docker pause vs docker stop comparison table. The content is accurate and well-structured; the two P2 notes (Scenario 2 missing the "non-aggregator" qualifier, and the double docker logs race) are minor cleanup items that don't block the usefulness of this reference.

Confidence Score: 5/5

Documentation-only PR; safe to merge. Remaining findings are P2 style suggestions.

Both findings are P2: one is a missing qualifier that could confuse a reader, and one is a script race condition on live containers that is easily worked around. Neither blocks merge.

references/instability-testing.md — minor wording and script robustness improvements

Important Files Changed

Filename Overview
.claude/skills/devnet-runner/SKILL.md Added a "Testing Network Instability" section with a quick-start snippet and a link to the new reference file; no issues found.
.claude/skills/devnet-runner/references/instability-testing.md New detailed reference file with scenarios, awk data-extraction scripts, and analysis notes; Scenario 2 omits "non-aggregator" qualifier for nodes to pause, and the data-extraction script calls docker logs twice which can produce inconsistent data on live containers.

Sequence Diagram

sequenceDiagram
    participant Tester
    participant ethlambda_0
    participant ethlambda_1
    participant ethlambda_2
    participant ethlambda_3 (aggregator)

    Note over Tester,ethlambda_3 (aggregator): Healthy devnet — all 4 validators active, finalization advancing

    Tester->>ethlambda_0: docker pause
    Tester->>ethlambda_1: docker pause

    Note over ethlambda_2,ethlambda_3 (aggregator): Only 2/4 validators active — supermajority lost
    loop Every slot
        ethlambda_2->>ethlambda_3 (aggregator): attestation (1 of 4)
        ethlambda_3 (aggregator)->>ethlambda_3 (aggregator): aggregate (2/4 sigs only)
        Note over ethlambda_2,ethlambda_3 (aggregator): Justified/finalized slots stall
    end

    Tester->>ethlambda_0: docker unpause
    Tester->>ethlambda_1: docker unpause
    Note over ethlambda_0,ethlambda_3 (aggregator): All 4 validators rejoin — finalization resumes within ~20 slots
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 149

Comment:
**Missing "non-aggregator" qualifier**

Scenario 2 says "Pause 2 nodes" without specifying they should be non-aggregator nodes. A reader following this literally might pause the aggregator, turning it into Scenario 3 (aggregator failure) rather than the intended finalization stall test. Every other pausing instruction in the file is explicit about which nodes to target.

```suggestion
3. Pause 2 non-aggregator nodes (e.g., `ethlambda_0` and `ethlambda_1`)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 94-95

Comment:
**Double `docker logs` call can produce inconsistent two-pass awk data**

The awk script runs `docker logs` twice — once for the `NR==FNR` attestation-count pass and once for the `Processed new block` pass. If the container is still producing logs between the two calls (e.g., you start extraction before pausing), the two streams can diverge and slots in the `att[]` map won't match the processing records, silently yielding `ac=0` for those slots.

Capturing logs to a temp file and feeding it twice avoids this:
```bash
  tmp=$(mktemp)
  docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > "$tmp"
  awk -v node="$c" "..." "$tmp" "$tmp"
  rm -f "$tmp"
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "docs: add how to test network instabilit..." | Re-trigger Greptile

2. Note the current finalized slot
3. Pause 2 nodes
4. Confirm finalization stalls (finalized slot stops advancing for 50+ slots)
5. Unpause both nodes simultaneously
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing "non-aggregator" qualifier

Scenario 2 says "Pause 2 nodes" without specifying they should be non-aggregator nodes. A reader following this literally might pause the aggregator, turning it into Scenario 3 (aggregator failure) rather than the intended finalization stall test. Every other pausing instruction in the file is explicit about which nodes to target.

Suggested change
5. Unpause both nodes simultaneously
3. Pause 2 non-aggregator nodes (e.g., `ethlambda_0` and `ethlambda_1`)
Prompt To Fix With AI
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 149

Comment:
**Missing "non-aggregator" qualifier**

Scenario 2 says "Pause 2 nodes" without specifying they should be non-aggregator nodes. A reader following this literally might pause the aggregator, turning it into Scenario 3 (aggregator failure) rather than the intended finalization stall test. Every other pausing instruction in the file is explicit about which nodes to target.

```suggestion
3. Pause 2 non-aggregator nodes (e.g., `ethlambda_0` and `ethlambda_1`)
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +94 to +95
" <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") \
<(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Double docker logs call can produce inconsistent two-pass awk data

The awk script runs docker logs twice — once for the NR==FNR attestation-count pass and once for the Processed new block pass. If the container is still producing logs between the two calls (e.g., you start extraction before pausing), the two streams can diverge and slots in the att[] map won't match the processing records, silently yielding ac=0 for those slots.

Capturing logs to a temp file and feeding it twice avoids this:

  tmp=$(mktemp)
  docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > "$tmp"
  awk -v node="$c" "..." "$tmp" "$tmp"
  rm -f "$tmp"
Prompt To Fix With AI
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 94-95

Comment:
**Double `docker logs` call can produce inconsistent two-pass awk data**

The awk script runs `docker logs` twice — once for the `NR==FNR` attestation-count pass and once for the `Processed new block` pass. If the container is still producing logs between the two calls (e.g., you start extraction before pausing), the two streams can diverge and slots in the `att[]` map won't match the processing records, silently yielding `ac=0` for those slots.

Capturing logs to a temp file and feeding it twice avoids this:
```bash
  tmp=$(mktemp)
  docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > "$tmp"
  awk -v node="$c" "..." "$tmp" "$tmp"
  rm -f "$tmp"
```

How can I resolve this? If you propose a fix, please make it concise.

@MegaRedHand MegaRedHand force-pushed the docs/instability-testing branch from a16e563 to 59244af Compare April 10, 2026 22:15
@github-actions
Copy link
Copy Markdown

🤖 Claude Code Review

Now I have all the information I need. Let me write the review.


Review: docs: add how to test network instability to devnet-runner skill

This is a documentation-only PR. No production Rust code is changed. The structure is good — the new reference file is well-organized, explains the "why" behind each scenario, and correctly identifies the aggregator-vs-non-aggregator distinction. The quick-start is useful. Three issues need addressing before merge.


Critical — Wrong supermajority threshold

The table in instability-testing.md (lines 176–180) labels the threshold column "Supermajority (3/4)" and lists wrong values for 6 and 12 validators.

The actual code in crates/blockchain/state_transition/src/lib.rs:305 uses:

if 3 * vote_count >= 2 * validator_count {

This is a 2/3 supermajority (ceil(2N/3)), not 3/4. Correct values:

Validators ceil(2N/3) Max paused Table says (threshold / max paused)
4 3 1 3 / 1 ✓ (coincides with 3/4)
6 4 2 5 / 1 ✗
8 6 2 6 / 2 ✓ (coincides with 3/4)
12 8 4 9 / 3 ✗

The 4-node and 8-node rows happen to be numerically identical under both formulas. The 6-node and 12-node rows are wrong. Anyone following this for a 6-node devnet would pause only 1 node expecting finality to stall, but with 4/6 validators still active the network will continue finalizing normally — the test would silently produce a false negative.

The label should read "Supermajority (⌈2N/3⌉)" and the affected rows corrected.


Moderate — Dead pipe in the extraction script

Lines 80–98 (block_data.csv extraction):

docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" | awk -v node="$c" "
  ...
" <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") \
  <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g")

awk receives two explicit file arguments (the process substitutions). Per POSIX, when file arguments are given, awk ignores stdin. The first docker logs ... | sed ... | pipe produces output that is never consumed — it is dead code. docker logs is actually invoked three times per container but only two results are read.

Remove the leading pipe chain and invoke awk directly:

awk -v node="$c" "PROGRAM" \
  <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") \
  <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g")

Or, better, capture once to avoid a double read (important for containers with long log histories):

docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > /tmp/logs_$c
awk -v node="$c" "PROGRAM" /tmp/logs_$c /tmp/logs_$c
rm /tmp/logs_$c

Minor — Aggregator index assumption is too implicit

Lines 12–14 and 22 assume ethlambda_3 is the aggregator ("typically ethlambda_3"). The long-lived devnet reference does use ethlambda_3 as the aggregator (confirmed at long-lived-devnet.md:83), but users running a local devnet via spin-node.sh may have a different layout. A user following the "Quick Start" without checking which node has --is-aggregator could accidentally pause the aggregator and observe aggregator-failure behavior while thinking they're running the supermajority-loss scenario.

Suggest promoting the "verify before pausing" step more prominently — e.g., add a concrete command before step 2:

# Find which node is the aggregator
docker inspect ethlambda_0 ethlambda_1 ethlambda_2 ethlambda_3 \
  --format '{{.Name}}: {{range .Args}}{{.}} {{end}}' | grep is-aggregator

Nits (no action required)

  • to_ms final branch: gsub(/s/, "", raw) strips all s characters, not just the trailing unit suffix. gsub(/s$/, "", raw) is safer, though in practice time strings from tracing (1.5s, 42s) won't be ambiguous here.
  • The "Published attestation" grep (line 67) relies on the sed pattern not matching the Published attestation line from lib.rs:217 (which lacks target_slot), only matching Published attestation to gossipsub from handler.rs:165. The awk 'NF==2' guard prevents incorrect output, but using the full string Published attestation to gossipsub would make the intent explicit.
  • The benchmark numbers in the "Expected results" blocks (36 attestations, ~1400ms, ~290ms, ~4.8x) are from a specific measured run. Marking them explicitly as "measured on an 8-core machine" rather than presenting them as protocol invariants would prevent false positives during performance triage on different hardware. (The Codex review raised this; it's worth addressing.)

Summary: The supermajority table bug and the dead pipe are actionable before merge. The aggregator assumption is a usability concern worth addressing. The rest is polish.


Automated review by Claude (Anthropic) · sonnet · custom prompt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant