docs: add how to test network instability to devnet-runner skill#276
docs: add how to test network instability to devnet-runner skill#276MegaRedHand wants to merge 1 commit intomainfrom
Conversation
🤖 Kimi Code ReviewThis PR adds comprehensive documentation for testing network instability using Minor suggestions for robustness: File:
Nitpick:
The supermajority thresholds table (lines 176-180) correctly reflects the 3/4 requirement and distinguishes between pausing aggregators vs. non-aggregators, which is critical for accurate test results. Verdict: Documentation is clear, accurate, and operationally useful. The shell/awk scripts are appropriate for their debugging context. Addressing the division-by-zero edge case would be ideal but is not blocking. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code ReviewFindings
No Rust correctness or security issues are introduced by the PR itself; the problems are documentation accuracy and the risk of steering consensus testing into the wrong scenario. Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
Greptile SummaryAdds documentation for simulating network instability in devnet testing — a new Confidence Score: 5/5Documentation-only PR; safe to merge. Remaining findings are P2 style suggestions. Both findings are P2: one is a missing qualifier that could confuse a reader, and one is a script race condition on live containers that is easily worked around. Neither blocks merge. references/instability-testing.md — minor wording and script robustness improvements
|
| Filename | Overview |
|---|---|
| .claude/skills/devnet-runner/SKILL.md | Added a "Testing Network Instability" section with a quick-start snippet and a link to the new reference file; no issues found. |
| .claude/skills/devnet-runner/references/instability-testing.md | New detailed reference file with scenarios, awk data-extraction scripts, and analysis notes; Scenario 2 omits "non-aggregator" qualifier for nodes to pause, and the data-extraction script calls docker logs twice which can produce inconsistent data on live containers. |
Sequence Diagram
sequenceDiagram
participant Tester
participant ethlambda_0
participant ethlambda_1
participant ethlambda_2
participant ethlambda_3 (aggregator)
Note over Tester,ethlambda_3 (aggregator): Healthy devnet — all 4 validators active, finalization advancing
Tester->>ethlambda_0: docker pause
Tester->>ethlambda_1: docker pause
Note over ethlambda_2,ethlambda_3 (aggregator): Only 2/4 validators active — supermajority lost
loop Every slot
ethlambda_2->>ethlambda_3 (aggregator): attestation (1 of 4)
ethlambda_3 (aggregator)->>ethlambda_3 (aggregator): aggregate (2/4 sigs only)
Note over ethlambda_2,ethlambda_3 (aggregator): Justified/finalized slots stall
end
Tester->>ethlambda_0: docker unpause
Tester->>ethlambda_1: docker unpause
Note over ethlambda_0,ethlambda_3 (aggregator): All 4 validators rejoin — finalization resumes within ~20 slots
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 149
Comment:
**Missing "non-aggregator" qualifier**
Scenario 2 says "Pause 2 nodes" without specifying they should be non-aggregator nodes. A reader following this literally might pause the aggregator, turning it into Scenario 3 (aggregator failure) rather than the intended finalization stall test. Every other pausing instruction in the file is explicit about which nodes to target.
```suggestion
3. Pause 2 non-aggregator nodes (e.g., `ethlambda_0` and `ethlambda_1`)
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 94-95
Comment:
**Double `docker logs` call can produce inconsistent two-pass awk data**
The awk script runs `docker logs` twice — once for the `NR==FNR` attestation-count pass and once for the `Processed new block` pass. If the container is still producing logs between the two calls (e.g., you start extraction before pausing), the two streams can diverge and slots in the `att[]` map won't match the processing records, silently yielding `ac=0` for those slots.
Capturing logs to a temp file and feeding it twice avoids this:
```bash
tmp=$(mktemp)
docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > "$tmp"
awk -v node="$c" "..." "$tmp" "$tmp"
rm -f "$tmp"
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "docs: add how to test network instabilit..." | Re-trigger Greptile
| 2. Note the current finalized slot | ||
| 3. Pause 2 nodes | ||
| 4. Confirm finalization stalls (finalized slot stops advancing for 50+ slots) | ||
| 5. Unpause both nodes simultaneously |
There was a problem hiding this comment.
Missing "non-aggregator" qualifier
Scenario 2 says "Pause 2 nodes" without specifying they should be non-aggregator nodes. A reader following this literally might pause the aggregator, turning it into Scenario 3 (aggregator failure) rather than the intended finalization stall test. Every other pausing instruction in the file is explicit about which nodes to target.
| 5. Unpause both nodes simultaneously | |
| 3. Pause 2 non-aggregator nodes (e.g., `ethlambda_0` and `ethlambda_1`) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 149
Comment:
**Missing "non-aggregator" qualifier**
Scenario 2 says "Pause 2 nodes" without specifying they should be non-aggregator nodes. A reader following this literally might pause the aggregator, turning it into Scenario 3 (aggregator failure) rather than the intended finalization stall test. Every other pausing instruction in the file is explicit about which nodes to target.
```suggestion
3. Pause 2 non-aggregator nodes (e.g., `ethlambda_0` and `ethlambda_1`)
```
How can I resolve this? If you propose a fix, please make it concise.| " <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") \ | ||
| <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") |
There was a problem hiding this comment.
Double
docker logs call can produce inconsistent two-pass awk data
The awk script runs docker logs twice — once for the NR==FNR attestation-count pass and once for the Processed new block pass. If the container is still producing logs between the two calls (e.g., you start extraction before pausing), the two streams can diverge and slots in the att[] map won't match the processing records, silently yielding ac=0 for those slots.
Capturing logs to a temp file and feeding it twice avoids this:
tmp=$(mktemp)
docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > "$tmp"
awk -v node="$c" "..." "$tmp" "$tmp"
rm -f "$tmp"Prompt To Fix With AI
This is a comment left during a code review.
Path: .claude/skills/devnet-runner/references/instability-testing.md
Line: 94-95
Comment:
**Double `docker logs` call can produce inconsistent two-pass awk data**
The awk script runs `docker logs` twice — once for the `NR==FNR` attestation-count pass and once for the `Processed new block` pass. If the container is still producing logs between the two calls (e.g., you start extraction before pausing), the two streams can diverge and slots in the `att[]` map won't match the processing records, silently yielding `ac=0` for those slots.
Capturing logs to a temp file and feeding it twice avoids this:
```bash
tmp=$(mktemp)
docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > "$tmp"
awk -v node="$c" "..." "$tmp" "$tmp"
rm -f "$tmp"
```
How can I resolve this? If you propose a fix, please make it concise.a16e563 to
59244af
Compare
🤖 Claude Code ReviewNow I have all the information I need. Let me write the review. Review:
|
| Validators | ceil(2N/3) | Max paused | Table says (threshold / max paused) |
|---|---|---|---|
| 4 | 3 | 1 | 3 / 1 ✓ (coincides with 3/4) |
| 6 | 4 | 2 | 5 / 1 ✗ |
| 8 | 6 | 2 | 6 / 2 ✓ (coincides with 3/4) |
| 12 | 8 | 4 | 9 / 3 ✗ |
The 4-node and 8-node rows happen to be numerically identical under both formulas. The 6-node and 12-node rows are wrong. Anyone following this for a 6-node devnet would pause only 1 node expecting finality to stall, but with 4/6 validators still active the network will continue finalizing normally — the test would silently produce a false negative.
The label should read "Supermajority (⌈2N/3⌉)" and the affected rows corrected.
Moderate — Dead pipe in the extraction script
Lines 80–98 (block_data.csv extraction):
docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" | awk -v node="$c" "
...
" <(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") \
<(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g")awk receives two explicit file arguments (the process substitutions). Per POSIX, when file arguments are given, awk ignores stdin. The first docker logs ... | sed ... | pipe produces output that is never consumed — it is dead code. docker logs is actually invoked three times per container but only two results are read.
Remove the leading pipe chain and invoke awk directly:
awk -v node="$c" "PROGRAM" \
<(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g") \
<(docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g")Or, better, capture once to avoid a double read (important for containers with long log histories):
docker logs "$c" 2>&1 | sed "s/\x1b\[[0-9;]*m//g" > /tmp/logs_$c
awk -v node="$c" "PROGRAM" /tmp/logs_$c /tmp/logs_$c
rm /tmp/logs_$cMinor — Aggregator index assumption is too implicit
Lines 12–14 and 22 assume ethlambda_3 is the aggregator ("typically ethlambda_3"). The long-lived devnet reference does use ethlambda_3 as the aggregator (confirmed at long-lived-devnet.md:83), but users running a local devnet via spin-node.sh may have a different layout. A user following the "Quick Start" without checking which node has --is-aggregator could accidentally pause the aggregator and observe aggregator-failure behavior while thinking they're running the supermajority-loss scenario.
Suggest promoting the "verify before pausing" step more prominently — e.g., add a concrete command before step 2:
# Find which node is the aggregator
docker inspect ethlambda_0 ethlambda_1 ethlambda_2 ethlambda_3 \
--format '{{.Name}}: {{range .Args}}{{.}} {{end}}' | grep is-aggregatorNits (no action required)
to_msfinal branch:gsub(/s/, "", raw)strips allscharacters, not just the trailing unit suffix.gsub(/s$/, "", raw)is safer, though in practice time strings from tracing (1.5s,42s) won't be ambiguous here.- The "Published attestation" grep (line 67) relies on the sed pattern not matching the
Published attestationline fromlib.rs:217(which lackstarget_slot), only matchingPublished attestation to gossipsubfromhandler.rs:165. Theawk 'NF==2'guard prevents incorrect output, but using the full stringPublished attestation to gossipsubwould make the intent explicit. - The benchmark numbers in the "Expected results" blocks (36 attestations, ~1400ms, ~290ms, ~4.8x) are from a specific measured run. Marking them explicitly as "measured on an 8-core machine" rather than presenting them as protocol invariants would prevent false positives during performance triage on different hardware. (The Codex review raised this; it's worth addressing.)
Summary: The supermajority table bug and the dead pipe are actionable before merge. The aggregator assumption is a usability concern worth addressing. The rest is polish.
Automated review by Claude (Anthropic) · sonnet · custom prompt
This PR updates the devnet-runner skill with how to simulate network instability through pausing containers.