Skip to content

[NONEVM-2809] [TXM] Implement exponential backoff for retries#728

Open
patricios-space wants to merge 8 commits intomainfrom
txm/feat/exponetial-backoff-on-retry
Open

[NONEVM-2809] [TXM] Implement exponential backoff for retries#728
patricios-space wants to merge 8 commits intomainfrom
txm/feat/exponetial-backoff-on-retry

Conversation

@patricios-space
Copy link
Copy Markdown
Collaborator

@patricios-space patricios-space commented Apr 29, 2026

@patricios-space patricios-space changed the title [NONEVM-2809] [TXM] Implement exponential backoff for retries instead of fixed interval [NONEVM-2809] [TXM] Implement exponential backoff for retries Apr 29, 2026
@patricios-space patricios-space force-pushed the txm/feat/exponetial-backoff-on-retry branch from d34c176 to 3256750 Compare April 29, 2026 18:27
@patricios-space patricios-space requested a review from Copilot April 29, 2026 18:29
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements exponential backoff (with jitter) for Transaction Manager broadcast retries to reduce retry storms and make retry behavior configurable.

Changes:

  • Switch retry logic from a fixed delay to exponential backoff and add jittered delays during retry waits.
  • Rename retry delay config to SendRetryBaseDelay and introduce SendRetryJitterPct with updated defaults.
  • Update config-related tests to use the new base-delay field name.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
pkg/txm/txm.go Applies exponential backoff + jitter to broadcast retry wait logic.
pkg/txm/config.go Renames retry delay config, adds jitter config, updates defaults/validation scaffolding.
pkg/txm/config_test.go Adjusts defaulting tests to the renamed retry delay field.
pkg/config/toml_test.go Updates TOML decoding assertions for the renamed retry delay field.
Comments suppressed due to low confidence (2)

pkg/txm/config.go:63

  • SendRetryJitterPct has a default value in DefaultConfigSet, but ApplyDefaults() never applies it. As a result, configs that omit the field will run with 0 jitter (contradicting the default/comment and reducing herd-avoidance). Add defaulting logic for SendRetryJitterPct (or make it a pointer if 0 is a meaningful explicit value).
	if c.SendRetryBaseDelay == nil {
		c.SendRetryBaseDelay = DefaultConfigSet.SendRetryBaseDelay
	}
	if c.MaxSendRetryAttempts == 0 {
		c.MaxSendRetryAttempts = DefaultConfigSet.MaxSendRetryAttempts
	}

pkg/txm/config.go:93

  • SendRetryJitterPct is used to compute a sleep duration, but it's never validated. Values < 0 or very large values can yield negative/overflowed durations (causing immediate retries or extremely long sleeps). Add validation that it is within a sane range (e.g., 0 <= pct && pct <= 1) and consider validating SendRetryBaseDelay is > 0 as well.
	if c.SendRetryBaseDelay == nil {
		err = errors.Join(err, config.ErrMissing{Name: "SendRetryDelay", Msg: "must be set"})
	}
	if c.MaxSendRetryAttempts == 0 {
		err = errors.Join(err, config.ErrInvalid{Name: "MaxSendRetryAttempts", Msg: "must be greater than 0"})
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/txm/txm.go
Comment thread pkg/txm/txm.go Outdated
Comment thread pkg/txm/config.go
Comment thread pkg/config/toml_test.go Outdated
Comment thread pkg/txm/config_test.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

pkg/txm/config_test.go:23

  • Config.ApplyDefaults now has defaulting logic for SendRetryJitterPct, but this test suite doesn't assert the default value is applied (or that a custom value is preserved). Adding assertions for SendRetryJitterPct here would help prevent regressions in the retry timing behavior.

		assert.Equal(t, DefaultConfigSet.BroadcastChanSize, cfg.BroadcastChanSize)
		assert.Equal(t, DefaultConfigSet.ConfirmPollInterval, cfg.ConfirmPollInterval)
		assert.Equal(t, DefaultConfigSet.SendRetryBaseDelay, cfg.SendRetryBaseDelay)
		assert.Equal(t, DefaultConfigSet.MaxSendRetryAttempts, cfg.MaxSendRetryAttempts)
		assert.Equal(t, DefaultConfigSet.TxExpiration, cfg.TxExpiration)
		assert.Equal(t, DefaultConfigSet.CleanupInterval, cfg.CleanupInterval)
	})

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/txm/config.go Outdated
Comment thread pkg/config/toml_test.go
Comment on lines 24 to +29
[TransactionManager]
BroadcastChanSize = 101
ConfirmPollInterval = '9s'
SendRetryDelay = '8s'
SendRetryBaseDelay = '8s'
MaxSendRetryAttempts = 7
SendRetryJitterPct = 0.2
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching the TOML key from SendRetryDelay to SendRetryBaseDelay is a breaking config change because the TOML decoder uses DisallowUnknownFields (old configs will now fail to start with an unknown field error). If backward compatibility is required, consider supporting both keys temporarily (e.g., keep the old field for decoding and map it to the new behavior, with deprecation messaging).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jadepark-dev do we need backwards compatibility here?

Copy link
Copy Markdown
Collaborator

@jadepark-dev jadepark-dev Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patricios-space Depends, in some cases we should update staging, prod-testnet, and the NOPs’ configurations before shipping.

  • Adding a new field with a correct default is backward compatible. If needed, we can fine-tune the value later by updating the configs.
  • Renaming an existing config field can also have a default. However, it's a startup-breaking config change for any deployed TOML that still contains SendRetryDelay. NewDecodedTOMLConfig uses DisallowUnknownFields, so those configs fail before defaults can help

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'll keep the old name. Is not that big of a change

Comment thread pkg/txm/txm.go Outdated
Comment thread pkg/txm/config.go Outdated
Comment thread pkg/txm/config.go Outdated
Comment thread pkg/txm/config.go Outdated
@patricios-space patricios-space force-pushed the txm/feat/exponetial-backoff-on-retry branch from a7373ad to 1cd4b20 Compare April 29, 2026 21:52
@patricios-space patricios-space requested a review from Copilot April 29, 2026 22:01
@patricios-space patricios-space marked this pull request as ready for review April 29, 2026 22:04
@patricios-space patricios-space requested a review from a team as a code owner April 29, 2026 22:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/txm/txm.go
Comment thread pkg/txm/config_test.go Outdated
Comment thread pkg/txm/config.go Outdated
@smartcontractkit smartcontractkit deleted a comment from Copilot AI Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants