Skip to content

ENG-157: dispersal self-healing cron with SUSPENSE escalation#241

Merged
Connorbelez merged 2 commits intomainfrom
ENG-157
Mar 21, 2026
Merged

ENG-157: dispersal self-healing cron with SUSPENSE escalation#241
Connorbelez merged 2 commits intomainfrom
ENG-157

Conversation

@Connorbelez
Copy link
Copy Markdown
Owner

@Connorbelez Connorbelez commented Mar 21, 2026

Detect settled obligations missing dispersal entries and retrigger
creation idempotently. After 3 failed attempts, escalate to SUSPENSE
via a new SUSPENSE_ESCALATED cash ledger entry type.

  • Add dispersalHealingAttempts table for retry tracking
  • Add SUSPENSE_ESCALATED entry type (types, schema, postEntry validator)
  • Add findSettledWithoutDispersals query, retriggerDispersal mutation
  • Add 15-minute interval self-healing cron
  • Fix cron stagger: reconciliation moved from 06:00 to 07:00 UTC
  • Add getSettledObligations + journal amount internal queries
  • 8 new tests covering detection, retry, escalation, and resolution

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Summary by CodeRabbit

Release Notes

  • New Features

    • Automated dispersal self-healing system detects settled obligations missing dispersal entries and automatically attempts recovery through scheduled maintenance every 15 minutes
    • Unresolved recovery attempts escalate to suspense accounts for manual review
  • Improvements

    • Daily reconciliation check rescheduled to 07:00 UTC (previously 06:00 UTC)

Detect settled obligations missing dispersal entries and retrigger
creation idempotently. After 3 failed attempts, escalate to SUSPENSE
via a new SUSPENSE_ESCALATED cash ledger entry type.

- Add dispersalHealingAttempts table for retry tracking
- Add SUSPENSE_ESCALATED entry type (types, schema, postEntry validator)
- Add findSettledWithoutDispersals query, retriggerDispersal mutation
- Add 15-minute interval self-healing cron
- Fix cron stagger: reconciliation moved from 06:00 to 07:00 UTC
- Add getSettledObligations + journal amount internal queries
- 8 new tests covering detection, retry, escalation, and resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear bot commented Mar 21, 2026

Copy link
Copy Markdown

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @Connorbelez, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 21, 2026

Warning

Rate limit exceeded

@Connorbelez has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 7 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1095be7-2e12-4c05-a2ab-dc4f1389895f

📥 Commits

Reviewing files that changed from the base of the PR and between 7b130cc and a65add7.

📒 Files selected for processing (3)
  • convex/dispersal/selfHealing.ts
  • convex/dispersal/selfHealingTypes.ts
  • convex/payments/cashLedger/postEntry.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ENG-157

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Owner Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Connorbelez Connorbelez marked this pull request as ready for review March 21, 2026 19:55
Copilot AI review requested due to automatic review settings March 21, 2026 19:55
Copy link
Copy Markdown

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connorbelez has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
convex/dispersal/__tests__/reconciliation.test.ts (1)

52-53: Derive test result interfaces from query return types to stay in sync when status values expand.

Currently dispersalStatusValidator is v.literal("pending") (Phase 1), and the test interfaces correctly reflect this. However, the validator comment indicates Phase 2 will add "disbursed" | "failed". When that happens, getDisbursementHistory and getDispersalsByMortgage will return multiple status values (they don't filter by status), but the test interfaces will still expect only "pending" — risking type misalignment. Instead of hardcoding status in test result interfaces, derive them from the actual query return types to automatically capture schema changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/dispersal/__tests__/reconciliation.test.ts` around lines 52 - 53, The
test's result interfaces currently hardcode status = "pending" while
dispersalStatusValidator and production queries (getDisbursementHistory,
getDispersalsByMortgage) may later include "disbursed" | "failed"; update the
tests to derive the status type from the actual query return types instead of
hardcoding it — e.g., import or reference the return type of
getDisbursementHistory/getDispersalsByMortgage (or the schema type used by
dispersalStatusValidator) and use TypeScript's ReturnType/Awaited utilities or
the exported response types to form the test interfaces so they automatically
reflect added status values and stay in sync with dispersalStatusValidator and
the query results.
convex/payments/obligations/queries.ts (1)

86-93: Consider paginating settled-obligation scans before volume grows.

The unbounded .collect() is fine for now, but this cron-facing query will eventually need cursor-based pagination/chunking to avoid long scans and memory pressure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/payments/obligations/queries.ts` around lines 86 - 93,
getSettledObligations currently performs an unbounded scan via
ctx.db.query("obligations").withIndex("by_status", ...).collect(), which will
not scale; change the handler to accept pagination args (e.g., limit and cursor)
and replace the unbounded .collect() with a cursor/limit-based scan using the
query on "obligations" with .withIndex("by_status") and pagination primitives
(take/limit and startAfter or an iterator) to return at most the requested chunk
and a nextCursor token; ensure the exported function signature
(getSettledObligations) and its args reflect limit/cursor and the handler
returns {items, nextCursor} so callers (cron) can iterate safely.
convex/payments/cashLedger/reconciliation.ts (1)

72-80: Consider adding a safety guard for large settlement amounts, or document the expected upper bound.

The Number(amount) conversion at line 79 is intentional (per the comment at line 70 — converting bigint to number for cross-boundary use in actions). However, if settlement amounts ever exceed Number.MAX_SAFE_INTEGER (~9 quadrillion cents), precision will be silently lost. Either add a bounds check to reject unsafe values, or document why amounts in this system are guaranteed to stay within safe integer range.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/payments/cashLedger/reconciliation.ts` around lines 72 - 80,
getJournalSettledAmountForObligationInternal currently converts a bigint to
Number with Number(amount), risking silent precision loss if the value exceeds
Number.MAX_SAFE_INTEGER; add a safety check after calling
getJournalSettledAmountForObligation that compares the returned bigint against
BigInt(Number.MAX_SAFE_INTEGER) and either throw/error (or return a safe
sentinel) when it’s out of range, or alternatively add a clear comment/docstring
on getJournalSettledAmountForObligationInternal (and possibly upstream
getJournalSettledAmountForObligation) stating the maximum expected amount and
why values above that cannot occur; reference the functions
getJournalSettledAmountForObligationInternal and
getJournalSettledAmountForObligation and ensure the handler returns only a
Number when the bound check passes.
convex/dispersal/selfHealing.ts (2)

140-145: Minor: Multiple Date.now() calls in same patch.

The patch uses Date.now() twice on lines 143 and 144, which could result in slightly different timestamps (typically microseconds apart). For consistency, consider capturing Date.now() once.

🔧 Suggested fix
 		if (attemptCount > MAX_HEALING_ATTEMPTS) {
 			// ── Escalate to SUSPENSE ──
+			const now = Date.now();
 			if (existing) {
 				await ctx.db.patch(existing._id, {
 					status: "escalated",
 					attemptCount,
-					lastAttemptAt: Date.now(),
-					escalatedAt: Date.now(),
+					lastAttemptAt: now,
+					escalatedAt: now,
 				});
 			} else {
 				await ctx.db.insert("dispersalHealingAttempts", {
 					obligationId: args.obligationId,
 					attemptCount,
-					lastAttemptAt: Date.now(),
-					escalatedAt: Date.now(),
+					lastAttemptAt: now,
+					escalatedAt: now,
 					status: "escalated",
-					createdAt: Date.now(),
+					createdAt: now,
 				});
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/dispersal/selfHealing.ts` around lines 140 - 145, The patch call in
the selfHealing logic uses Date.now() twice in ctx.db.patch(existing._id, {
lastAttemptAt: Date.now(), escalatedAt: Date.now(), ... }), which can yield
slightly different timestamps; capture Date.now() once into a local variable
(e.g., const now = Date.now()) inside the surrounding function/block that runs
the patch (in the same scope as the ctx.db.patch call) and use that variable for
both lastAttemptAt and escalatedAt to ensure identical timestamps.

69-104: Consider pagination for large obligation sets.

The query collects all settled obligations without pagination. There's already a TODO in the codebase (convex/payments/obligations/queries.ts) noting this limitation, and pagination should be implemented as the system scales beyond pre-launch volumes.

For now this is acceptable given the 15-minute cron interval and the filtering logic, but add pagination or a cursor-based approach before production if the obligation count grows significantly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/dispersal/selfHealing.ts` around lines 69 - 104,
findSettledWithoutDispersals currently collects all settled obligations in one
go (unbounded .collect()), which can OOM at scale; change the handler in
findSettledWithoutDispersals to iterate paginated/batched queries (e.g., a fixed
pageSize loop or cursor) against the "obligations" index by_status and process
each page, checking dispersalEntries and dispersalHealingAttempts per obligation
as you do now (use the same withIndex("by_obligation") checks), and stop when a
page returns no rows; ensure page size is reasonable (e.g., 100–1000) and the
loop preserves the same filtering/ordering to avoid skipping or duplicating
obligations.
convex/dispersal/__tests__/selfHealing.test.ts (1)

230-249: Mismatch between lenderId type in ledger_accounts seed data.

The lenderId field in ledger_accounts is typed as v.optional(v.string()) in the schema, but here you're using string literals like "lender-heal-1" instead of the actual Id<"lenders"> returned from ctx.db.insert("lenders", ...).

While this works because the schema allows strings, it creates a disconnect — the actual lenderOneId and lenderTwoId are created above but not used here. This could cause issues if tests later need to correlate ledger accounts with actual lender records.

🔧 Suggested fix to use actual lender IDs
 		const lenderOneAccountId = await ctx.db.insert("ledger_accounts", {
 			type: "POSITION",
-			mortgageId,
-			lenderId: "lender-heal-1",
+			mortgageId: mortgageId as unknown as string,
+			lenderId: lenderOneId as unknown as string,
 			cumulativeDebits: 6000n,
 			cumulativeCredits: 0n,
 			pendingDebits: 0n,
 			pendingCredits: 0n,
 			createdAt: now,
 		});
 		const lenderTwoAccountId = await ctx.db.insert("ledger_accounts", {
 			type: "POSITION",
-			mortgageId,
-			lenderId: "lender-heal-2",
+			mortgageId: mortgageId as unknown as string,
+			lenderId: lenderTwoId as unknown as string,
 			cumulativeDebits: 4000n,
 			cumulativeCredits: 0n,
 			pendingDebits: 0n,
 			pendingCredits: 0n,
 			createdAt: now,
 		});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@convex/dispersal/__tests__/selfHealing.test.ts` around lines 230 - 249, The
ledger_accounts seed uses string literals for lenderId ("lender-heal-1",
"lender-heal-2") instead of the actual lender record IDs created earlier
(lenderOneId, lenderTwoId), causing a mismatch between the seeded
ledger_accounts and the real lenders; update the ctx.db.insert calls that create
lenderOneAccountId and lenderTwoAccountId to pass lenderId: lenderOneId and
lenderId: lenderTwoId (respectively) when inserting into ledger_accounts so the
ledger_accounts rows reference the actual Id<"lenders"> values returned by the
earlier ctx.db.insert("lenders", ...) calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@convex/crons.ts`:
- Around line 30-34: Replace the static crons.interval registration with the
runtime-managed convex-dev-crons registration: remove the crons.interval(...)
call and register internal.dispersal.selfHealing.dispersalSelfHealingCron with
the repository's convex-dev-crons component using its runtime registration API
(e.g. convexDevCrons.registerInterval or equivalent) to schedule the 15-minute
interval; ensure you pass the same interval ({ minutes: 15 }) and the same
handler symbol (internal.dispersal.selfHealing.dispersalSelfHealingCron) so the
job is managed by the cron-component at runtime.

---

Nitpick comments:
In `@convex/dispersal/__tests__/reconciliation.test.ts`:
- Around line 52-53: The test's result interfaces currently hardcode status =
"pending" while dispersalStatusValidator and production queries
(getDisbursementHistory, getDispersalsByMortgage) may later include "disbursed"
| "failed"; update the tests to derive the status type from the actual query
return types instead of hardcoding it — e.g., import or reference the return
type of getDisbursementHistory/getDispersalsByMortgage (or the schema type used
by dispersalStatusValidator) and use TypeScript's ReturnType/Awaited utilities
or the exported response types to form the test interfaces so they automatically
reflect added status values and stay in sync with dispersalStatusValidator and
the query results.

In `@convex/dispersal/__tests__/selfHealing.test.ts`:
- Around line 230-249: The ledger_accounts seed uses string literals for
lenderId ("lender-heal-1", "lender-heal-2") instead of the actual lender record
IDs created earlier (lenderOneId, lenderTwoId), causing a mismatch between the
seeded ledger_accounts and the real lenders; update the ctx.db.insert calls that
create lenderOneAccountId and lenderTwoAccountId to pass lenderId: lenderOneId
and lenderId: lenderTwoId (respectively) when inserting into ledger_accounts so
the ledger_accounts rows reference the actual Id<"lenders"> values returned by
the earlier ctx.db.insert("lenders", ...) calls.

In `@convex/dispersal/selfHealing.ts`:
- Around line 140-145: The patch call in the selfHealing logic uses Date.now()
twice in ctx.db.patch(existing._id, { lastAttemptAt: Date.now(), escalatedAt:
Date.now(), ... }), which can yield slightly different timestamps; capture
Date.now() once into a local variable (e.g., const now = Date.now()) inside the
surrounding function/block that runs the patch (in the same scope as the
ctx.db.patch call) and use that variable for both lastAttemptAt and escalatedAt
to ensure identical timestamps.
- Around line 69-104: findSettledWithoutDispersals currently collects all
settled obligations in one go (unbounded .collect()), which can OOM at scale;
change the handler in findSettledWithoutDispersals to iterate paginated/batched
queries (e.g., a fixed pageSize loop or cursor) against the "obligations" index
by_status and process each page, checking dispersalEntries and
dispersalHealingAttempts per obligation as you do now (use the same
withIndex("by_obligation") checks), and stop when a page returns no rows; ensure
page size is reasonable (e.g., 100–1000) and the loop preserves the same
filtering/ordering to avoid skipping or duplicating obligations.

In `@convex/payments/cashLedger/reconciliation.ts`:
- Around line 72-80: getJournalSettledAmountForObligationInternal currently
converts a bigint to Number with Number(amount), risking silent precision loss
if the value exceeds Number.MAX_SAFE_INTEGER; add a safety check after calling
getJournalSettledAmountForObligation that compares the returned bigint against
BigInt(Number.MAX_SAFE_INTEGER) and either throw/error (or return a safe
sentinel) when it’s out of range, or alternatively add a clear comment/docstring
on getJournalSettledAmountForObligationInternal (and possibly upstream
getJournalSettledAmountForObligation) stating the maximum expected amount and
why values above that cannot occur; reference the functions
getJournalSettledAmountForObligationInternal and
getJournalSettledAmountForObligation and ensure the handler returns only a
Number when the bound check passes.

In `@convex/payments/obligations/queries.ts`:
- Around line 86-93: getSettledObligations currently performs an unbounded scan
via ctx.db.query("obligations").withIndex("by_status", ...).collect(), which
will not scale; change the handler to accept pagination args (e.g., limit and
cursor) and replace the unbounded .collect() with a cursor/limit-based scan
using the query on "obligations" with .withIndex("by_status") and pagination
primitives (take/limit and startAfter or an iterator) to return at most the
requested chunk and a nextCursor token; ensure the exported function signature
(getSettledObligations) and its args reflect limit/cursor and the handler
returns {items, nextCursor} so callers (cron) can iterate safely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 166eda52-ce52-4e0e-9bb6-0acb4cb8769e

📥 Commits

Reviewing files that changed from the base of the PR and between 3cf200e and 7b130cc.

⛔ Files ignored due to path filters (1)
  • convex/_generated/api.d.ts is excluded by !**/_generated/**
📒 Files selected for processing (13)
  • convex/crons.ts
  • convex/dispersal/__tests__/createDispersalEntries.test.ts
  • convex/dispersal/__tests__/reconciliation.test.ts
  • convex/dispersal/__tests__/selfHealing.test.ts
  • convex/dispersal/selfHealing.ts
  • convex/dispersal/selfHealingTypes.ts
  • convex/mortgages/paymentFrequency.ts
  • convex/payments/cashLedger/postEntry.ts
  • convex/payments/cashLedger/reconciliation.ts
  • convex/payments/cashLedger/types.ts
  • convex/payments/obligations/generateImpl.ts
  • convex/payments/obligations/queries.ts
  • convex/schema.ts

Comment thread convex/crons.ts
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a self-healing mechanism in the Convex backend to detect obligations that are marked settled but are missing dispersal entries, retry dispersal creation idempotently, and escalate after repeated failures via a new cash-ledger entry type.

Changes:

  • Introduces dispersalHealingAttempts for retry/escalation tracking and a 15-minute interval cron to drive the healing loop.
  • Adds SUSPENSE_ESCALATED to cash-ledger types/schema/validators and posts that entry on escalation.
  • Adds internal queries/actions plus new tests covering detection, retry behavior, escalation, and resolution helpers.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
convex/schema.ts Adds dispersalHealingAttempts table and registers SUSPENSE_ESCALATED in cash-ledger journal entry schema.
convex/payments/obligations/queries.ts Adds getSettledObligations internal query for settled obligation enumeration.
convex/payments/obligations/generateImpl.ts Import ordering tweak (type import positioning).
convex/payments/cashLedger/types.ts Adds SUSPENSE_ESCALATED to cash entry types and family constraints.
convex/payments/cashLedger/reconciliation.ts Adds an internalQuery wrapper for retrieving journal-settled amount from actions.
convex/payments/cashLedger/postEntry.ts Extends entryType validator union to include SUSPENSE_ESCALATED.
convex/mortgages/paymentFrequency.ts Minor signature formatting change.
convex/dispersal/selfHealingTypes.ts Defines constants and types for the healing cron/results.
convex/dispersal/selfHealing.ts Implements candidate detection, retrigger mutation, escalation posting, resolve helper, and cron action.
convex/dispersal/__tests__/selfHealing.test.ts Adds end-to-end tests for detection, retry counting/idempotency, escalation journal entry, and resolve helper.
convex/dispersal/__tests__/reconciliation.test.ts Adds local type interfaces for typed makeFunctionReference usage.
convex/dispersal/__tests__/createDispersalEntries.test.ts Formatting-only changes in tests.
convex/crons.ts Moves daily reconciliation to 07:00 UTC and adds 15-minute dispersal self-healing cron.
convex/_generated/api.d.ts Updates generated API typings to include the new dispersal/selfHealing module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread convex/payments/obligations/queries.ts Outdated
Comment thread convex/dispersal/selfHealing.ts
Comment thread convex/dispersal/selfHealing.ts
Comment thread convex/payments/cashLedger/types.ts
Comment thread convex/dispersal/selfHealing.ts
Comment thread convex/dispersal/selfHealing.ts
- Skip balanceCheck for SUSPENSE_ESCALATED (receivable is ~0 after
  CASH_RECEIVED, crediting it would fail assertNonNegativeBalance)
- Remove unused getSettledObligations query (findSettledWithoutDispersals
  queries obligations directly)
- Remove alreadyResolved from HealingResult (always 0, field served no purpose)
- Document resolveHealingAttempt as manual/admin-only (not called by cron)
- Add N+1 TODO comment for future pagination at scale

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Connorbelez Connorbelez merged commit d9a1eea into main Mar 21, 2026
1 of 3 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Mar 29, 2026
Connorbelez added a commit that referenced this pull request Apr 20, 2026
Detect settled obligations missing dispersal entries and retrigger
creation idempotently. After 3 failed attempts, escalate to SUSPENSE
via a new SUSPENSE_ESCALATED cash ledger entry type.

- Add dispersalHealingAttempts table for retry tracking
- Add SUSPENSE_ESCALATED entry type (types, schema, postEntry validator)
- Add findSettledWithoutDispersals query, retriggerDispersal mutation
- Add 15-minute interval self-healing cron
- Fix cron stagger: reconciliation moved from 06:00 to 07:00 UTC
- Add getSettledObligations + journal amount internal queries
- 8 new tests covering detection, retry, escalation, and resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Automated dispersal self-healing system detects settled obligations missing dispersal entries and automatically attempts recovery through scheduled maintenance every 15 minutes
  * Unresolved recovery attempts escalate to suspense accounts for manual review

* **Improvements**
  * Daily reconciliation check rescheduled to 07:00 UTC (previously 06:00 UTC)

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants