diff --git a/README.md b/README.md
index 42ef3625..b07eb10e 100644
--- a/README.md
+++ b/README.md
@@ -44,6 +44,14 @@ $ make wdeps-test
## Documentation
-@TODO Please write a couple of words about what your project does and how it does it.
+Hellgate is the core payment-processing service of the platform. It orchestrates
+the full lifecycle of invoices, payments, refunds, chargebacks and recurrent
+paytools on top of a set of hierarchical, event-sourced state machines, and
+drives external payment providers through the `proxy-provider` Thrift protocol.
+
+The documentation for business logic and internal mechanics lives under
+[`doc/`](doc/index.md) — start at [`doc/index.md`](doc/index.md) for the full
+table of contents covering architecture, state machines, routing, limits and
+accounting, provider integration, risk/repair, and domain/party resolution.
[1]: http://erlang.org/doc/man/shell.html
diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 00000000..cbc2d3ff
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,147 @@
+# Architecture overview
+
+## What Hellgate is
+
+Hellgate (sometimes referred to as *payment processing* or the *processing
+core*) is the service that owns the business-logic view of every invoice and
+payment in the platform. Everything that happens to money — creating an
+invoice, authorising a card, capturing a hold, paying out after settlement,
+refunding, charging back, re-trying against a different provider — is a
+transition on a Hellgate state machine.
+
+Hellgate is intentionally *not* an API gateway: it is invoked by the
+customer-facing API (capi/capi-pcidss and similar) over Woody/Thrift, and it
+consumes a set of backend services in turn. It is the source of truth for the
+*state* of each invoice and payment; balances live in the accounter (shumway)
+and provider-side data lives on the providers.
+
+## OTP applications
+
+The release is composed of five OTP applications under
+[`apps/`](../apps):
+
+| Application | Responsibility |
+| ------------------ | -------------- |
+| `hellgate` | All business logic: state machines, sessions, routing hooks, limits, accounting, risk, repair, invoice templates. |
+| `hg_proto` | Thrift service definitions, Woody service wrapper, protocol helpers. This is the application that mounts the Woody servers and marshals/unmarshals Thrift terms. |
+| `hg_client` | Woody client for the public invoicing and invoice-templating APIs. Used from tests and ad-hoc tooling. |
+| `hg_progressor` | Progressor (the newer event-sourced automaton backend) integration: wraps Progressor RPC, encodes/decodes events, propagates OpenTelemetry context, and exposes a `Processor` callback that Progressor invokes to run Hellgate machines. |
+| `routing` | Routing logic as a standalone app: candidate gathering, scoring, rejection tracking, route explanations. The `hellgate` app calls into it but keeps no routing state itself. |
+
+## External services
+
+Hellgate is one piece of a wider microservice ecosystem. The dependencies it
+consumes are shown below with the Hellgate module that wraps each one:
+
+| Service | Purpose | Wrapper module |
+| ----------------- | --------------------------------------------------------- | -------------- |
+| DMT (`dmt_client`) | Versioned domain configuration (providers, terminals, proxies, payment institutions, routing rules, fees, limits, categories, currencies). Every domain lookup in Hellgate goes through `hg_domain`. | [hg_domain.erl](../apps/hellgate/src/hg_domain.erl) |
+| party-management | Party/shop configuration, operability, contracts. | [hg_party.erl](../apps/hellgate/src/hg_party.erl) |
+| limiter / liminator | Turnover limit enforcement (`Get`, `Hold`, `Commit`, `Rollback`). | [hg_limiter.erl](../apps/hellgate/src/hg_limiter.erl), [hg_limiter_client.erl](../apps/hellgate/src/hg_limiter_client.erl) |
+| shumway (accounter)| Double-entry accounting. Hellgate submits and commits posting plans. | [hg_accounting.erl](../apps/hellgate/src/hg_accounting.erl) |
+| bender | Deterministic ID generation. Hellgate uses Bender-style IDs for invoices, payments, refunds, chargebacks. | Called through `hg_client`/party-management; no dedicated wrapper module. |
+| cubasty (customer)| Storage for saved/recurrent payment resources. | [hg_customer_client.erl](../apps/hellgate/src/hg_customer_client.erl) |
+| fault-detector | Rolling provider availability and conversion statistics. Used to mark dead adapters as unrouteable. | [hg_fault_detector_client.erl](../apps/hellgate/src/hg_fault_detector_client.erl) |
+| proxy-provider | One Woody endpoint per provider adapter; implements `ProcessPayment`, `HandlePaymentCallback`, `GenerateToken`. | [hg_proxy_provider.erl](../apps/hellgate/src/hg_proxy_provider.erl), [hg_session.erl](../apps/hellgate/src/hg_session.erl) |
+| proxy-inspector | Risk scoring and card-token blacklists. | [hg_inspector.erl](../apps/hellgate/src/hg_inspector.erl) |
+| machinegun | Legacy event-sourced automaton backend. | Abstracted behind `hg_machine`. |
+| progressor | Current event-sourced automaton backend (production default, set in `config/sys.config`; the code-level default is `machinegun`). | [hg_progressor.erl](../apps/hg_progressor/src/hg_progressor.erl) |
+
+## Backends: Machinegun, Progressor, Hybrid
+
+All persistent state in Hellgate lives in an event-sourced automaton. The
+backend selector lives in [hg_machine.erl:230](../apps/hellgate/src/hg_machine.erl):
+
+```erlang
+call_automaton(Function, Args) ->
+ call_automaton(Function, Args,
+ application:get_env(hellgate, backend, machinegun)).
+```
+
+- `machinegun` — legacy backend using Thrift automaton RPC.
+- `progressor` — newer native backend ([`config/sys.config`](../config/sys.config)
+ sets this in production).
+- `hybrid` — route some namespaces to Machinegun and others to Progressor via
+ [`hg_hybrid.erl`](../apps/hg_progressor/src/hg_hybrid.erl). This is the
+ migration mode.
+
+Regardless of backend, Hellgate is the *processor*: the backend tells it
+"here is a machine's current history and the incoming signal/call", Hellgate
+returns `{events, action, auxst}`, and the backend persists the new events.
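+
+As a rough illustration of that contract, the sketch below folds a history
+into a state, reacts to one input and returns the new events, an action and
+the auxiliary state. It is not Hellgate code: the module name, the event
+shape (`{status_changed, _}`), the inputs and the `idle` / `continue` actions
+are all hypothetical and only show the shape of the exchange.
+
+```erlang
+-module(processor_sketch).
+-export([process/3]).
+
+%% Hypothetical processor: rebuild the state from history, then handle input.
+process(History, Input, AuxSt) ->
+    State = lists:foldl(fun apply_event/2, #{status => new}, History),
+    Events = handle(Input, State),
+    {Events, next_action(Events), AuxSt}.
+
+%% Replaying an event only updates the in-memory state.
+apply_event({status_changed, Status}, State) -> State#{status => Status};
+apply_event(_Unknown, State) -> State.
+
+%% Decide which new events (if any) the input produces.
+handle({call, pay}, #{status := unpaid}) -> [{status_changed, paid}];
+handle({signal, timeout}, #{status := unpaid}) -> [{status_changed, cancelled}];
+handle(_Input, _State) -> [].
+
+%% No new events means there is nothing further to schedule.
+next_action([]) -> idle;
+next_action(_Events) -> continue.
+```
+
+The processor itself persists nothing: the backend appends the returned
+events, so the next invocation sees them as part of `History`.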
+
+## End-to-end flow of a payment
+
+A simplified trace of `CreateInvoice → StartPayment → captured`:
+
+```mermaid
+sequenceDiagram
+ autonumber
+ participant C as Client (capi)
+ participant W as hg_woody_service_wrapper
+ participant I as hg_invoice
+ participant P as hg_invoice_payment
+ participant R as hg_routing
+ participant L as hg_limiter
+ participant CF as hg_cashflow / hg_accounting
+ participant S as hg_session
+ participant Pr as proxy-provider
+ participant A as automaton backend
+
+ C->>W: Thrift: CreateInvoice
+ W->>I: start invoice machine
+ I->>A: append ?invoice_created
+ C->>W: Thrift: StartPayment
+ W->>I: call
+ I->>P: delegate
+ P->>L: check + hold shop / payment limits
+ P->>R: gather_routes (+ fault detector, blacklist, pins)
+ R-->>P: chosen route
+ P->>CF: finalize cashflow + plan in shumway
+ P->>S: create session, ProcessPayment
+ S->>Pr: ProcessPayment
+ Pr-->>S: intent = finish | sleep | suspend
+ Note over S,Pr: async callbacks go to<br/>ProviderProxyHost:ProcessPaymentCallback
+ S-->>P: session result
+ P->>L: commit payment limits
+ P->>CF: commit posting plan
+ P->>A: append events
+ W-->>C: response
+```
+
+On provider failure the same payment can cascade to the next candidate route
+(see [Routing](routing.md) and [State machines](state-machines.md#cascade-and-retries)),
+so a single business-level payment may correspond to several sessions.
+
+> [!IMPORTANT]
+> Hellgate pins a domain revision at the start of the call and passes it
+> through routing, term resolution and accounting. A config change landing
+> mid-payment will not affect the decision — the payment stays on its
+> original view of the world.
+
+## Thrift service surface
+
+Hellgate *exposes* these services (see [`hg_proto.erl`](../apps/hg_proto/src/hg_proto.erl)):
+
+| Path | Interface | Purpose |
+| ----------------------------------------- | ------------------------------------------------- | ------- |
+| `/v1/processing/invoicing` | `dmsl_payproc_thrift:Invoicing` | Invoice / payment / refund / chargeback operations. |
+| `/v1/processing/invoice_templating` | `dmsl_payproc_thrift:InvoiceTemplating` | Invoice template lifecycle + term computation. |
+| `/v1/stateproc/` | `mg_proto_state_processing_thrift:Processor` | Machine processor callback invoked by Machinegun. |
+| `/v1/proxyhost/provider` | `dmsl_proxy_provider_thrift:ProviderProxyHost` | Provider-facing host callback API (`ProcessPaymentCallback`, `GetPayment`, session updates). |
+
+The Progressor backend replaces the `/v1/stateproc/...` Machinegun callback
+with a Progressor-native `Processor` callback served by
+[`hg_progressor_handler.erl`](../apps/hg_progressor/src/hg_progressor_handler.erl).
+
+## Namespaces
+
+Each kind of machine has a dedicated namespace (the `namespace/0` callback of
+the `hg_machine` behaviour). The most important ones are:
+
+- `invoice` — an invoice and its nested payments, refunds, chargebacks
+- `invoice_template` — reusable invoice templates
+- `recurrent_paytools` — tokenised payment methods for recurrent billing
+
+Callbacks from providers are routed to `invoice` machines through a
+tag-to-machine binding stored in
+[`hg_machine_tag`](../apps/hellgate/src/hg_machine_tag.erl).
diff --git a/doc/domain-and-party.md b/doc/domain-and-party.md
new file mode 100644
index 00000000..6ec48a20
--- /dev/null
+++ b/doc/domain-and-party.md
@@ -0,0 +1,210 @@
+# Domain, party and varset
+
+Hellgate is stateless with respect to configuration: every decision that
+depends on merchant settings, provider terms, fees, limits, routing rules,
+available payment methods or acceptable currencies is resolved against the
+**domain**, a versioned configuration store owned by DMT. This page
+documents how that lookup works and which hooks feed it.
+
+## Domain (DMT)
+
+Module: [hg_domain.erl](../apps/hellgate/src/hg_domain.erl).
+
+All domain access flows through `hg_domain:get/2`:
+
+```erlang
+get(Revision, Ref) ->
+ try extract_data(dmt_client:checkout_object(Revision, Ref))
+ catch throw:#domain_conf_v2_ObjectNotFound{} ->
+ error({object_not_found, {Revision, Ref}})
+ end.
+```
+
+- `Revision` is either the symbolic `latest` or a concrete integer version.
+ Hellgate *pins* a revision at the beginning of a payment flow and passes
+ it through the whole call chain — routing, term evaluation and
+ accounting all use the same revision so a config change mid-payment
+ cannot corrupt the outcome.
+- `Ref` is one of the domain reference tuples: `{party_config, …}`,
+ `{shop_config, …}`, `{provider, ProviderRef}`, `{terminal, TerminalRef}`,
+ `{proxy, ProxyRef}`, `{limit_config, …}`, `{category, …}`,
+ `{currency, …}`, `{payment_institution, …}`, `{inspector, …}`, and so on.
+- `dmt_client` is the shared DMT RPC client; Hellgate does not cache
+ outside of its short-lived per-request context.
+
+Every event that records a decision tied to the domain (route selection,
+cash flow, limits) also records the revision used, so the entire decision
+can be reconstructed deterministically.
+
+## Party and shop
+
+Module: [hg_party.erl](../apps/hellgate/src/hg_party.erl).
+
+A **party** owns one or more **shops**. They are addressed by
+`party_config_ref()` and `shop_config_ref()` respectively, and both are
+stored as domain objects:
+
+```erlang
+get_party(PartyConfigRef) ->
+ checkout(PartyConfigRef, get_party_revision()).
+
+get_shop(ShopConfigRef, PartyConfigRef, Revision) ->
+ try dmt_client:checkout_object(Revision, {shop_config, ShopConfigRef}) of
+ #domain_conf_v2_VersionedObject{
+ object = {shop_config, #domain_ShopConfigObject{
+ data = #domain_ShopConfig{party_ref = PartyConfigRef} = ShopConfig
+ }}
+ } ->
+ {ShopConfigRef, ShopConfig};
+ _ -> undefined
+ catch throw:#domain_conf_v2_ObjectNotFound{} -> undefined
+ end.
+```
+
+Notice that `get_shop/3` validates that the shop belongs to the given
+party — this is the main cross-check that keeps one party from touching
+another's shop by guessing its ID.
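+
+The ownership check relies on matching against an already-bound variable.
+The sketch below isolates that idiom, assuming a plain in-memory map in place
+of DMT; the module name, the `Store` argument and the map-shaped shop are
+hypothetical stand-ins for the real records.
+
+```erlang
+-module(shop_ownership_sketch).
+-export([get_shop/3]).
+
+%% Store is a hypothetical map of ShopRef => #{party_ref => PartyRef, ...}.
+%% PartyRef is already bound, so the map pattern only matches when the
+%% stored party_ref is equal to it.
+get_shop(ShopRef, PartyRef, Store) ->
+    case maps:find(ShopRef, Store) of
+        {ok, #{party_ref := PartyRef} = Shop} -> {ShopRef, Shop};
+        _NotFoundOrForeign -> undefined
+    end.
+```
+
+A lookup with the wrong party returns `undefined`, exactly as if the shop
+did not exist, so the caller cannot tell a foreign shop from a missing one.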
+
+Party objects carry:
+
+- Owner metadata and contact details
+- A list of shops
+- Contract terms and KYC status
+- Suspension and activation state
+- Blocking status (fraud, AML, etc.)
+
+Shops carry their own set of turnover limits, category, currency,
+accepted payment tools and account references. Most of the per-merchant
+behaviour a payment will see is ultimately sourced from the shop config.
+
+### Operability checks
+
+Before doing anything that mutates money, Hellgate asserts via
+[`hg_invoice_utils`](../apps/hellgate/src/hg_invoice_utils.erl) that the
+party and shop are *operable* — not blocked, not suspended, contract
+active. A failing check aborts the operation with a clear error instead
+of creating a dangling machine.
+
+## Varset
+
+Module: [hg_varset.erl](../apps/hellgate/src/hg_varset.erl).
+
+The varset is a small map of the variables the domain uses to reduce
+selectors. Think of it as the "question" we're asking the domain. It is
+assembled as the payment progresses, with later stages adding more keys:
+
+```erlang
+-type varset() :: #{
+ category => dmsl_domain_thrift:'CategoryRef'(),
+ currency => dmsl_domain_thrift:'CurrencyRef'(),
+ cost => dmsl_domain_thrift:'Cash'(),
+ payment_tool => dmsl_domain_thrift:'PaymentTool'(),
+ party_config_ref => dmsl_domain_thrift:'PartyConfigRef'(),
+ shop_id => dmsl_base_thrift:'ID'(),
+ risk_score => hg_inspector:risk_score(),
+ flow => instant | {hold, dmsl_domain_thrift:'HoldLifetime'()},
+ wallet_id => dmsl_base_thrift:'ID'()
+}.
+```
+
+When it is handed to DMT, `prepare_varset/1` converts it into the Thrift
+`#payproc_Varset{}` struct DMT selectors evaluate against.
+
+### Where the varset drives behaviour
+
+- **Routing** (`hg_routing:gather_routes/5`): filters routing rules and
+ prohibitions, producing the candidate list.
+- **Term resolution**: fees, 3DS requirements, allowed payment methods,
+ hold lifetimes and other per-operation rules are selected from the
+ party/shop/provider terms against the varset.
+- **Payment institution resolution**
+ (`hg_payment_institution:compute_payment_institution/3`): picks system
+ and external accounts by currency and varset.
+- **Inspector**: the inspector is selected from the domain using the same
+ varset, so a shop can use different risk engines for different
+ categories or payment tools.
+
+The varset is the single bottleneck through which every "what does
+config say here?" question in Hellgate has to pass. This is the reason a
+design change that adds, say, a new routing dimension starts with a new
+varset key.
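+
+A minimal sketch of what "reducing a selector against the varset" means,
+under simplifying assumptions: real selectors are Thrift structures evaluated
+by DMT, whereas here a selector is a hypothetical `{value, V}` or
+`{decisions, [{Predicate, Selector}]}` term and predicates are plain funs.
+
+```erlang
+-module(varset_sketch).
+-export([reduce/2, demo/0]).
+
+%% Walk the decisions in order and return the first value whose predicate
+%% accepts the varset.
+reduce({value, V}, _Varset) ->
+    {ok, V};
+reduce({decisions, []}, _Varset) ->
+    {error, no_match};
+reduce({decisions, [{Pred, Then} | Rest]}, Varset) ->
+    case Pred(Varset) of
+        true -> reduce(Then, Varset);
+        false -> reduce({decisions, Rest}, Varset)
+    end.
+
+%% The same selector gives different answers as the varset gains keys.
+demo() ->
+    VS0 = #{currency => <<"RUB">>, category => 1},
+    VS1 = VS0#{risk_score => high},
+    Selector = {decisions, [
+        {fun(VS) -> maps:get(risk_score, VS, undefined) =:= high end,
+            {value, require_3ds}},
+        {fun(_VS) -> true end, {value, no_3ds}}
+    ]},
+    {reduce(Selector, VS0), reduce(Selector, VS1)}.
+```
+
+`demo/0` returns `{{ok, no_3ds}, {ok, require_3ds}}`: adding `risk_score` to
+the varset is enough to change the outcome of an unchanged selector.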
+
+```mermaid
+flowchart LR
+ I[Invoice + Payer] --> V[varset]
+ R[(domain revision)] --> V
+ RS[risk_score] --> V
+ V --> PI[payment institution<br/>reduction]
+ V --> T[term selectors<br/>fees, 3DS, limits]
+ V --> RT[routing rules<br/>+ prohibitions]
+ V --> ACC[external account<br/>selection]
+```
+
+> [!IMPORTANT]
+> The varset is cumulative: later stages add keys. Earlier stages must
+> not depend on keys that are only filled in later (e.g. routing has a
+> `risk_score` because the inspector runs first; it does **not** have a
+> `provider_ref` because routing is what sets it).
+
+## Payment institution
+
+Module: [hg_payment_institution.erl](../apps/hellgate/src/hg_payment_institution.erl).
+
+A payment institution is the top-level config blob for "a way of
+accepting payments" — typically one per legal entity / licence / scheme.
+It owns:
+
+- Routing rules (policies + prohibitions)
+- Default cash flow postings
+- System account references per currency
+- External account sets (selected by varset)
+- Inspector and proxy references
+
+`compute_payment_institution/3` reduces the referenced payment
+institution against the varset and returns the concrete struct used by
+routing, term resolution and accounting. Any per-request domain
+variability lives inside that reduction; downstream code just sees the
+resolved values.
+
+## Payment tools
+
+Module: [hg_payment_tool.erl](../apps/hellgate/src/hg_payment_tool.erl).
+
+Thin helper to extract the `PaymentTool` from a `Payer` variant (direct
+card, recurrent token, payment terminal, digital wallet, crypto, etc.).
+The payment tool is what enters the varset under `payment_tool` and what
+the provider adapter ultimately consumes.
+
+## Request context
+
+Module: [hg_context.erl](../apps/hellgate/src/hg_context.erl).
+
+Per-request auxiliary data (Woody deadline, trace id, party client,
+domain revision, current log scope) is stashed in a small record kept in
+the process dictionary via `save/1`, `load/0`, `cleanup/0`. Long-running
+call chains (especially repair and the Progressor processor callback)
+save-and-cleanup around the handler to keep request scopes from leaking.
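+
+The save-and-cleanup pattern can be sketched with the process dictionary
+BIFs. This is an illustration, not the `hg_context` implementation: the
+module name, the dictionary key and the `with_context/2` wrapper are
+assumptions, and the context is a plain map instead of a record.
+
+```erlang
+-module(context_sketch).
+-export([save/1, load/0, cleanup/0, with_context/2]).
+
+%% Hypothetical process-dictionary key.
+-define(KEY, '$context_sketch').
+
+save(Context) ->
+    _ = put(?KEY, Context),
+    ok.
+
+load() ->
+    case get(?KEY) of
+        undefined -> error(no_context);
+        Context -> Context
+    end.
+
+cleanup() ->
+    _ = erase(?KEY),
+    ok.
+
+%% Run Fun with the context installed; always clean up, even on a crash.
+with_context(Context, Fun) ->
+    ok = save(Context),
+    try
+        Fun()
+    after
+        cleanup()
+    end.
+```
+
+Wrapping the handler in `try ... after` is what keeps a context from leaking
+into the next request served by the same process.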
+
+## Putting it together
+
+A concrete example of how party + DMT + varset come together on
+`StartPayment`:
+
+1. The handler resolves the party and shop from the invoice
+ (`hg_party`) and asserts they are operable.
+2. It builds an initial varset from the invoice and the payer.
+3. It pins the current domain revision.
+4. It calls the inspector (`hg_inspector`) to get a risk score; the
+ score goes into the varset.
+5. It resolves the payment institution
+ (`hg_payment_institution:compute_payment_institution/3`) and reduces
+ routing rules against the varset.
+6. Routing (`hg_routing`) produces a candidate list; cash flow
+ (`hg_cashflow`) is reduced against the same varset once a candidate
+ is chosen.
+7. Terms (fees, limits) are reduced against the same varset before
+ limits are held and the provider call is issued.
+
+The same revision + varset pair is threaded through every subsequent
+state transition, so replaying a payment's history is deterministic even
+if the domain has moved on.
diff --git a/doc/index.md b/doc/index.md
index c6bbec82..a68f30b5 100644
--- a/doc/index.md
+++ b/doc/index.md
@@ -1,3 +1,87 @@
-# Документация
+# Hellgate documentation
-1. [Работа пинов](route_pins.md)
+Hellgate is the core payment processing service of the platform. It implements
+the authoritative state machine for invoices, payments, refunds and
+chargebacks, selects a payment route through the configured providers and
+terminals, enforces merchant/provider turnover limits, drives provider adapters
+over Woody/Thrift, and writes the resulting postings into the double-entry
+accounter (shumway).
+
+Almost everything Hellgate does is expressed as deterministic event-sourced
+transitions over the [`hg_machine`](../apps/hellgate/src/hg_machine.erl)
+abstraction, backed by either [Progressor](../apps/hg_progressor) (the
+production backend, pinned in [`config/sys.config`](../config/sys.config)) or
+Machinegun (the code-level default and legacy backend).
+
+```mermaid
+flowchart LR
+ Client([capi / capi-pcidss])
+ subgraph Hellgate
+ direction TB
+ Invoicing[hg_invoice_handler]
+ Invoice[hg_invoice]
+ Payment[hg_invoice_payment]
+ Session[hg_session]
+ Routing[hg_routing]
+ Limits[hg_limiter]
+ Cashflow[hg_cashflow]
+ Invoicing --> Invoice --> Payment --> Session
+ Payment --> Routing
+ Payment --> Limits
+ Payment --> Cashflow
+ end
+ DMT[(DMT)]
+ PM[(party-management)]
+ Limiter[(limiter)]
+ Shumway[(shumway)]
+ FD[(fault-detector)]
+ Inspector[(proxy-inspector)]
+ Provider[(proxy-provider)]
+ Progressor[(Progressor / Machinegun)]
+
+ Client -- Thrift --> Invoicing
+ Invoice <-- events --> Progressor
+ Routing --> DMT
+ Routing --> FD
+ Routing --> Inspector
+ Payment --> PM
+ Limits --> Limiter
+ Cashflow --> Shumway
+ Session --> Provider
+ Provider -. async callback .-> Invoice
+```
+
+> [!NOTE]
+> Docs last verified against `fb3cabd4`. Claims that reference specific
+> Erlang type names or module lines may drift after refactors — when in
+> doubt, follow the linked source.
+
+## Business-domain documentation
+
+1. [Architecture overview](architecture.md) — what Hellgate is, which OTP
+ applications live here, which external services it depends on, and how a
+ single API call flows through the system.
+2. [State machines](state-machines.md) — the `hg_machine` behaviour, invoice /
+ payment / refund / chargeback / session lifecycles, retries, cascades,
+ recurrent paytools.
+3. [Routing](routing.md) — how route candidates (provider + terminal pairs) are
+ gathered from the domain, filtered by prohibitions, fault detector and
+ blacklist, scored and chosen.
+4. [Route pins](route_pins.md) — how a payer is pinned to a specific candidate
+ within an equal-priority group.
+5. [Limits and accounting](limits-and-accounting.md) — turnover limits
+ (hold/commit/rollback), cash flow computation, allocation, and shumway
+ posting plans.
+6. [Providers, sessions and callbacks](provider-proxy.md) — sessions, the
+ provider proxy protocol, async callbacks via tags, timeout behaviour, token
+ generation for recurrent paytools.
+7. [Risk, repair and operations](risk-and-repair.md) — inspector integration,
+ risk scores, blacklists, the repair API for stuck machines.
+8. [Domain, party and varset](domain-and-party.md) — how configuration from
+ party-management and DMT is resolved at each step through the varset.
+
+## Erlang/OTP and build docs
+
+The top-level [README](../README.md) and [`CLAUDE.md`](../CLAUDE.md) document
+the build and test commands, Erlang coding conventions, and the Docker /
+Docker-Compose workflow for running the full dependency stack.
diff --git a/doc/limits-and-accounting.md b/doc/limits-and-accounting.md
new file mode 100644
index 00000000..61bf6787
--- /dev/null
+++ b/doc/limits-and-accounting.md
@@ -0,0 +1,233 @@
+# Limits and accounting
+
+Hellgate handles two orthogonal financial concerns on every payment:
+
+- **Turnover limits** — policy controls that restrict how much money can flow
+ through a given dimension (shop, provider, terminal, card, etc.) over a
+ rolling window.
+- **Accounting** — double-entry postings against the accounter service
+ (shumway) that reflect what actually moved.
+
+Both subsystems are designed around a three-phase pattern (`hold` / `commit`
+/ `rollback`) so that Hellgate can reserve capacity and posting intent *before*
+the provider call and finalise it *after* we know the outcome.
+
+## Turnover limits
+
+Module: [hg_limiter.erl](../apps/hellgate/src/hg_limiter.erl) (plus the
+Woody client wrapper in
+[hg_limiter_client.erl](../apps/hellgate/src/hg_limiter_client.erl)).
+
+### Where limits come from
+
+Turnover limits are declared in the domain and attached to two kinds of
+objects:
+
+- **Shops** — `#domain_ShopConfig{turnover_limits = [...]}`
+- **Providers / Provision terms** —
+ `#domain_PaymentsProvisionTerms{turnover_limits = {value, [...]}}`
+
+Each reference points at a `#domain_LimitConfig{}` that the limiter service
+owns. Hellgate is only responsible for knowing *which* limits apply to a
+given operation at a given revision — the limiter enforces the numeric
+ceiling.
+
+### The operation identity
+
+Holds are idempotent on an operation ID. Hellgate derives the operation ID
+from stable properties of the payment:
+
+- Payment-level limits: `[provider_id, terminal_id, invoice_id, payment_id, iteration]`
+- Shop-level limits: `[party_id, shop_id, invoice_id, payment_id]`
+- Refund/chargeback flows: analogous lists that include the refund or
+ chargeback ID.
+
+The `iteration` component is the cascade attempt counter, which lets the
+limiter distinguish "same payment, retried on a different route" from "new
+payment" without double-counting.
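+
+The sketch below shows one way such an ID could be assembled from the
+payment-level component list. The module name, the `.` separator and the
+argument types (integer provider/terminal IDs, binary invoice/payment IDs)
+are assumptions made for illustration; the actual encoding lives in
+`hg_limiter`.
+
+```erlang
+-module(operation_id_sketch).
+-export([payment_operation_id/5]).
+
+%% Join the stable components into one binary (hypothetical encoding).
+payment_operation_id(ProviderID, TerminalID, InvoiceID, PaymentID, Iteration) ->
+    Parts = [
+        integer_to_binary(ProviderID),
+        integer_to_binary(TerminalID),
+        InvoiceID,
+        PaymentID,
+        integer_to_binary(Iteration)
+    ],
+    iolist_to_binary(lists:join(<<".">>, Parts)).
+```
+
+With these assumptions, `payment_operation_id(1, 2, <<"inv">>, <<"pay">>, 0)`
+yields `<<"1.2.inv.pay.0">>`. Repeating the call with the same arguments
+yields the same ID (an idempotent retry), while bumping the iteration to `1`
+yields a distinct one (a cascade).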
+
+### Payment limits — hold / commit / rollback
+
+```erlang
+%% Signature summary, not compilable Erlang.
+check_limits([turnover_limit()], Invoice, Payment, Session | undefined, Route, Iter)
+ -> {ok, [turnover_limit_value()]}
+ | {error, {limit_overflow, [binary()], [turnover_limit_value()]}}.
+
+hold_payment_limits(Limits, Invoice, Payment, Session, Route, Iter).
+commit_payment_limits(Limits, Invoice, Payment, Session, Route, Iter, BinaryOperationId | undefined).
+rollback_payment_limits(Limits, Invoice, Payment, Session, Route, Iter, BinaryOperationId | undefined).
+```
+
+- `check_limits/6` is a *dry-run* — it returns the current limit values so the
+ payment can fail fast (the routing step calls this to reject overflowing
+ candidates).
+- `hold_payment_limits/6` is the real reservation. It is called once the
+ route is chosen and the cash flow has been built, right before the provider
+ call.
+- `commit_payment_limits/7` finalises the hold on capture success.
+- `rollback_payment_limits/7` releases the hold on cascade / retry / final
+ failure so that the reserved capacity becomes available again.
+
+### Shop and refund limits
+
+Shop limits (`check_shop_limits/5`, `hold_shop_limits/5`, etc.) and refund
+limits (`hold_refund_limits/5`, `commit_refund_limits/5`,
+`rollback_refund_limits/5`) follow exactly the same three-phase contract.
+
+Refunds reverse capture holds: a refund hold effectively releases the
+corresponding capture hold on the same limit bucket, so a fully refunded
+payment becomes invisible to turnover limits (as intended).
+
+## Cash flow
+
+Module: [hg_cashflow.erl](../apps/hellgate/src/hg_cashflow.erl), plus
+helpers in [hg_cashflow_utils.erl](../apps/hellgate/src/hg_cashflow_utils.erl).
+
+### The model
+
+A *cash flow* is a list of postings:
+
+```erlang
+-type posting() :: #domain_CashFlowPosting{
+ source = account(), % merchant | provider | system | external
+ destination = account(),
+ volume = cash_volume(), % fixed | share | product
+ details = binary() % human-readable description
+}.
+```
+
+Volumes are computed, not static:
+
+```erlang
+?fixed(Cash) % literal amount
+?share(P, Q, Of, Rounding) % P/Q of another amount
+?product([Op, V1, V2, ...]) % composition
+```
+
+`Of` is a reference to another amount in the same flow (usually the payment
+amount) so that commission-style postings stay correct when the base changes.
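+
+As a worked example of a share volume, the sketch below computes `P/Q` of an
+amount in minor units using integer arithmetic only. It assumes a single
+rounding rule (halves rounded away from zero) and a hypothetical module
+name; the real `compute_volume` handles more rounding methods and volume
+kinds.
+
+```erlang
+-module(share_sketch).
+-export([share/3]).
+
+%% P/Q of Amount, rounding halves away from zero, without floats.
+share(P, Q, Amount) when is_integer(P), is_integer(Q), Q > 0, is_integer(Amount) ->
+    N = P * Amount,
+    Rounded = (2 * abs(N) + Q) div (2 * Q),
+    if
+        N < 0 -> -Rounded;
+        true -> Rounded
+    end.
+```
+
+A 1.5% commission on 10000 minor units is `share(15, 1000, 10000)`, which
+gives `150`; `share(1, 2, 5)` gives `3` because 2.5 rounds away from zero.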
+
+### Finalisation
+
+`hg_cashflow:finalize/3` takes the abstract template, a context containing
+the monetary parameters, and an `AccountMap` that resolves each abstract
+account to a concrete account ID:
+
+```erlang
+compute_postings(CF, Context, AccountMap) ->
+ [
+ ?final_posting(
+ construct_final_account(Source, AccountMap),
+ construct_final_account(Destination, AccountMap),
+ compute_volume(Volume, Context),
+ Details
+ )
+ || ?posting(Source, Destination, Volume, Details) <- CF
+ ].
+```
+
+The account map is built by
+[`hg_accounting:collect_account_map/1`](../apps/hellgate/src/hg_accounting.erl)
+and has four groups of accounts:
+
+- **Merchant accounts** — settlement and guarantee from the shop config.
+- **Provider accounts** — the chosen provider's settlement account for the
+ payment currency.
+- **System accounts** — settlement and subagent accounts from the payment
+ institution for the payment currency.
+- **External accounts** — income/outcome accounts selected from the payment
+ institution's external account sets via the varset.
+
+### Reversal
+
+Refunds and chargeback steps reverse a flow by swapping source and
+destination on every posting:
+
+```erlang
+revert(CF) ->
+ [?final_posting(Destination, Source, Volume, revert_details(Details))
+ || ?final_posting(Source, Destination, Volume, Details) <- CF].
+```
+
+## Accounting (shumway)
+
+Module: [hg_accounting.erl](../apps/hellgate/src/hg_accounting.erl).
+
+The accounter is a straightforward double-entry ledger. Hellgate drives it
+with *posting plans*:
+
+- `plan(CashFlow, Context)` — submit staged postings. The accounter computes
+ the effect on each account's balance but does not make it visible yet.
+- `commit(PlanLog, PlanIDs)` — materialise the staged batches.
+- `rollback(PlanLog, PlanIDs)` — discard them.
+
+Each payment can produce several plan IDs over its life (authorisation,
+capture, refund, chargeback stages), and the payment state machine carries
+the plan log forward so that commits and rollbacks target the right batches.
+
+Accounting follows the state machine, not the other way around: postings are
+staged when the corresponding activity starts (e.g.
+`processing_accounter`) and committed when the activity resolves
+successfully (`finalizing_accounter`). This keeps the ledger consistent
+with what the state machine believes and lets us recover from a crash
+between stage and commit by replaying events.
+
+## Allocation
+
+Module: [hg_allocation.erl](../apps/hellgate/src/hg_allocation.erl).
+
+Allocation splits a payment across multiple recipients (think marketplace
+sub-merchants). The domain types (`AllocationPrototype`, `Allocation`,
+`AllocationTransaction`) and the arithmetic (`sub/2`, etc.) are in place,
+but the feature is currently turned off:
+
+```erlang
+calculate(_Prototype, _Party, _Shop, _Cost, _Terms) ->
+ {error, allocation_not_allowed}.
+```
+
+When enabled, allocation interleaves with cash flow: each allocation
+transaction produces an additional sub-flow that is accounted for in
+shumway and can be refunded independently.
+
+## Putting it together
+
+On a happy-path payment the finance subsystems execute in this order:
+
+```mermaid
+sequenceDiagram
+ autonumber
+ participant P as hg_invoice_payment
+ participant L as limiter
+ participant CF as hg_cashflow
+ participant S as shumway
+ participant Pr as provider
+
+ Note over P,L: routing stage
+ P->>L: check_limits (dry-run)
+ P->>L: hold_shop_limits
+ P->>L: hold_payment_limits
+ P->>CF: finalize cash flow
+ P->>S: plan (stage postings)
+ P->>Pr: ProcessPayment
+ alt success
+ Pr-->>P: success + trx
+ P->>L: commit_shop_limits + commit_payment_limits
+ P->>S: commit(plan)
+ else failure
+ Pr-->>P: failure
+ P->>S: rollback(plan)
+ P->>L: rollback_payment_limits
+ Note over P: maybe cascade
+ end
+```
+
+Refunds and chargebacks run the same pattern on their own plans and
+inverted cash flows, so every terminal state leaves the ledger and the
+limiter in a consistent place.
+
+> [!WARNING]
+> The operation ID passed to the limiter must remain stable across retries
+> of the *same* attempt and must change across cascades. The `iteration`
+> component exists specifically so that a cascade to a new route is a new
+> operation, not a double-count of the old one.
diff --git a/doc/provider-proxy.md b/doc/provider-proxy.md
new file mode 100644
index 00000000..48d279b8
--- /dev/null
+++ b/doc/provider-proxy.md
@@ -0,0 +1,207 @@
+# Providers, sessions and callbacks
+
+Hellgate never talks directly to an acquirer, a 3-D Secure directory server
+or an alternative-payment-method back end. Every external provider sits
+behind a *provider proxy* — a separate Woody service that implements the
+`proxy-provider` Thrift protocol and translates between the generic
+Hellgate model and the provider's own API.
+
+This page describes how Hellgate invokes those adapters, how it receives
+their callbacks, and how sessions keep everything consistent.
+
+## The adapter protocol
+
+Three RPCs on the `proxy-provider` interface are invoked from Hellgate:
+
+```erlang
+process_payment(ProxyContext, Route).
+handle_payment_callback(Payload, ProxyContext, Route).
+generate_token(ProxyContext, Route).
+```
+
+Implementation lives in
+[`hg_proxy_provider.erl`](../apps/hellgate/src/hg_proxy_provider.erl).
+
+Each call carries a `ProxyContext` — a self-contained snapshot of payment
+info, previous session state (opaque to Hellgate), and merged
+`ProxyOptions`. The options are collected by
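+`hg_proxy_provider:collect_proxy_options/1`, which merges three layers in
+decreasing precedence: terminal-specific options first, the provider's
+additional options next, and proxy-definition defaults last. Because
+`maps:merge/2` lets its second argument win, a key set by an earlier layer
+is never overwritten by a later one: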
+
+```erlang
+lists:foldl(fun(undefined, M) -> M; (M1, M) -> maps:merge(M1, M) end, #{}, [
+ Terminal#domain_Terminal.options,
+ Proxy#domain_Proxy.additional,
+ ProxyDef#domain_ProxyDefinition.options
+]).
+```
+
+This layering lets the same provider be reused across terminals while
+still allowing per-terminal overrides.
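+
+The precedence of that fold is easy to get backwards, so here is a
+self-contained sketch of the same merge over plain maps. The module name and
+the sample option keys are made up; only the fold mirrors the snippet above.
+
+```erlang
+-module(proxy_options_sketch).
+-export([collect/1]).
+
+%% Layers are listed from most to least specific. maps:merge/2 prefers its
+%% second argument, so values already in Acc survive later layers.
+collect(Layers) ->
+    lists:foldl(
+        fun
+            (undefined, Acc) -> Acc;
+            (Layer, Acc) -> maps:merge(Layer, Acc)
+        end,
+        #{},
+        Layers
+    ).
+```
+
+For example, `collect([#{<<"timeout">> => <<"5">>}, undefined,
+#{<<"timeout">> => <<"30">>, <<"mode">> => <<"test">>}])` returns a map in
+which `<<"timeout">>` is `<<"5">>` (the first layer wins) and `<<"mode">>` is
+`<<"test">>` (filled in from the last layer).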
+
+The adapter reply is a `provider intent`:
+
+- `{finish, FinishIntent}` — the adapter is done. Outcomes include
+ success (with a transaction reference to record), failure (propagated
+ back into the payment's failure channel), or a user-visible reason.
+- `{sleep, SleepIntent}` — poll again after a timer. Hellgate sets a
+ machine timer; when it fires, the session resumes.
+- `{suspend, SuspendIntent}` — register a tag and wait for an async
+ callback.
+
+See the `hg_session` module for how intents are decoded into activity
+transitions.
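+
+As a rough illustration of that decoding, the sketch below maps each
+intent shape to a session status and a follow-up action. It is a model
+under stated assumptions, not code from `hg_session`: the module
+`intent_demo`, the function `next_step/1` and the action atoms are
+hypothetical, and keeping a sleeping session `active` is an assumption.
+
+```erlang
+-module(intent_demo).
+-export([next_step/1]).
+
+%% Map a provider intent to {SessionStatus, FollowUpAction}.
+%% The statuses are the documented active | suspended | finished values;
+%% the action atoms are invented for this example.
+next_step({finish, _FinishIntent}) -> {finished, no_action};
+next_step({sleep, _SleepIntent}) -> {active, set_timer};
+next_step({suspend, _SuspendIntent}) -> {suspended, await_callback}.
+```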
+
+## Sessions
+
+Module: [hg_session.erl](../apps/hellgate/src/hg_session.erl).
+
+A session is one conversation with one adapter about one target state. The
+struct is kept small on purpose because it is persisted as part of the
+payment's event history:
+
+- `target` — the final status we want to drive this conversation towards
+ (`processed`, `captured`, `cancelled`, `refunded`).
+- `status` — `active | suspended | finished`.
+- `tags` — the currently registered callback tags for this session.
+- `route` — the `(provider, terminal)` we're talking to.
+- `proxy_state` — an opaque binary the adapter asks us to pass back on
+ every RPC. Hellgate never inspects it.
+- `interaction` — a structured description of any UI the user sees (3DS
+ redirect, OTP page, QR code, …); also feeds into cascade logic.
+- `ui_occurred` — a latching boolean set the first time the user
+ interacts. Cascade will not retry after this flips.
+- `timings` — timestamps captured for diagnostics and SLAs.
+- `repair_scenario` — optional manual override injected via the repair API.
+
+Lifecycle:
+
+1. `create/0` + `set_payment_info/2` — blank session, payment info attached.
+2. `process/1` — first RPC to the adapter. Hellgate interprets the intent
+ and decides whether to finish, sleep or suspend.
+3. `apply_event/3` — on each subsequent event (timer fire, callback arrival,
+ manual repair), advance the session state.
+4. `deduce_activity/1` — derive the next payment activity (`flow_waiting`,
+ `processing_capture`, `finalizing_session`, …) from the session's
+ current status.
+
+## Tags and async callbacks
+
+Async delivery is necessary because many provider flows are inherently
+out-of-band (3DS redirects, bank push notifications, offline bank transfer
+confirmation). Hellgate uses a *tag* as the rendezvous point between a
+suspended session and an incoming callback.
+
+```mermaid
+sequenceDiagram
+ autonumber
+ participant P as hg_invoice_payment
+ participant S as hg_session
+ participant Tag as hg_machine_tag
+ participant Pr as proxy-provider (adapter)
+ participant U as Payer / upstream
+ participant Host as hg_proxy_host_provider
+
+ P->>S: process
+ S->>Pr: ProcessPayment
+ Pr-->>S: intent = suspend, tag = T
+ S->>Tag: put_binding(invoice, T, PaymentID, InvoiceID)
+ Note over S: session.status = suspended
+ U-->>Pr: completes 3DS / OTP / offline step
+ Pr->>Host: ProcessPaymentCallback(T, payload)
+ Host->>Tag: get_binding(invoice, T)
+ Tag-->>Host: (InvoiceID, PaymentID)
+ Host->>P: process_callback
+ P->>S: apply event, resume
+ S-->>P: session.status = finished (or suspended again)
+```
+
+Module: [hg_machine_tag.erl](../apps/hellgate/src/hg_machine_tag.erl).
+
+At session creation (or when the adapter returns a `suspend` intent) a tag
+is registered:
+
+```erlang
+put_binding(<<"invoice">>, Tag, PaymentID, InvoiceID).
+```
+
+The tag is handed to the adapter, which embeds it in whatever
+user-facing URL the payer hits or passes it to the upstream system that
+will later confirm the operation. When the adapter calls back into
+Hellgate it invokes `ProcessPaymentCallback` on the
+[`ProviderProxyHost`](../apps/hg_proto/src/hg_proto.erl) Thrift service
+(mounted at `/v1/proxyhost/provider` by `hg_proto:get_service_spec/2`),
+passing the tag **inside the Thrift payload** — the URL path itself is
+fixed. The handler
+[`hg_proxy_host_provider`](../apps/hellgate/src/hg_proxy_host_provider.erl)
+then:
+
+1. Resolves `tag → (InvoiceID, PaymentID)` via
+ [`hg_machine_tag:get_binding/2`](../apps/hellgate/src/hg_machine_tag.erl).
+2. Calls `hg_invoice:process_callback/2` on the invoice machine.
+3. The invoice routes the callback into the correct payment/session.
+4. The session either finishes (success/failure), sleeps again, or stays
+ suspended under a new tag.
+
+This layer is the reason the callback endpoint is *host-side* — the
+adapter is a client of Hellgate for the callback, not the other way
+round.
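+
+The rendezvous itself can be modelled with a plain map. This is a sketch
+under stated assumptions: `tag_demo` and its explicit `Store` argument are
+hypothetical, and the storage used by the real `hg_machine_tag` is not
+described here.
+
+```erlang
+-module(tag_demo).
+-export([new/0, put_binding/5, get_binding/3]).
+
+%% An empty tag store.
+new() -> #{}.
+
+%% Register Tag in namespace NS as pointing at one payment of one invoice.
+put_binding(NS, Tag, PaymentID, InvoiceID, Store) ->
+    Store#{{NS, Tag} => {InvoiceID, PaymentID}}.
+
+%% Resolve a tag back to {InvoiceID, PaymentID}, as a callback handler
+%% would before forwarding the payload to the invoice machine.
+get_binding(NS, Tag, Store) ->
+    case maps:find({NS, Tag}, Store) of
+        {ok, Binding} -> {ok, Binding};
+        error -> {error, notfound}
+    end.
+```
+
+A callback carrying an unknown tag resolves to `{error, notfound}` in this
+model, so it never reaches an invoice machine.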
+
+## Timeout behaviour
+
+Each session declares a `timeout_behaviour()` from the domain. In broad
+strokes:
+
+- Immediate — the adapter promised to respond synchronously; no polling
+ needed.
+- Polling — the adapter is slow but pollable; Hellgate sets a timer and
+ calls `process/1` again when it fires.
+- Callback — the adapter will drive completion via an async callback; the
+ timer is used as a fail-safe if the callback never arrives.
+
+Timers are implemented by the `set_timer` action returned from the
+machine's `process_signal/2` and are honoured by the automaton backend.
+
+## Fault detector integration
+
+Every adapter RPC is reported to the fault detector. The client module
+[`hg_fault_detector_client`](../apps/hellgate/src/hg_fault_detector_client.erl)
+registers operations on start and finish and queries rolling statistics
+at routing time. The statistics feed two decisions:
+
+- The route's availability (`alive`/`dead`) — a `dead` route is rejected
+ from the candidate list.
+- The route's expected conversion rate — used as part of the scoring tuple
+ when two candidates tie on priority.
+
+Because statistics are reported per operation, a provider that is healthy
+for authorisations but broken for refunds will be marked dead only for the
+broken flow.
+
+> [!CAUTION]
+> The `proxy_state` binary returned by an adapter is opaque to Hellgate and
+> is persisted verbatim into the session event. Adapters must treat it as
+> their own forward-compatible serialisation format — a non-backwards-
+> compatible change will break in-flight sessions on replay.
+
+## Generating recurrent tokens
+
+For recurrent paytools the flow is similar, but the RPC is
+`generate_token/2`. The response becomes the paytool's permanent payment
+resource — subsequent payments against the same paytool skip the
+cardholder-interactive stages entirely and drive the adapter through its
+"use a previously-tokenised card" path.
+
+## Provider-side diagnostics
+
+Two modules exist purely to make provider conversations observable:
+
+- [`hg_proxy.erl`](../apps/hellgate/src/hg_proxy.erl) — low-level call
+ options helper shared across proxy types (provider and inspector).
+- [`hg_profiler.erl`](../apps/hellgate/src/hg_profiler.erl) and the
+ `hg_timings` helper record per-session timings that are later attached
+ to events and surfaced in payment state.
+
+Together with the fault detector and the `interaction` field on sessions
+this gives a fairly granular operational picture of every provider call.
diff --git a/doc/risk-and-repair.md b/doc/risk-and-repair.md
new file mode 100644
index 00000000..41a5302e
--- /dev/null
+++ b/doc/risk-and-repair.md
@@ -0,0 +1,163 @@
+# Risk, repair and operations
+
+Two adjacent subsystems sit around the payment state machine to keep the
+system safe and recoverable:
+
+- [Risk inspection](#risk-inspection) — assigns every payment a coarse
+ risk level and screens against card-level blacklists before routing.
+- [Repair](#repair) — a controlled backdoor that lets operators nudge a
+ stuck payment into a known-good state.
+
+## Risk inspection
+
+Module: [hg_inspector.erl](../apps/hellgate/src/hg_inspector.erl).
+
+Risk inspection is a Woody call to the `proxy-inspector` service. Hellgate
+picks the inspector definition from the domain (`#domain_Inspector{}`),
+merges proxy options the same way it does for providers, and calls
+`InspectPayment`:
+
+```erlang
+inspect(Shop, Invoice, Payment, Inspector) ->
+ Context = #proxy_inspector_Context{
+ payment = get_payment_info(Shop, Invoice, Payment),
+ options = maps:merge(ProxyDef#domain_ProxyDefinition.options,
+ Proxy#domain_Proxy.additional)
+ },
+ {ok, RiskScore} = issue_call('InspectPayment', {Context},
+ hg_proxy:get_call_options(Proxy, Revision),
+ FallBackRiskScore, Deadline),
+ RiskScore.
+```
+
+Notable properties:
+
+- **Risk scores are coarse.** The inspector returns `low | medium | high`
+ — a bucket, not a number. Downstream code uses the bucket as part of the
+ varset for routing and for term resolution.
+- **Fallback is explicit.** If the inspector times out, returns `undefined`
+ or errors out, Hellgate returns the configured `fallback_risk_score`.
+ Payments never get stuck waiting for the inspector.
+- **Deadlines are honoured.** The call runs under the Woody deadline from
+ the current request, so slow inspectors degrade to their fallback rather
+ than holding the machine hostage.
+
+```mermaid
+flowchart TD
+ A[risk_scoring activity] --> B[resolve inspector from domain]
+ B --> C[call proxy-inspector:InspectPayment<br/>under request deadline]
+ C --> D{response within<br/>deadline?}
+ D -- yes, score --> R[add score to varset]
+ D -- yes, undefined --> F[use fallback_risk_score]
+ D -- timeout / error --> F
+ F --> R
+ R --> N[routing]
+```
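+
+The fallback branch of the flowchart reduces to a small decision function.
+The sketch below is illustrative only: `risk_fallback_demo:resolve/2` and
+the `{ok, Score}` / error tuple shapes are assumptions, not the actual
+return values of the inspector client.
+
+```erlang
+-module(risk_fallback_demo).
+-export([resolve/2]).
+
+%% Accept a well-formed score; anything else (timeout, undefined, error)
+%% degrades to the configured fallback so the payment never blocks.
+resolve({ok, Score}, _Fallback) when Score =:= low; Score =:= medium; Score =:= high ->
+    Score;
+resolve(_TimeoutUndefinedOrError, Fallback) ->
+    Fallback.
+```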
+
+### Blacklist checks
+
+The inspector also serves the per-route blacklist that the routing layer
+consults. `hg_inspector:check_blacklist/1` builds a context of
+
+```erlang
+#proxy_inspector_BlackListContext{
+ first_id = ProviderID,
+ second_id = TerminalID,
+ field_name = <<"CARD_TOKEN">>,
+ value = Token
+}
+```
+
+and calls `IsBlacklisted`. A `true` return knocks the route out of the
+candidate list with reason `in_blacklist` (see
+[Routing → Stage 3](routing.md#stage-3--blacklist-filtering)). Payments
+without a token (e.g. alternative payment methods) skip the check.
+
+### How the score is used
+
+The risk score flows into the payment pipeline at two points:
+
+- It is added to the **varset** (`hg_varset.erl`). Routing rules, term
+ selectors and fee selectors can therefore branch on risk — for
+ example, routing only low-risk payments to a cheap provider, or
+ charging a premium on high-risk ones.
+- It gates 3-D Secure / step-up selection through the domain-level
+ payment method conditions.
+
+The inspector is *not* an allow/deny gate in itself — Hellgate relies on
+domain configuration (routing + terms) to turn the score into a policy.
+
+## Repair
+
+Module: [hg_invoice_repair.erl](../apps/hellgate/src/hg_invoice_repair.erl).
+
+Every state machine can be repaired through `hg_machine:repair/3`, which
+is served by the module's `process_repair/2` callback (part of the
+`hg_machine` behaviour). For invoices the repair surface is designed
+specifically to unstick payments that have ended up in an inconsistent
+state, usually because a provider vanished, a session timed out in an
+unusual way, or a manual intervention is required to reconcile with an
+acquirer.
+
+### Repair scenarios
+
+Operators submit a `#payproc_InvoiceRepairScenario{}` which is one of:
+
+| Scenario | Effect |
+| ------------------------------ | ------ |
+| `fail_pre_processing` | Force the payment into `failed` with a supplied `#domain_Failure{}` *before* any side-effect (no route chosen, no limit held, no provider called). |
+| `skip_inspector` | Substitute a supplied `risk_score` for the inspector call. Useful when the inspector itself is misbehaving. |
+| `fail_session` | Inject a session *failure* into the in-flight session: the payment sees the session as if the provider had returned the given failure and trx info. |
+| `fulfill_session` | Inject a session *success* into the in-flight session: the payment sees the session as if the provider had returned success and the given trx info. |
+| `complex` | A list of scenarios to try in order; the first one whose activity matches fires. |
+
+### Safety checks
+
+`hg_invoice_repair:check_activity_compatibility/2` enforces that a scenario
+only runs when the payment is in a compatible activity. For example,
+`fail_pre_processing` is rejected once the payment is past routing, and
+`fail_session` or `fulfill_session` is rejected if there is no session to
+repair. This keeps repair from silently skipping state (and money) it
+should not touch.
+
+The repair path also requires an explicit revision input — operators must
+state the domain revision they are repairing against — which prevents a
+drifted config from bleeding into a repair that was prepared against an
+older view of the world.
+
+> [!CAUTION]
+> Repair is a privileged operation. `fulfill_session` in particular writes
+> a *success* event for a payment the provider never acknowledged — it
+> must only be used when a real reconciliation with the acquirer has
+> confirmed the outcome. Getting this wrong means booking money that
+> didn't move.
+
+### Typical uses
+
+- A provider adapter is permanently offline: use `fail_session` with an
+ appropriate failure so the payment fails properly and limits roll back.
+- An acquirer confirmed out-of-band that a transaction succeeded but the
+ provider's callback never arrived: use `fulfill_session` with the real
+ trx info.
+- A misconfigured inspector keeps returning `undefined` for a specific
+ payment shape: use `skip_inspector` with an appropriate fallback.
+- A known-bad invoice has to be terminated before it has any side effects:
+ use `fail_pre_processing`.
+
+Every repair operation goes through the same event-sourcing pipeline as a
+normal call, so the repair itself is fully auditable — the emitted events
+indicate that the new state came from a repair scenario rather than a
+provider or customer action.
+
+## Operations summary
+
+Taken together, risk inspection and repair make the state machine robust
+against two common failure modes in payment processing:
+
+- *Up-front uncertainty* — the inspector gives a cheap, bounded screen
+ before committing the payment to an expensive route, and degrades
+ gracefully if the inspector is unreachable.
+- *Tail-end stuckness* — if a payment has wandered off the happy path but
+ the correct resolution is known, repair lets an operator apply it
+ without bypassing event sourcing, accounting or limits.
+
+Everything else — routing failures, provider timeouts, transient retry
+loops — is meant to resolve itself via cascade and retry without human
+intervention.
diff --git a/doc/route_pins.md b/doc/route_pins.md
index b62040b5..6132d582 100644
--- a/doc/route_pins.md
+++ b/doc/route_pins.md
@@ -1,35 +1,131 @@
-# Пины роутов
+# Route pins
-## Какая задача
+> This document replaces the original Russian design note. The mechanism
+> described here is implemented in
+> [`hg_route`](../apps/routing/src/hg_route.erl) and
+> [`hg_routing`](../apps/routing/src/hg_routing.erl) (see
+> `gather_pin_info/2`, `select_better_route/2`,
+> `select_better_pinned_route/2`).
-У нас есть 2 и более роутов с одинаковым приоритетом
-и какой-то там разбивкой по весу. Например 3 роута с весами 33:33:33.
+## The problem
-К нам приходит плательщик. Он оплачивает какую-то услугу
-и этот платеж проходит через конкретный терминал конкретного провайдера.
-Проще говоря он выбрал один из кандидатов (роутов) из списка с одинаковым приоритетом.
+Consider a payment institution with three equal-priority routes sharing the
+same weight — `33 : 33 : 33`. A payer arrives, pays for some service, and the
+payment ends up going through, say, the second route. We would like every
+*subsequent* payment from the same payer to reach the same route, without
+pinning the *other* payers to it.
-Теперь мы хотим чтобы этот плательщик в будущем ходим через тот же самый роут.
+Naïvely re-running the weighted random choice on every payment would move
+returning payers between routes. Assigning a sticky route globally per merchant
+would destroy the load split the merchant chose. Pins solve the problem at a
+finer granularity: the *payer* (identified by a configurable set of features)
+is stuck to whichever equal-priority candidate it first ended up on.
-Плательщика определяем каким-то там способом.
+## The mechanism
-## Решение
+Each route candidate in the domain can declare a
+`#domain_RoutingPin{features = [...]}` — a set of payer characteristics the
+candidate considers "identifying". The features currently recognised by
+Hellgate are:
-Мы в каждом роут кандидате можем указать
-[список характеристик](https://github.com/valitydev/damsel/blob/master/proto/domain.thrift#L2850-L2856)
-по которым мы будем определять какой именно плательщик к нам пришел.
+| Feature | Source |
+| -------------- | ------------------------------------------------- |
+| `currency` | payment currency |
+| `payment_tool` | the full payment tool (card BIN, wallet, etc.) |
+| `client_ip` | payer's IP address (may be `undefined`) |
+| `email` | payer's email (may be `undefined`) |
+| `card_token` | tokenised card identifier (may be `undefined`) |
-Когда к нам приходит запрос на проведение платежа, то мы собираем все указанные
-в конкретном кандидате характеристики и вычисляем хэш этих характеристик.
-Этот хэш учитывается при сортировке роутов по самым желаемым.
+At routing time `hg_routing:gather_pin_info/2` walks the declared features
+for the candidate and extracts their current values from the routing
+context. The result is a plain map `#{feature => value}` which travels
+along with the candidate as its `pin` (see
+[`hg_route:pin/1`](../apps/routing/src/hg_route.erl)).
-Если как в примере выше у нас 3 роут кандидата с одинаковым весом
-и список характеристик (например смотрим только на имейл) совпадает,
-то мы лочим роут с этим значением характеристики.
-Все последующие платежи с этими значениями будут проходить по тому роуту, что был использован
-в первой операции. Соответственно вес у нас в одном приоритете становится 100:0:0.
+During scoring, Hellgate sorts equal-priority candidates by
+`#domain_PaymentRouteScores{}`. The critical hook is in
+[`hg_routing:select_better_route/2`](../apps/routing/src/hg_routing.erl):
-Если же один из этих роутов имеет другой набор характеристик, например имейл и IP адрес клиента, то он участвует
-в локе пинов с роутами у которых такой же набор характеристик. В данном примере, так как он один, то распределение
-становится 66:0:33. Если бы был еще один роут с тем же приоритетом и набором характеристик имейл и IP, то
-распределение было бы 50:0:50:0
+```erlang
+case {LeftPin, RightPin} of
+ _ when LeftPin /= ?ZERO, RightPin /= ?ZERO, RightPin == LeftPin ->
+ select_better_pinned_route(Left, Right);
+ _ ->
+ select_better_regular_route(Left, Right)
+end
+```
+
+When two candidates carry the same pin value and that value is not the
+zero sentinel (meaning they saw an identical payer fingerprint), the
+regular weight-based random tie-break is replaced with a deterministic
+one:
+
+```erlang
+route_pin = erlang:phash2({Pin, hg_route:provider_ref(Route),
+ hg_route:terminal_ref(Route)})
+```
+
+Because the pin value is shared, the only thing that differs between the
+two hashes is `(provider_ref, terminal_ref)`. The `phash2` output is
+stable across time and processes, so the *same* payer always picks the
+*same* winner — exactly what the requirement asks for.
+
+Candidates whose feature sets differ participate in their own pinning
+group (their pin values differ, so the `RightPin == LeftPin` clause never
+fires and they fall back to the normal random tie-break).
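+
+The determinism is easy to demonstrate outside Hellgate. In the sketch
+below, the module `pin_demo`, the `{ProviderRef, TerminalRef}` tuples and
+the rule that the larger hash wins are assumptions made for illustration;
+only the `erlang:phash2/1` call over `{Pin, Provider, Terminal}` follows
+the snippet above.
+
+```erlang
+-module(pin_demo).
+-export([pinned_winner/2]).
+
+%% Pick one candidate for a given pin. Every candidate is hashed together
+%% with the shared pin, and the candidate with the largest hash wins.
+%% The result depends only on the pin and the candidate set, not on the
+%% order of the list or on any random state.
+pinned_winner(Pin, Candidates) ->
+    Scored = [{erlang:phash2({Pin, P, T}), {P, T}} || {P, T} <- Candidates],
+    {_Hash, Winner} = lists:max(Scored),
+    Winner.
+```
+
+Calling `pinned_winner/2` twice with the same pin map returns the same
+route both times, while a different email may land on a different one.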
+
+## Worked example
+
+Three candidates A, B, C at the same priority with weights `33 : 33 : 33`.
+
+- If all three declare the feature set `{email}` and a payer with a fixed
+ email arrives, all three end up with identical `Pin` values. The
+ deterministic `phash2({Pin, P, T})` picks exactly one winner for that
+ email, so the effective distribution for that payer is
+ `100 : 0 : 0` — but a different email (or an unknown email) will pin
+ to a potentially different candidate. The aggregate split across the
+ population is still close to `33 : 33 : 33`.
+
+- If A and B declare `{email}` while C declares `{email, client_ip}`, the
+ `Pin` of C differs from that of A and B. A and B form one pin group; C
+ has no peer with the same feature set. For a given payer the choice
+ between A and B is deterministic, so their combined weight collapses
+ onto one of them, while C still competes through the regular
+ weight-based tie-break. The effective split for that payer is roughly
+ `66 : 0 : 33` (or `0 : 66 : 33`, depending on which of A and B the hash
+ favours).
+
+## Flow
+
+```mermaid
+flowchart LR
+ Route[Candidate with<br/>#domain_RoutingPin{features=...}] --> GP[gather_pin_info/2]
+ VS[Routing context<br/>currency, email, ip, tool, token] --> GP
+ GP --> P[pin :: #{feature => value}]
+ P --> SCORE[score_routes]
+ SCORE --> SEL{pins equal<br/>and non-zero?}
+ SEL -- yes --> DET[select_better_pinned_route<br/>phash2 Pin,P,T]
+ SEL -- no --> REG[select_better_regular_route<br/>weight-based random]
+ DET --> WIN([winner])
+ REG --> WIN
+```
+
+## Practical notes
+
+> [!NOTE]
+> A feature that is `undefined` in the routing context still contributes
+> to the pin value — two candidates that both declare `{email, ip}` with
+> `ip = undefined` are considered equivalent for pinning purposes. This
+> is intentional: the absence of a signal is itself a signal.
+
+> [!IMPORTANT]
+> Pins only break ties *within* an equal-priority, equal-weight group.
+> They do not override fault-detector rejection, blacklist rejection,
+> limit overflow or priority ordering. A dead provider is still dead even
+> if the payer is "pinned" to it.
+
+> [!TIP]
+> The explanation rendered by
+> [`hg_routing_explanation`](../apps/routing/src/hg_routing_explanation.erl)
+> surfaces a `"Pin wasn't the same as in chosen route"` message when pins
+> participated in the decision — useful when debugging why an expected
+> route was not taken.
diff --git a/doc/routing.md b/doc/routing.md
new file mode 100644
index 00000000..cba45546
--- /dev/null
+++ b/doc/routing.md
@@ -0,0 +1,175 @@
+# Routing
+
+Routing is the process of turning an abstract payment into a concrete
+`(Provider, Terminal)` pair that will actually talk to a real acquirer. The
+code lives in the `routing` OTP application under
+[`apps/routing/src/`](../apps/routing/src) — `hellgate` calls into it but
+keeps no routing state of its own.
+
+The main entry point is
+[`hg_routing:gather_routes/5`](../apps/routing/src/hg_routing.erl); the
+routing *context* (`hg_routing_ctx`) is the value threaded through every
+stage, accumulating candidates, rejections and score information.
+
+```mermaid
+flowchart LR
+ A[Payment institution<br/>routing rules + varset] --> B[1. Gather candidates<br/>policies − prohibitions]
+ B --> C[2. Fault detector<br/>reject 'dead' providers]
+ C --> D[3. Blacklist<br/>reject card tokens]
+ D --> E[4. Limits<br/>reject overflowing]
+ E --> F[5. Score & choose<br/>priority · weight · pins · fail rate]
+ F --> G{winner?}
+ G -- yes --> R[chosen route<br/>+ choice context]
+ G -- no --> X[no_route_found]
+```
+
+## The shape of a route
+
+A route is the `routing` app's view of a `(Provider, Terminal)` candidate
+plus everything needed to filter and score it:
+
+- `provider_ref` / `terminal_ref`
+- `priority` — integer, lower is preferred
+- `weight` — used for tie-breaking within a priority band
+- `pins` — payer-identifying characteristics (see [route_pins.md](route_pins.md))
+- `domain_revision` — the DMT revision the candidate was resolved at
+
+## Stage 1 — gathering candidates
+
+```erlang
+gather_routes(Predestination, PaymentInstitution, Varset, Revision, GatherContext) ->
+ case PaymentInstitution#domain_PaymentInstitution.payment_routing_rules of
+ undefined -> hg_routing_ctx:new([]);
+ #domain_RoutingRules{policies = Policies, prohibitions = Prohibitions} ->
+ Candidates = get_candidates(Policies, Varset, Revision),
+ {Accepted, RejectedRoutes} = filter_routes(
+ collect_routes(Candidates, Predestination, Revision),
+ get_table_prohibitions(Prohibitions, Varset, Revision)
+ ),
+ hg_routing_ctx:new(Accepted)
+ end.
+```
+
+The steps are:
+
+1. Look up the payment institution for the invoice's shop.
+2. Walk its `payment_routing_rules` — policies and prohibitions. Both are
+ DMT selector trees evaluated against the **varset** (category, currency,
+ cost, payment tool, risk score, party config ref, shop id, flow, wallet
+ id, etc.; see [Domain, party and varset](domain-and-party.md)).
+3. Expand the matching policies into concrete `(Provider, Terminal)`
+ candidates.
+4. Filter out any candidate that matches a prohibition.
+
+At this point `hg_routing_ctx` holds the *accepted* candidates and the
+reason each rejected candidate dropped out.
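+
+Step 4 can be sketched as a single pass that splits candidates into
+accepted and rejected. This is a simplified model, not the `routing` app's
+code: the module `prohibition_demo`, the `{ProviderRef, TerminalRef}`
+tuples and the assumption that prohibitions are keyed by terminal
+reference are all hypothetical.
+
+```erlang
+-module(prohibition_demo).
+-export([filter_routes/2]).
+
+%% Routes is a list of {ProviderRef, TerminalRef}; Prohibitions maps a
+%% TerminalRef to a description of the rule that forbids it.
+%% Returns {Accepted, Rejected}; each rejected entry keeps its reason.
+filter_routes(Routes, Prohibitions) ->
+    lists:foldr(
+        fun({_P, T} = Route, {Accepted, Rejected}) ->
+            case maps:find(T, Prohibitions) of
+                {ok, Why} -> {Accepted, [{Route, {forbidden, Why}} | Rejected]};
+                error -> {[Route | Accepted], Rejected}
+            end
+        end,
+        {[], []},
+        Routes
+    ).
+```
+
+Keeping the reason next to each rejected route is what later lets the
+choice context explain why a candidate dropped out.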
+
+## Stage 2 — fault-detector filtering
+
+Even if the domain allows a candidate, the fault detector may have flagged
+its provider as effectively dead. `hg_routing:filter_by_critical_provider_status/1`
+pulls statistics from [`hg_fault_detector_client`](../apps/hellgate/src/hg_fault_detector_client.erl),
+scores every candidate (availability + conversion rate) and rejects anything
+whose availability status is `dead`:
+
+```erlang
+{R, {{dead, _} = AvailabilityStatus, _ConversionStatus}} ->
+ hg_route:to_rejected_route(R, {'ProviderDead', AvailabilityStatus})
+```
+
+Scores (and the `dead`/`alive` split) are cached into the context so that
+later stages can re-use them for final ranking without a second RPC.
+
+## Stage 3 — blacklist filtering
+
+[`hg_routing:filter_by_blacklist/2`](../apps/routing/src/hg_routing.erl) runs
+every remaining candidate against the inspector's blacklist:
+
+```erlang
+check_routes([Route|Rest], BlCtx) ->
+ % For each (provider, terminal, card_token) call proxy-inspector:IsBlacklisted
+```
+
+Blacklists are keyed on `(provider_id, terminal_id, CARD_TOKEN, token)`.
+Anything blacklisted is rejected with reason `in_blacklist`.
+
+Token-less candidates (e.g. when we do not yet have a payment tool, or the
+payment is cash-on-delivery style) skip this check.
+
+## Stage 4 — limits
+
+Turnover limits can also eliminate candidates. Before selecting a route,
+Hellgate evaluates the per-provider / per-terminal turnover limits and marks
+overflowing candidates as rejected with reason `limit_overflow`. See
+[Limits and accounting](limits-and-accounting.md) for the limiter model.
+
+## Stage 5 — scoring and choice
+
+Whatever survives is ranked. The individual route score is assembled from:
+
+- Priority (primary key — lower wins)
+- Weight (secondary; interacts with pins)
+- Fault-detector availability + conversion rate
+- Blacklist status
+
+`hg_routing:choose_rated_route/1` sorts by the tuple
+`(availability, priority, weight-driven pin score, fail rate)` and picks the
+head. It returns both the chosen route and a *choice context* that records
+which alternative would otherwise have won and why each loser lost — this is
+the payload consumed by
+[`hg_routing_explanation`](../apps/routing/src/hg_routing_explanation.erl) to
+produce human-readable "why did we pick this route" output surfaced via the
+payproc API.
+
+## Route pins
+
+Full details in [route_pins.md](route_pins.md). In short: within a group
+of candidates with the same priority and weight, each candidate can
+declare a list of payer *features* (e.g. `[email]` or `[email, client_ip]`)
+via `#domain_RoutingPin{}`. At routing time the values of those features
+are pulled from the routing context and attached to the candidate; when
+two candidates in the same priority group share the same pin value, the
+weight-based random tie-break is replaced by a deterministic
+`phash2({pin, provider, terminal})` comparison.
+
+The upshot is that the effective weight distribution collapses from, say,
+`33:33:33` to `100:0:0` for a given payer if all three candidates share
+the same pin set, keeping the payer "pinned" to the route they first
+used. Candidates with different pin sets participate in their own,
+independent pinning group, so the overall distribution still makes sense
+across the population.
+
+## Cascade: re-routing on failure
+
+Cascade is the routing-level response to a failed session. See
+[State machines → Cascade and retries](state-machines.md#cascade-and-retries)
+for the trigger logic. Mechanically:
+
+1. The payment records its current route and increments `iter`.
+2. The routing context still holds the remaining candidates and their
+ scores (they were stashed during stage 2).
+3. The next candidate is picked with the same scoring rules, and a brand-new
+ session is created against that route.
+
+Because the fault detector's view can change between iterations,
+`filter_by_critical_provider_status/1` is re-run at each cascade attempt —
+a provider that was alive at attempt 1 may be dead by attempt 3.
+
+## Rejections as first-class data
+
+`hg_routing_ctx` keeps rejected candidates grouped by reason (not just a
+flat drop list). The reasons it surfaces are:
+
+- `adapter_unavailable` — marked dead by the fault detector
+- `in_blacklist` — inspector rejected the token
+- `forbidden` — matched a prohibition rule
+- `limit_overflow` — would push a turnover limit over its ceiling
+
+All four reasons are preserved in the final `ChoiceContext` so that operators
+can see exactly why an expected route was not picked.
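+
+Grouping by reason can be modelled with a map from reason to routes. The
+sketch below is hypothetical: `rejection_demo` and its bare map stand in
+for `hg_routing_ctx`, whose real structure is not shown in this document.
+
+```erlang
+-module(rejection_demo).
+-export([reject/3, rejected/2]).
+
+%% Record Route under Reason, creating the group on first use.
+reject(Reason, Route, Ctx) ->
+    maps:update_with(Reason, fun(Rs) -> [Route | Rs] end, [Route], Ctx).
+
+%% List the routes rejected for Reason (empty when none were).
+rejected(Reason, Ctx) ->
+    maps:get(Reason, Ctx, []).
+```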
+
+> [!NOTE]
+> Stages 2 (fault detector) and 3 (blacklist) are each one RPC per
+> candidate. If routing shows up as a latency hotspot, the candidate set is
+> usually the thing to trim — either by tightening domain prohibitions or
+> by narrowing the varset that selects policies.
diff --git a/doc/state-machines.md b/doc/state-machines.md
new file mode 100644
index 00000000..f585f024
--- /dev/null
+++ b/doc/state-machines.md
@@ -0,0 +1,472 @@
+# State machines
+
+Every durable entity in Hellgate — invoices, payments, refunds, chargebacks,
+invoice templates, recurrent paytools — is an event-sourced state machine
+implemented as a module that satisfies the
+[`hg_machine`](../apps/hellgate/src/hg_machine.erl) behaviour.
+
+## The `hg_machine` behaviour
+
+From [hg_machine.erl](../apps/hellgate/src/hg_machine.erl):
+
+```erlang
+-type machine() :: #{
+ id := id(),
+ history := history(),
+ aux_state := auxst()
+}.
+
+-type result() :: #{
+ events => [event_payload()],
+ action => hg_machine_action:t(),
+ auxst => auxst()
+}.
+
+-callback namespace() -> ns().
+-callback init(args(), machine()) -> result().
+-callback process_signal(signal(), machine()) -> result().
+-callback process_call(call(), machine()) -> {response(), result()}.
+-callback process_repair(args(), machine()) -> result().
+```
+
+The backend hands Hellgate the full `history()` (list of past events) plus
+`aux_state()` (an opaque cache used to skip replay of the whole history), and
+Hellgate returns *new* events plus an optional `action`:
+
+- `set_timer` — schedule a deadline (e.g. invoice expiration, session poll)
+- `remove` — ask the backend to delete the machine
+- `notify` — outgoing notifications
+
+Calls come in three shapes:
+
+- `hg_machine:start/3` — `init/2` builds the initial event list.
+- `hg_machine:call/3` / `hg_machine:thrift_call/5` — synchronous, returns a
+ response. Handled by `process_call/2`.
+- `hg_machine:repair/3` — manual intervention, see [Repair](risk-and-repair.md#repair).
+
+The top-level backend selector lives in `hg_machine:call_automaton/3` and
+picks between Machinegun, Progressor and the hybrid router based on the
+`hellgate` `backend` env var.
+
+## Invoice machine
+
+Module: [hg_invoice.erl](../apps/hellgate/src/hg_invoice.erl).
+Namespace: `invoice`.
+
+The invoice is the outer state machine. It owns:
+
+- The immutable `#domain_Invoice{}` (shop, cost, due date, metadata, cart or
+ product, optional mutations)
+- Zero or more *nested* payment state machines keyed by `PaymentID`
+- A cached reference to the party/shop at creation revision
+
+Its internal state (`#st{}` in `hg_invoice.erl`) tracks which sub-entity is
+currently "active":
+
+```erlang
+-type activity() ::
+ invoice % waiting on payment creation or expiration
+ | {payment, payment_id()}.
+```
+
+Key event types (from `payment_events.hrl`):
+
+- `?invoice_created(Invoice)`
+- `?invoice_status_changed(Status)`
+- `?payment_ev(PaymentID, PaymentEvent)` — wraps nested payment events
+
+Status transitions (from the domain):
+
+```mermaid
+stateDiagram-v2
+ [*] --> unpaid
+ unpaid --> paid: all payments captured
+ unpaid --> cancelled: explicit cancellation
+ unpaid --> expired: due date reached
+ paid --> [*]
+ cancelled --> [*]
+ expired --> [*]
+```
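+
+The diagram can also be read as a transition table. The following is a
+minimal sketch, not code from `hg_invoice.erl`: the module name
+`invoice_status_sketch` and both functions are hypothetical, and statuses are
+reduced to bare atoms instead of the domain records.
+
+```erlang
+-module(invoice_status_sketch).
+-export([can_transition/2, is_terminal/1]).
+
+-type status() :: unpaid | paid | cancelled | expired.
+
+%% Allowed edges, copied from the diagram above.
+-spec can_transition(status(), status()) -> boolean().
+can_transition(unpaid, paid) -> true;
+can_transition(unpaid, cancelled) -> true;
+can_transition(unpaid, expired) -> true;
+can_transition(_From, _To) -> false.
+
+%% Only `unpaid` has outgoing edges in the diagram.
+-spec is_terminal(status()) -> boolean().
+is_terminal(unpaid) -> false;
+is_terminal(_Status) -> true.
+```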
+
+The invoice handles the following calls (selected):
+
+- `start_payment` — creates the nested payment machine
+- `capture_payment` / `cancel_payment` / `refund_payment` — delegate to the
+ relevant payment sub-machine
+- `create_chargeback` / `accept_chargeback` / `reject_chargeback` / `reopen_chargeback`
+- `process_callback` — async provider callback routed by tag
+
+`timeout` signals are used for expiration and for session polling of nested
+payments.
+
+## Payment machine
+
+Module: [hg_invoice_payment.erl](../apps/hellgate/src/hg_invoice_payment.erl)
+(around 4k lines — the largest module in the project).
+
+Payments are *sub-machines* of invoices: they do not have their own Machinegun
+namespace; their events are wrapped in `?payment_ev(PaymentID, ...)` and
+appended to the invoice's history. `hg_invoice_payment` provides the
+apply/process functions that `hg_invoice` calls into.
+
+Payment status (from the domain):
+
+```mermaid
+stateDiagram-v2
+ [*] --> pending
+ pending --> processed
+ pending --> cancelled
+ pending --> failed
+ processed --> captured
+ processed --> failed
+ captured --> refunded
+ captured --> charged_back
+ refunded --> [*]
+ captured --> [*]
+ cancelled --> [*]
+ failed --> [*]
+ charged_back --> [*]
+```
+
+The payment struct (simplified) holds:
+
+- The immutable `#domain_InvoicePayment{}`, the parent invoice and party refs
+- `activity` — which internal step is in flight (see below)
+- `target` — the desired terminal session outcome (`processed`, `captured`,
+ `cancelled`, `refunded`)
+- `route` — the current provider+terminal choice
+- `iter` — cascade attempt counter
+- `sessions` — the stack of provider interactions for this payment
+- `cash_flow`, `allocation`, `limits` — financial state
+- Nested `refunds` and `chargebacks` keyed by their IDs
+- `failure`, `retry_attempts`, `repair_scenario` — recovery state
+
+### Payment steps
+
+The `activity()` type in
+[`hg_invoice_payment.erl`](../apps/hellgate/src/hg_invoice_payment.erl) is a
+tagged union — `{payment, Step}`, `{refund, RefundID}`,
+`{chargeback, ChargebackID, ChargebackActivity}`,
+`{adjustment_new | adjustment_pending, AdjustmentID}`, or `idle`. The
+payment branch wraps a `payment_step()`, and it is the steps (not the outer
+`activity()` atoms) that encode the moving pieces of a payment flow. The
+table below lists every step; each corresponds to a concrete side effect:
+
+| `payment_step()` | What it does |
+| --------------------------- | ------------------------------------------------------------------- |
+| `new` | Freshly created, not yet validated. |
+| `shop_limit_initializing` | Shop-level turnover hold via `hg_limiter`. |
+| `shop_limit_failure` | Shop limits exceeded — payment will fail. |
+| `shop_limit_finalizing`     | Commits or rolls back the shop-level hold at the end of the flow.   |
+| `risk_scoring` | Calls the inspector (`hg_inspector`) for a risk score. |
+| `routing` | Gathers and ranks candidate routes (`hg_routing`). |
+| `routing_failure` | All candidates rejected — transition to `failed`. |
+| `cash_flow_building` | Computes the final postings with `hg_cashflow:finalize/3`. |
+| `processing_session` | Calls the provider adapter through `hg_session` / `hg_proxy_provider`. |
+| `processing_accounter` | Submits a posting plan to shumway (`plan/2`). |
+| `processing_capture` | Executes a separate capture session for two-step flows. |
+| `processing_failure` | Decides cascade, retry or fail. |
+| `updating_accounter`        | Commits or rolls back the posting plan.                             |
+| `flow_waiting` | Waiting for an async provider callback. |
+| `finalizing_session` | Cleans up transient session state after a session result. |
+| `finalizing_accounter` | Final accounting commit after capture. |
+
+The *target* of a session encodes what we want to achieve next:
+`processed` (authorise), `captured` (settle), `cancelled` (void),
+`refunded` (reverse).
+
+### Cascade and retries
+
+There are two complementary mechanisms for dealing with provider failures:
+
+- **Cascade** — try the *next* route candidate.
+- **Retry** — try the *same* route again, optionally after a sleep, driven by
+ a `hg_retry` policy.
+
+Cascade is controlled by domain config
+([`#domain_CascadeBehaviour{}`](https://github.com/valitydev/damsel)) and
+implemented in [`hg_cascade.erl`](../apps/hellgate/src/hg_cascade.erl):
+
+```erlang
+is_triggered(Behaviour, OperationFailure, Route, Sessions) -> boolean().
+```
+
+Cascade fires when:
+
+1. The operation failure matches one of the configured mapped error signatures
+ (prefix match over the error notation path — e.g. `preauthorization_failed`
+ covers all its sub-codes), **and**
+2. The user did not interact during the session (no 3DS UI, no OTP step, etc.;
+ see `is_user_interaction_triggered/3`). Moving to another route is pointless
+ after the cardholder has made a choice, so UI interactions block cascade.
+
+```mermaid
+flowchart TD
+ F[session failed] --> E{error matches<br/>mapped signature?}
+ E -- no --> TERM[mark payment failed]
+ E -- yes --> UI{ui_occurred<br/>during session?}
+ UI -- yes --> TERM
+ UI -- no --> NEXT{another candidate<br/>available?}
+ NEXT -- no --> TERM
+ NEXT -- yes --> C[bump iter,<br/>rollback limits,<br/>new session on next route]
+ C --> S[(processing_session)]
+```
+
+If both conditions hold and another route candidate is still available, the
+payment picks the next candidate from the routing context, bumps `iter`, rolls
+back limits, and starts a fresh session. Otherwise the failure is terminal.
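+
+The two trigger conditions can be sketched as a single predicate. This is an
+illustration, not the body of `hg_cascade:is_triggered/4`: the module
+`cascade_sketch`, the three-argument shape, and the representation of a
+failure as a list of atoms (outermost code first) are all assumptions.
+
+```erlang
+-module(cascade_sketch).
+-export([is_triggered/3]).
+
+-type code_path() :: [atom()].
+
+%% Signatures:  configured failure-code prefixes (hypothetical representation).
+%% FailurePath: code path of the actual failure, outermost code first.
+%% UiOccurred:  whether the user interacted during the session.
+-spec is_triggered([code_path()], code_path(), boolean()) -> boolean().
+is_triggered(_Signatures, _FailurePath, true) ->
+    %% Any user interaction blocks cascade, whatever the error was.
+    false;
+is_triggered(Signatures, FailurePath, false) ->
+    %% Prefix match: a signature covers the code itself and all its sub-codes.
+    lists:any(fun(Sig) -> lists:prefix(Sig, FailurePath) end, Signatures).
+```
+
+With a configured signature `[preauthorization_failed]`, a failure path such
+as `[preauthorization_failed, card_blocked]` (a made-up sub-code) matches,
+while `[authorization_failed]` does not.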
+
+> [!TIP]
+> A payment that has already shown the user a 3DS redirect will not cascade,
+> because the cardholder has effectively made a choice. If a retry is needed
+> in that case, it has to go through the cancel/create loop, not cascade.
+
+Retries use [`hg_retry.erl`](../apps/hellgate/src/hg_retry.erl)'s policy
+algebra:
+
+```erlang
+-type policy_spec() ::
+ {linear, retries_num() | {max_total_timeout, pos_integer()}, pos_integer()}
+ | {exponential, retries_num() | {max_total_timeout, pos_integer()}, number(), pos_integer()}
+ | {exponential, retries_num() | {max_total_timeout, pos_integer()}, number(), pos_integer(), timeout()}
+ | {intervals, [pos_integer(), ...]}
+ | {timecap, timeout(), policy_spec()}
+ | no_retry.
+```
+
+Retries are used for session polling, refund reprocessing, and async wait
+loops.
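+
+To make the policy shapes concrete, here is a hedged sketch that expands
+three of them into a list of delays. It is not `hg_retry`'s API: the module
+`retry_sketch`, the function `delays/1`, the assumption that the retry count
+is a finite integer, and the reading of the exponential form as
+`Timeout * Factor^N` are all assumptions. The `max_total_timeout`, capped
+exponential and `timecap` variants are left out.
+
+```erlang
+-module(retry_sketch).
+-export([delays/1]).
+
+%% Expand a (simplified) policy spec into the list of delays it would produce.
+-spec delays(tuple() | no_retry) -> [number()].
+delays(no_retry) ->
+    [];
+delays({linear, Retries, Timeout}) when is_integer(Retries) ->
+    %% Same delay before every attempt.
+    lists:duplicate(Retries, Timeout);
+delays({exponential, Retries, Factor, Timeout}) when is_integer(Retries) ->
+    %% Assumed growth rule: the N-th delay is Timeout * Factor^N.
+    [round(Timeout * math:pow(Factor, N)) || N <- lists:seq(0, Retries - 1)];
+delays({intervals, Intervals}) ->
+    %% Explicit list of delays, used as given.
+    Intervals.
+```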
+
+### Allocation and cash flow on payments
+
+When a payment progresses past routing, the final cash flow is computed from
+the domain's posting templates and the selected provider/terminal:
+
+- Merchant settlement and guarantee accounts (from the shop config)
+- Provider settlement account (from the chosen provider, by currency)
+- System settlement and subagent accounts (from the payment institution)
+- External income/outcome accounts (selected by varset)
+
+Allocations (split payments) are implemented in
+[`hg_allocation.erl`](../apps/hellgate/src/hg_allocation.erl) but are
+currently disabled — `calculate/5` returns `{error, allocation_not_allowed}`.
+The plumbing is in place for future re-enablement.
+
+## Refund machine
+
+Module: [hg_invoice_payment_refund.erl](../apps/hellgate/src/hg_invoice_payment_refund.erl).
+
+Refunds are sub-machines of a captured payment. A refund holds:
+
+- Its own `#domain_InvoicePaymentRefund{}`
+- A reversed cash flow (source/destination swapped, details marked as
+ reversal)
+- The sessions used to execute the refund on the provider
+- The same route as the original payment (providers require the original
+ transaction)
+- A `status()`: `pending | succeeded | failed`
+
+Refund activities are narrower than payment activities:
+
+```erlang
+-type activity() ::
+ new
+ | session % provider interaction in flight
+ | failure % decide retry or give up
+ | accounter % stage the reversal posting plan
+ | finished.
+```
+
+```mermaid
+stateDiagram-v2
+ [*] --> new
+ new --> session: start refund on provider
+ session --> accounter: provider ack
+ session --> failure: provider error
+ failure --> session: retry policy allows
+ failure --> finished: terminal failure
+ accounter --> finished
+ finished --> [*]
+```
+
+Typical flow:
+
+1. Build the reversed cash flow with [`hg_cashflow:revert/1`](../apps/hellgate/src/hg_cashflow.erl).
+2. Hold the refund's turnover limits (the inverse of the capture hold).
+3. Create a refund session bound to the original route and call
+ `proxy-provider:ProcessRefund` (or a callback-driven flow).
+4. On success, commit the reversal posting plan to shumway; on failure,
+ roll it back and either retry or mark the refund as failed.
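+
+Step 1 amounts to swapping the two ends of every posting. The sketch below
+illustrates that idea only; it is not the implementation of
+`hg_cashflow:revert/1`. The module `revert_sketch` and the map-based posting
+shape are hypothetical, and the "details marked as reversal" part is omitted.
+
+```erlang
+-module(revert_sketch).
+-export([revert/1]).
+
+%% Hypothetical posting shape; the real one is a domain record.
+-type posting() :: #{source := term(), destination := term(), volume := term()}.
+
+%% Swap source and destination on every posting, keeping the volume as is.
+-spec revert([posting()]) -> [posting()].
+revert(CashFlow) ->
+    [
+        P#{source := Dst, destination := Src}
+     || P = #{source := Src, destination := Dst} <- CashFlow
+    ].
+```
+
+Applying `revert/1` twice returns the original list, which is a quick sanity
+check for any reversal helper of this kind.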
+
+## Chargeback machine
+
+Module: [hg_invoice_payment_chargeback.erl](../apps/hellgate/src/hg_invoice_payment_chargeback.erl).
+
+Chargebacks model disputes initiated by the acquirer or card scheme and have
+their own three-stage lifecycle:
+
+```erlang
+-type stage() :: 'chargeback' | 'pre_arbitration' | 'arbitration'.
+-type status() :: 'pending' | 'accepted' | 'rejected' | 'cancelled'.
+```
+
+```mermaid
+stateDiagram-v2
+ direction LR
+ [*] --> chargeback
+ chargeback --> pre_arbitration: reopen
+ pre_arbitration --> arbitration: reopen
+ chargeback --> Terminal
+ pre_arbitration --> Terminal
+ arbitration --> Terminal
+ state Terminal {
+ [*] --> accepted
+ [*] --> rejected
+ [*] --> cancelled
+ }
+```
+
+Each stage can have its own cash-flow plan, kept in the chargeback struct:
+
+```erlang
+#chargeback_st{
+ cash_flow_plans = #{
+ ?chargeback_stage_chargeback() => [],
+ ?chargeback_stage_pre_arbitration() => [],
+ ?chargeback_stage_arbitration() => []
+ }
+}
+```
+
+Operations:
+
+- `create/2` — open a dispute at the `chargeback` stage.
+- `accept/3` — merchant accepts the dispute; apply the stage's posting plan.
+- `reject/3` — merchant disputes the claim.
+- `reopen/3` — move to the next stage (chargeback → pre-arbitration →
+ arbitration).
+- `cancel/3` — drop the chargeback entirely.
+
+Each stage transition can produce a new cash-flow plan so that dispute
+liability is accounted for on every step, not just at the terminal outcome.
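+
+The stage progression driven by `reopen/3` can be sketched as a small lookup.
+The module `chargeback_stage_sketch` and the `{error, no_next_stage}` return
+value are hypothetical; the diagram above only shows that `arbitration` has no
+outgoing `reopen` edge, not how the real code reports that case.
+
+```erlang
+-module(chargeback_stage_sketch).
+-export([next_stage/1]).
+
+-type stage() :: chargeback | pre_arbitration | arbitration.
+
+%% Stage reached by reopening a chargeback at the given stage.
+-spec next_stage(stage()) -> {ok, stage()} | {error, no_next_stage}.
+next_stage(chargeback) -> {ok, pre_arbitration};
+next_stage(pre_arbitration) -> {ok, arbitration};
+next_stage(arbitration) -> {error, no_next_stage}.
+```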
+
+## Sessions
+
+Module: [hg_session.erl](../apps/hellgate/src/hg_session.erl).
+
+A *session* is one interaction with a provider adapter. A payment can have
+multiple sessions: one per cascade attempt, plus separate sessions for
+capture, void and refund.
+
+The session struct is roughly:
+
+```erlang
+-type t() :: #{
+ target := target(), % desired terminal status
+ status := active | suspended | finished,
+ trx := 'maybe'(trx_info()),
+ tags := [tag()], % callback tags
+ timeout_behaviour := timeout_behaviour(),
+ context := tag_context(), % invoice/payment id
+ route := route(),
+ payment_info := payment_info(),
+ result => session_result(),
+ proxy_state => binary(), % opaque provider state
+ interaction => interaction(), % 3DS / redirect / OTP
+ ui_occurred => boolean(),
+ timings => timings(),
+ repair_scenario => repair_scenario()
+}.
+```
+
+Session lifecycle:
+
+1. `create/0` — blank session with defaults.
+2. `set_payment_info/2` — attach the data that will be sent to the adapter.
+3. `process/1` — call `proxy-provider:ProcessPayment` (or the relevant op)
+ and interpret the returned intent:
+ - `{finish, FinishIntent}` — adapter is done, extract success/failure.
+ - `{sleep, SleepIntent}` — poll again after a timer.
+ - `{suspend, SuspendIntent}` — suspend and wait for an async callback.
+4. `apply_event/3` — apply a provider callback or a local timeout.
+5. `deduce_activity/1` — derive the next payment activity from the session
+ state.
+
+```mermaid
+stateDiagram-v2
+ [*] --> active: create
+ active --> active: sleep intent (set_timer, re-process)
+ active --> suspended: suspend intent (tag registered)
+ suspended --> active: callback arrives
+ active --> finished: finish intent
+ suspended --> finished: timeout fallback
+ finished --> [*]
+```
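+
+The intent handling in step 3 can be sketched as a pure function from an
+intent and the current status to a new status plus actions. This is an
+assumption-laden illustration, not `hg_session` code: the module
+`session_intent_sketch`, the function `apply_intent/2`, the
+`{register_tag, Tag}` action and the simplified intent payloads are all
+hypothetical.
+
+```erlang
+-module(session_intent_sketch).
+-export([apply_intent/2]).
+
+-type status() :: active | suspended | finished.
+-type intent() :: {finish, term()} | {sleep, term()} | {suspend, term()}.
+
+%% Returns the next session status and the actions to emit.
+%% Only an active session is expected to receive an adapter intent.
+-spec apply_intent(intent(), status()) -> {status(), [term()]}.
+apply_intent({finish, _Result}, active) ->
+    {finished, []};
+apply_intent({sleep, Timer}, active) ->
+    %% Stay active and poll the adapter again when the timer fires.
+    {active, [{set_timer, Timer}]};
+apply_intent({suspend, Tag}, active) ->
+    %% Park the session until a callback carrying this tag arrives.
+    {suspended, [{register_tag, Tag}]}.
+```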
+
+Asynchronous callbacks are dispatched by tag. A tag is registered in
+[`hg_machine_tag`](../apps/hellgate/src/hg_machine_tag.erl) at session
+creation time, mapping the tag to `(invoice_id, payment_id)`. When a provider
+`POST`s to the host endpoint at
+`/v1/proxyhost/provider/callback/`, `hg_proxy_host_provider` looks up the
+binding and forwards the callback into the invoice machine, which in turn
+applies it to the session and the payment.
+
+`timeout_behaviour()` encodes what to do when a session times out —
+immediate, polling or callback — and drives the `set_timer` actions that
+Hellgate emits.
+
+## Recurrent paytools
+
+Recurrent paytools are a separate machine type (`recurrent_paytools`
+namespace). A recurrent paytool represents a tokenised payment method
+obtained via `proxy-provider:GenerateToken` and reused for subsequent
+payments without a fresh cardholder interaction.
+
+The token lifecycle runs through its own sessions against the provider, and
+completed tokens are then consumed by payments whose invoice has
+`make_recurrent = true` (see
+[`hg_invoice_registered_payment.erl`](../apps/hellgate/src/hg_invoice_registered_payment.erl)
+for the adjacent "registered" payment path used on the merchant side).
+
+## Invoice templates
+
+Module: [hg_invoice_template.erl](../apps/hellgate/src/hg_invoice_template.erl).
+
+Templates are reusable blueprints that produce an invoice when paired with a
+price and (optionally) mutations. They live in their own namespace
+(`invoice_template`) and expose CRUD plus `ComputeTerms`, which evaluates the
+domain terms applicable to the template's shop to surface fees, available
+payment methods, limits and similar. Templates also support invoice
+mutations (see below).
+
+## Invoice mutations
+
+Module: [hg_invoice_mutation.erl](../apps/hellgate/src/hg_invoice_mutation.erl).
+
+Mutations are deterministic transformations of invoice data applied at invoice
+creation. The only implemented mutation today is `amount` randomisation:
+
+```erlang
+{amount, {randomization, #domain_RandomizationMutationParams{
+ multiplicity = M, % only mutate amounts where amount rem M == 0
+ min_amount = Min,
+ max_amount = Max,
+ direction = upward | downward | both
+}}}
+```
+
+The mutation records both the `original` and the `mutated` amount so that the
+invoice remains auditable. Once applied at creation time, mutations are
+immutable for the life of the invoice.
+
+## Events and auxiliary state
+
+The `aux_state` field in the machine is an opaque cache (msgpack-encoded).
+Each machine module populates it with whatever derived state is expensive to
+recompute by replaying history (e.g. the most recent `#st{}`). The event
+history remains the source of truth: on a cold start, `apply_event/3` can
+rebuild the state from scratch.
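+
+Rebuilding state from history is a left fold over the events. The sketch
+below uses toy events and a toy state to show why a cached state is safe to
+resume from; the module `replay_sketch`, the event shapes and the
+two-argument `apply_event/2` are hypothetical and do not mirror the real
+`apply_event/3`.
+
+```erlang
+-module(replay_sketch).
+-export([collapse/1, collapse/2]).
+
+-type event() :: {status_changed, atom()} | {payment_started, binary()}.
+-type st() :: #{status := atom(), payments := [binary()]}.
+
+%% Cold start: replay the whole history from an empty state.
+-spec collapse([event()]) -> st().
+collapse(History) ->
+    collapse(History, #{status => unpaid, payments => []}).
+
+%% Warm start: replay only the events that are newer than the cached state.
+-spec collapse([event()], st()) -> st().
+collapse(History, St0) ->
+    lists:foldl(fun apply_event/2, St0, History).
+
+-spec apply_event(event(), st()) -> st().
+apply_event({status_changed, Status}, St) ->
+    St#{status := Status};
+apply_event({payment_started, ID}, St = #{payments := IDs}) ->
+    %% Newest payment first.
+    St#{payments := [ID | IDs]}.
+```
+
+Folding the first N events and then the rest gives the same result as folding
+the whole history at once, which is the property a cache like `aux_state`
+relies on.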
+
+Event marshalling is handled by
+[`mg_msgpack_marshalling`](../apps/hellgate/src/mg_msgpack_marshalling.erl) and,
+on the Progressor side, by the `unmarshal_events/1` helpers in
+[`hg_progressor.erl`](../apps/hg_progressor/src/hg_progressor.erl).