Low-cost plasticity diagnostics for reinforcement learning models.
RLPlasticity helps you answer a practical question before you spend time on expensive reruns or ablations:
Does this RL model still look capable of adapting, and where does a plasticity problem seem to concentrate?
This project is built for fast triage, not experiment replacement. It is most useful when training gets weird and you want a cheap first pass before touching reward design, environment code, or large training jobs.
Technical Report: GitHub Pages report
Clone the repository, create a virtual environment, install with the PyTorch extra, and run the bundled demo:
git clone https://github.com/zzy0304Raymond/RLPlasticityLab.git
cd RLPlasticityLab
python -m venv .venvPowerShell:
.venv\Scripts\Activate.ps1
pip install -e ".[torch]"
python -m examples.rl_actor_case --output-dir reports/demo_rl_actor_case --runmacOS / Linux:
source .venv/bin/activate
pip install -e ".[torch]"
python -m examples.rl_actor_case --output-dir reports/demo_rl_actor_case --runIf you only want to scan a checkpoint:
rlplasticity scan --checkpoint actor.ptHealthy checkpoint vs a checkpoint with a frozen encoder:
Healthy plasticity probe output:
RLPlasticity Report | analyzer=plasticity/default | kind=plasticity_probe | evidence=update | layers=3
loss=1.448104
summary=No acute issue detected. Plasticity score=1.000 (update-active=1.000, grad-active=1.000)
Findings
- No diagnostic rule fired.
Frozen-encoder plasticity probe output:
RLPlasticity Report | analyzer=plasticity/default | kind=plasticity_probe | evidence=update | layers=3
loss=1.448104
summary=The encoder is adapting less than downstream layers.
Findings
- [medium/medium] The encoder is adapting less than downstream layers.
evidence: encoder_plasticity_score=0.000
evidence: reference_downstream_score=1.000
Real generated artifacts:
- Healthy plasticity text: docs/showcase/healthy_plasticity_probe.txt
- Healthy plasticity JSON: docs/showcase/healthy_plasticity_probe.json
- Healthy plasticity HTML: docs/showcase/healthy_plasticity_probe.html
- Frozen encoder text: docs/showcase/frozen_encoder_plasticity_probe.txt
- Frozen encoder JSON: docs/showcase/frozen_encoder_plasticity_probe.json
- Frozen encoder HTML: docs/showcase/frozen_encoder_plasticity_probe.html
Example JSON excerpt:
{
"analyzer_name": "plasticity/default",
"snapshot": {
"kind": "plasticity_probe",
"evidence_level": "update",
"loss": 1.448103904724121
},
"metrics": {
"plasticity_score": {
"value": 1.0,
"summary": "Plasticity score=1.000 (update-active=1.000, grad-active=1.000)"
},
"encoder_plasticity_score": {
"value": 1.0,
"summary": "encoder plasticity score=1.000"
}
},
"findings": []
}Use RLPlasticity when you want to inspect:
- a single
actor.ptorpolicy.pt - a checkpoint plus model code
- a checkpoint plus model plus a small batch of replay or environment samples
The toolkit is designed to help with questions like:
- Does this checkpoint look structurally suspicious?
- Is the model responding normally on real inputs?
- Are gradients and updates still reaching the encoder, trunk, and heads?
- Does the issue look global, encoder-side, or head-side?
This first release does not try to decide whether a failure comes from:
- reward design
- exploration
- data collection
- environment bugs
- training orchestration issues outside the model
It focuses only on model plasticity and reports evidence with explicit caveats.
RLPlasticity supports three progressively stronger workflows.
You have:
- only a checkpoint or
state_dict
You get:
- parameter norm and sparsity statistics
encoder / trunk / policy / valuegrouping- low-confidence structural hints
Best for:
- "I only have
actor.pt; is anything obviously strange?"
You have:
- a loadable model
- one batch of samples
- optionally a checkpoint to load
You get:
- forward-only activation health
- low-response hints
- a cheap sanity check before running update probes
Best for:
- "Does this model even respond normally on real observations?"
You have:
- a loadable model
- one or more batches
- a loss function
- an optimizer
- optionally a checkpoint to load
You get:
- gradient reachability
- relative update strength
- stagnant-layer statistics
- encoder/trunk/head plasticity hints
Best for:
- "Is this checkpoint still learning, or has part of the model gone stale?"
Install the package:
pip install -e .Install with PyTorch support:
pip install -e ".[torch]"The project currently exposes five main Python workflows:
scan_checkpoint(...)probe_model(...)probe_plasticity(...)probe_plasticity_window(...)probe_checkpoint_sequence(...)
Convenience helpers are also available for:
- builder-based probing
probe_model_from_builder(...)probe_plasticity_from_builder(...)
- single training-step probing
probe_training_step(...)
- light integration helpers
rlplasticity.integrations.pytorchrlplasticity.integrations.cleanrlrlplasticity.integrations.sb3rlplasticity.integrations.session
from rlplasticity import scan_checkpoint
report = scan_checkpoint("actor.pt")
print(report.to_text())from rlplasticity import probe_plasticity_window
report = probe_plasticity_window(
model,
batches,
loss_fn=loss_fn,
optimizer=optimizer,
max_steps=8,
)
print(report.to_text())from rlplasticity import probe_checkpoint_sequence
report = probe_checkpoint_sequence(
build_model,
checkpoints,
batches,
loss_fn=loss_fn,
optimizer_builder=build_optimizer,
max_steps=4,
)
print(report.to_text())from rlplasticity import probe_model
report = probe_model(model, batch_obs)
print(report.to_text())from rlplasticity import probe_plasticity
report = probe_plasticity(
model,
[batch],
loss_fn=loss_fn,
optimizer=optimizer,
)
print(report.to_text())Static checkpoint scan:
rlplasticity scan --checkpoint actor.ptForward-only probe:
rlplasticity probe-model \
--builder mypkg.models:build_actor \
--samples batch.pt \
--checkpoint actor.pt \
--forward mypkg.probes:forward_batchPlasticity probe:
rlplasticity probe-plasticity \
--builder mypkg.models:build_actor \
--samples batch.pt \
--loss mypkg.losses:actor_loss \
--optimizer mypkg.optim:build_optimizer \
--checkpoint actor.pt \
--max-steps 8Plasticity window:
rlplasticity probe-window \
--builder mypkg.models:build_actor \
--samples replay_window.pt \
--loss mypkg.losses:actor_loss \
--optimizer mypkg.optim:build_optimizer \
--checkpoint actor.pt \
--max-steps 8Checkpoint sequence:
rlplasticity probe-sequence \
--builder mypkg.models:build_actor \
--samples replay_window.pt \
--loss mypkg.losses:actor_loss \
--optimizer mypkg.optim:build_optimizer \
--checkpoints ckpt_10.pt ckpt_20.pt ckpt_30.pt \
--max-steps 4This repository ships a demo actor case with:
build_actorbuild_optimizerforward_batchactor_lossexport_demo_artifactsexport_training_sequence_artifacts
Generate a reusable demo checkpoint and batch:
python -m examples.rl_actor_case --output-dir reports/demo_rl_actor_caseRun the demo end-to-end:
python -m examples.rl_actor_case --output-dir reports/demo_rl_actor_case --runOr probe the generated files through the CLI:
rlplasticity probe-model \
--builder examples.rl_actor_case:build_actor \
--samples reports/demo_rl_actor_case/batch.pt \
--checkpoint reports/demo_rl_actor_case/actor.pt \
--forward examples.rl_actor_case:forward_batchrlplasticity probe-plasticity \
--builder examples.rl_actor_case:build_actor \
--samples reports/demo_rl_actor_case/batch.pt \
--loss examples.rl_actor_case:actor_loss \
--optimizer examples.rl_actor_case:build_optimizer \
--checkpoint reports/demo_rl_actor_case/actor.pt \
--max-steps 1Generate the homepage showcase artifacts:
python -m examples.showcase_reports --output-dir docs/showcaseGenerate the release validation suite:
python -m examples.validation_suite --output-dir docs/validationGenerate a short checkpoint sequence demo:
python - <<'PY'
from examples.rl_actor_case import export_training_sequence_artifacts
print(export_training_sequence_artifacts("reports/demo_sequence"))
PYCommon RL-shaped examples:
- PPO-like actor-critic:
python -m examples.ppo_like_case - DQN-like value network:
python -m examples.dqn_like_case - SAC-like actor/critic with checkpoint sequence:
python -m examples.sac_like_case --output-dir reports/demo_sac_sequence
These examples are framework-independent PyTorch scripts, but they are structured to resemble real PPO / DQN / SAC training code and show how to plug in the helper APIs.
Session-style trainer integration:
from rlplasticity.integrations.session import TrainingProbeSession
session = TrainingProbeSession(
model=model,
optimizer=optimizer,
loss_fn=loss_fn,
metadata={"algo": "ppo"},
)
step_report = session.probe_step(batch)
window_report = session.probe_window(replay_window, max_steps=8)Each report includes:
- analysis mode
- evidence strength
- key metrics
- findings
- caveats
- optional history rows for windows and checkpoint sequences
- trend summaries, including when decline first becomes visible when history is available
- a shortlist of layers worth checking next
Reports can be rendered as:
- text
- HTML
- JSON via
report.to_dict()
src/rlplasticity/
core/ # shared schemas, enums, aggregation, base interfaces
ingest/ # checkpoint loading and static summarization
integrations/ # lightweight helpers for raw PyTorch, CleanRL, and SB3-style setups
probes/ # evidence collection workflows
plasticity/ # metrics, rules, analyzers
reporting/ # text/html rendering
api.py # public Python workflows
cli.py # command-line entry point
See docs/ARCHITECTURE.md for the framework design.
Release-oriented docs:
- Architecture: docs/ARCHITECTURE.md
- Validation: docs/VALIDATION.md
- Report schema: docs/REPORT_SCHEMA.md
- Troubleshooting: docs/TROUBLESHOOTING.md
- Roadmap: docs/ROADMAP.md
- Changelog: docs/CHANGELOG.md
- Release checklist: docs/RELEASE_CHECKLIST.md
- Release report: docs/RELEASE_REPORT_v0.1.0.md
This repository currently uses:
- License: MIT, see LICENSE
- Contribution guide: CONTRIBUTING.md
- Code of conduct: CODE_OF_CONDUCT.md
- Security policy: SECURITY.md
This project currently works with PyTorch checkpoints and torch.load(...).
That means:
- only load checkpoints you trust
- treat untrusted model files as potentially unsafe
- prefer isolated environments when inspecting third-party artifacts
See SECURITY.md for the project policy.
Run the test suite in PowerShell:
$env:PYTHONPATH="src;."
python -m unittest discover -s tests -vRun the test suite in bash:
PYTHONPATH=src:. python -m unittest discover -s tests -vCurrent coverage includes:
- analyzer logic
- CLI behavior
- report serialization
- robustness edge cases
- trend-aware window and checkpoint-sequence probing
- optional PyTorch API and CLI integration
Planned next:
- more architecture families beyond the bundled demo actor
- stronger trainer-specific helper coverage
- lower-overhead online monitoring