awesome-interpretability

Mechanistic interpretability libraries

BauKit - light, simple, and well loved
TransformerLens
- Uses jaxtyping and aliases models into a common interface; not as HuggingFace compatible as other libraries.
- An extremely opinionated toolkit for doing whatever you want to specific models.
Tuned Lens - tools for looking at how transformer predictions are built layer-by-layer
vgel/repeng - A library for making RepE control vectors
nnsight
- To customize a model, you run it as a "with" context instead of calling it as a function. Inside the "with" block you can write regular PyTorch to modify the computation.
- Aims to eventually be as simple as BauKit and to support remote mechanistic interpretability. HuggingFace compatible.
Pyvene (intervention focused)
- pyvene tries to be HuggingFace-native, supporting both pre-defined and customized interventions.
penzai - jax-based, not HuggingFace-native
ViT-Prisma - mechanistic interpretability for vision and video transformers
Transformer Debugger (OpenAI) - not HuggingFace-native
Graphpatch - promising but abandoned
NeuroX
A tutorial on doing it manually
cupbearer - a library for mechanistic anomaly detection
Overcomplete - vision SAE toolbox
vLLM-Hook - program internal states of vLLM-served models
vllm-lens - extract residual stream activations and apply steering vectors in vLLM
Neuronpedia - public feature/neuron browser
Docent - interactive model explanation and steering interface
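
Several of the libraries above revolve around the same move: run a model, then read or overwrite an intermediate activation. The sketch below is a toy illustration of that pattern in plain Python, using a context manager in the spirit of the "with"-style description given for nnsight. It does not use or reproduce the API of nnsight, BauKit, or any other library listed here; `ToyModel`, `intervene`, and the layer names are hypothetical and exist only for this example.

```python
from contextlib import contextmanager


class ToyModel:
    """A stand-in 'model': a chain of named layers, each a plain function."""

    def __init__(self):
        self.layers = {
            "embed": lambda x: [v * 2.0 for v in x],
            "mlp": lambda x: [v + 1.0 for v in x],
            "head": lambda x: sum(x),
        }
        self.edits = {}  # layer name -> function applied to that layer's output
        self.cache = {}  # layer name -> most recent output of that layer

    def __call__(self, x):
        for name, layer in self.layers.items():
            x = layer(x)
            if name in self.edits:
                x = self.edits[name](x)  # overwrite the activation
            self.cache[name] = x  # record the (possibly edited) activation
        return x

    @contextmanager
    def intervene(self, name, edit):
        """Apply `edit` to the output of layer `name`, only inside the block."""
        self.edits[name] = edit
        try:
            yield self
        finally:
            del self.edits[name]


model = ToyModel()
clean = model([1.0, 2.0])

# Zero-ablate the "mlp" output for the duration of the block.
with model.intervene("mlp", lambda h: [0.0 for _ in h]):
    ablated = model([1.0, 2.0])

restored = model([1.0, 2.0])  # the edit is gone once the block exits
```

Comparing `clean` with `ablated` is the simplest form of an ablation experiment. Real libraries do the same thing on tensors, typically via PyTorch hooks or a traced computation graph.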

Explainability, counterfactuals and probing

captum
inseq
Explabox (2022)
IBM: AIX360 (2019)
Microsoft: Responsible AI Toolbox (2021)
- Dashboard that integrates: Error analysis, Fairlearn, InterpretML, DiCE, EconML and Data Balance
InterpretML
- SHAP, Mimic and LIME explainers. Permutation feature importance.
MI2.ai
- DrWhy (2019)
  - DALEX, survex, Arena, fairmodels
- Currently working on: ARES, xSurvival, Large Model Analysis
XAI (2018)
ELI5
NN-SVG
Neptune-AI blog
Neptune-AI blog
AI Ethics tool landscape
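
Permutation feature importance, mentioned under InterpretML above, is simple enough to sketch by hand: shuffle one feature column, and measure how much the model's error grows. The code below is a minimal stdlib-only sketch under that definition, not the implementation used by InterpretML or any other toolkit in this list; the function name, the toy `predict` model, and the data are all made up for illustration.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy model that only looks at feature 0, so feature 1 should score zero.
def predict(row):
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [predict(row) for row in X]
scores = permutation_importance(predict, X, y)
```

Because the toy model ignores feature 1, shuffling it leaves the error unchanged, while shuffling feature 0 increases it.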

Adapters

See this lit review of Adapter intervention types

Steering

https://github.com/vgel/repeng
https://github.com/IBM/AISteer360
https://github.com/wassname/ssteer-eval-aware
https://github.com/IBM/activation-steering
https://github.com/chili-lab/Spherical-Steering
https://github.com/safety-research/weight-steering
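
For readers new to the topic, the basic idea behind activation steering can be sketched in a few lines: derive a direction from contrasting sets of activations, then add a scaled copy of it to a hidden state. The example below uses a plain difference of means as a deliberately simple stand-in. It is an assumption for illustration, not the method of any repository linked in this section, and the function names and activation values are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(pos_acts, neg_acts):
    """Difference of means between 'positive' and 'negative' activations."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def steer(hidden, direction, alpha=1.0):
    """Add the scaled steering direction to one hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Made-up activations: the two sets differ only along dimension 0.
pos = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neg = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

v = steering_vector(pos, neg)
steered = steer([1.0, 1.0, 1.0], v, alpha=0.5)
```

In practice the activations come from a chosen layer of a real model, and `alpha` controls the strength and sign of the effect.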

Structured output

jsonformer
- Doesn't support enums. HuggingFace only.
prob_jsonformer - Jsonformer, but it can output the probability of each choice in a single pass. Supports enums.
outlines
Microsoft Guidance
lmql.ai
llama.cpp grammar
langchain output_parsers
salute - TypeScript
TypeChat - TypeScript
guardrails
clownfish - 2023 Modifying Transformers to Follow a JSON Schema - not updated
relm - 2023 Regular Expression engine for Language Models - not updated
Constrained-Text-Generation-Studio
kor
lm-format-enforcer - remote APIs
instructor - for remote APIs without logits
Promptify
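
Many of the tools above constrain generation to a fixed set of valid outputs, and some, per the prob_jsonformer entry, also report a probability for each choice. The sketch below shows that idea for a single enum field: keep only the allowed options, then renormalize their scores with a softmax. It is a hedged illustration, not the implementation of any library listed here; the function names are hypothetical, and the scores stand in for per-option log-scores that a real model would produce.

```python
import math


def choice_probabilities(scores):
    """Softmax over a {choice: log-score} mapping."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def constrained_choice(scores, allowed):
    """Drop options outside `allowed`, then pick the most probable one."""
    masked = {c: s for c, s in scores.items() if c in allowed}
    if not masked:
        raise ValueError("no allowed choice has a score")
    probs = choice_probabilities(masked)
    best = max(probs, key=probs.get)
    return best, probs


# Made-up scores: "banana" scores highest but is not a valid enum value.
scores = {"red": 2.0, "green": 1.0, "blue": 0.0, "banana": 5.0}
best, probs = constrained_choice(scores, allowed={"red", "green", "blue"})
```

The invalid option is excluded before normalization, so the returned probabilities sum to one over the schema's valid values only.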
