How to run AI agents in production with auditable state and no persistent memory in the model.
LLM-based agents become production-ready when they operate within an explicit control envelope, not when the model gets smarter.
| | |
|---|---|
| Author | Felipe Silva |
| Version | 2.1 |
| License | CC BY 4.0 |
| Full framework (13 sections) | fstech.digital/framework |
```bash
git clone https://github.com/fstech-digital/operational-ontology-framework.git
cd operational-ontology-framework
pip install -r requirements.txt
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
```
```bash
python agent.py examples/customer-support
```

Start your own project:

```bash
mkdir my-project
cp templates/_pin.md my-project/
cp templates/_spec.md my-project/
# Edit with your domain rules and tasks
python agent.py my-project
```

Options: `--all` (run all tasks) · `--model claude-haiku-4-5-20251001` (change model) · `--dry-run` (inspect without calling the LLM)
Other providers:

```bash
# OpenAI
ADAPTER=openai OPENAI_API_KEY=sk-... python agent.py examples/customer-support --model gpt-4o

# Ollama (local models, no API key needed)
ADAPTER=ollama python agent.py examples/customer-support --model gemma4:e2b
```

LLM agents in production face three structural problems: no audit trail of decisions, state loss between sessions, and circular self-validation. The common fix (vector databases, embeddings) trades one problem for another: the agent now remembers diffusely, with no guarantee of which facts survived.
Memory lives in the filesystem, not in the model. Five markdown files, versioned with git.
| Artifact | Role | Volatility |
|---|---|---|
| Pin | Immutable rules, domain entities, decision routes | Rarely changes |
| Spec | Task checklist, execution state, learnings | Changes every session |
| Handoff | Session memory: decisions, results, continuation briefing | New file each session |
| Facts | Long-term accumulated knowledge with metadata | Grows over project lifetime |
| Skills | Reusable procedures that improve through practice | Refined every execution |
Pin = what the project is. Spec = what to do. Handoff = what happened. Facts = what was learned. Skills = what the agent knows how to do.
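At boot, the agent loads these artifacts from plain files. A minimal sketch of that loader in Python — the dataclass, function names, and the rule that missing artifacts yield empty strings are illustrative assumptions, not the real `agent.py` implementation; only the file names follow the `templates/` layout:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ProjectState:
    """The five OOF artifacts, loaded from a project directory."""
    pin: str      # immutable rules and domain entities
    spec: str     # task checklist and execution state
    facts: str    # long-term fact store
    skills: str   # reusable procedures
    handoff: str  # latest session handoff, empty on a first run

def load_state(project: Path) -> ProjectState:
    """Boot: read each artifact; a missing file yields an empty string."""
    def read(name: str) -> str:
        p = project / name
        return p.read_text() if p.exists() else ""

    # Handoffs accumulate one file per session; take the most recent.
    handoff_dir = project / "handoffs"
    handoffs = sorted(handoff_dir.glob("*.md")) if handoff_dir.exists() else []
    return ProjectState(
        pin=read("_pin.md"),
        spec=read("_spec.md"),
        facts=read("_facts.md"),
        skills=read("_skills.md"),
        handoff=handoffs[-1].read_text() if handoffs else "",
    )
```

Because state is just files, "boot" is a handful of reads — no database connection, no embedding index to warm up.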
Five phases. Each explicit, each generates an artifact.
```
Boot (+ Retrieval) → Execute → Write-back → Consolidate → Handoff
  ↑                                                          │
  └──────────────────────────────────────────────────────────┘
```
- Boot — Load Pin + Spec + Facts + latest Handoff. Query Fact Store for relevant context.
- Execute — Work through tasks. Degrade signal: repetition, lost references, generic responses, or >50% context window consumed.
- Write-back — Verify, commit, mark done, annotate learning. One task, one commit.
- Consolidate — Promote session facts to long-term Fact Store. Flag stale facts (>90 days unverified) for review.
- Handoff — Structured record for next session. Always new session, never compaction.
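The phases after boot can be sketched as one Python function. This is a hedged skeleton, not the reference implementation: the `state` shape, the `llm` callable, and the fact/handoff dictionaries are assumed stand-ins; only the 90-day staleness window comes from the Consolidate rule above:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)  # facts unverified this long get flagged

def run_session(state, llm, tasks):
    """One OOF session after boot: Execute -> Write-back -> Consolidate -> Handoff."""
    decisions = []
    for task in tasks:
        result = llm(state, task)         # Execute: one task at a time
        decisions.append((task, result))  # Write-back: record result per task
    # Consolidate: flag facts not verified within the staleness window
    stale = [f for f in state["facts"]
             if date.today() - f["verified"] > STALE_AFTER]
    # Handoff: structured record for the next session (always a new session)
    return {"decisions": decisions, "stale_facts": stale}
```

In the real cycle each write-back step would also commit to git (one task, one commit); the sketch keeps only the control flow.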
Model-agnostic. Runs with Claude, GPT, Gemini, and local models via built-in adapters (Anthropic, OpenAI, Ollama). Tested across provider migrations with zero state loss.
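Provider portability reduces to a small interface. A sketch of what an adapter contract might look like — the `LLMAdapter` protocol, `EchoAdapter`, and `get_adapter` names are hypothetical and may differ from the real `adapters.py`:

```python
from typing import Protocol

class LLMAdapter(Protocol):
    """Minimal provider-agnostic interface: system prompt in, text out."""
    def complete(self, system: str, prompt: str, model: str) -> str: ...

class EchoAdapter:
    """Stand-in adapter in the spirit of --dry-run: no API call is made."""
    def complete(self, system: str, prompt: str, model: str) -> str:
        return f"[dry-run:{model}] {prompt[:40]}"

def get_adapter(name: str) -> LLMAdapter:
    """Select an adapter by name (cf. the ADAPTER env variable); stub only."""
    adapters = {"echo": EchoAdapter()}
    return adapters[name]
```

Because state lives in files rather than inside any provider's runtime, swapping the adapter is the only change a migration requires.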
Memory sovereignty. Memory lives in versioned files you control. Switch providers tomorrow, state survives. Not a feature, a consequence of the architecture.
Auditability by design. Every decision is recorded with justification. Full traceability via git log and git bisect.
Agent writes its own memory. Write-back doesn't depend on human discipline. The agent updates the Spec, generates the Handoff, and commits changes. The discipline lives in the prompt, not in training people.
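A minimal sketch of that write-back step — the function names and checklist format are assumptions, not the real implementation; only the `_spec.md` file name and the one-task-one-commit rule come from the framework:

```python
import subprocess
from pathlib import Path

def append_learning(spec_text: str, task: str, learning: str) -> str:
    """Mark a task done in the Spec and annotate what was learned."""
    return spec_text + f"\n- [x] {task} (learned: {learning})\n"

def write_back(project: Path, task: str, learning: str) -> None:
    """Update the Spec and commit: one task, one commit."""
    spec = project / "_spec.md"
    spec.write_text(append_learning(spec.read_text(), task, learning))
    subprocess.run(["git", "-C", str(project), "add", "_spec.md"], check=True)
    subprocess.run(
        ["git", "-C", str(project), "commit", "-m", f"write-back: {task}"],
        check=True,
    )
```

Each commit is one decision with its justification in the diff, which is what makes `git log` and `git bisect` usable as an audit trail.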
| | OOF (this) | LangChain / LangGraph | CrewAI | Mem0 | MemPalace | ESAA | Letta | Palantir Foundry |
|---|---|---|---|---|---|---|---|---|
| What it solves | Auditable state + reusable skills | Call orchestration + memory | Multi-agent orchestration | Standalone memory layer | Episodic recall | State for code agents | OS-like memory runtime | Enterprise ontology |
| Memory lives in | Filesystem (markdown + git) | Vector DB (Chroma, pgvector) | Vector + hierarchical scopes | Vector + Graph (Pro) | Verbatim + ChromaDB | Event log (JSONL + SHA-256) | Tiered (core/recall/archival) | Centralized ontology |
| Infra required | None (filesystem only) | Vector DB + checkpointer | ChromaDB or similar | Qdrant/Chroma/pgvector | Python + ChromaDB | JSONL + YAML contracts | Docker + LLM providers | Full proprietary stack |
| Auditable decisions | Yes (decisions + reasoning + git bisect) | Partial (execution logs) | Partial (memory scopes) | No (similarity recall) | No (probabilistic recall) | Yes (deterministic replay + hash) | Partial (agent-managed) | Yes (digital twin) |
| Model portable | Yes (tested across Claude, Gemini, local) | No (LangGraph runtime) | No (CrewAI runtime) | Yes (framework-agnostic) | Partial (local only) | No (code agents only) | Partial (multi-LLM) | No (vendor lock-in) |
| Long-term memory | Yes (Fact Store v2.0) | Via LangMem SDK | Resets between runs | Yes (vector + graph) | Yes (verbatim store) | Event replay | Yes (archival tier) | Yes (ontology) |
| Agent writes own memory | Yes (write-back in cycle) | No (external persistence) | No (resets on exit) | No (SDK calls) | No (MCP tools) | No (orchestrator writes) | Yes (self-editing context) | No (platform manages) |
| Production evidence | 6 months, 5 agents, 3 clients | Fortune 500 via LangSmith | 60% Fortune 500 | SOC 2, HIPAA | 3 days old | Paper published, no production | Startups | Fortune 100, governments |
| Cost | Free (CC BY 4.0) | Free OSS + paid cloud | Free OSS + Enterprise | $0–249/mo | Free | Free | $0–200/mo | $M+/year |
Key differences:
- Orchestration frameworks (LangChain, CrewAI) solve how agents call tools. OOF solves how agents maintain state between calls. They complement each other.
- Memory layers (Mem0, MemPalace, Letta) solve recall (finding relevant facts). OOF solves governance (recording decisions with justification, ensuring portability).
- ESAA shares the same principle (memory outside the model, auditable) but targets code agents with event sourcing. OOF targets any business domain with document artifacts.
- Palantir Foundry validates the same thesis at enterprise scale. OOF delivers the same paradigm with radically simpler infrastructure (markdown + git vs. proprietary stack).
Six months, five agents, three clients. Fleet migrated across Claude versions, Gemma (local), and mixed commercial providers. Zero state loss in any transition.
External case: VJ Turrini (business consulting) adopted in four weeks. Agent handles client queries, schedules, and auditable decision history. Path: consultation → trust → automation.
```
├── agent.py           # Reference implementation (~480 lines)
├── adapters.py        # LLM provider adapters (Anthropic, OpenAI, Ollama)
├── test_agent.py      # Unit tests (pytest, 38+ tests)
├── templates/
│   ├── _pin.md        # Blank Pin with guidance
│   ├── _spec.md       # Blank Spec with guidance
│   ├── _handoff.md    # Blank Handoff with guidance
│   ├── _facts.md      # Blank Fact Store with guidance
│   └── _skills.md     # Blank Skills catalog with guidance
├── examples/
│   └── customer-support/   # Complete fictional project
│       ├── _pin.md
│       ├── _spec.md
│       ├── _facts.md
│       └── handoffs/
├── requirements.txt   # anthropic>=0.40.0
└── .env.example
```
- Not for pure chatbots (no side effects = no need for state)
- Not for exploratory prototypes (formalizing too early crystallizes wrong hypotheses)
- Not for stateless systems (FaaS, pure transformations)
- Empirical metrics (boot failure rate, handoff recovery time) are declared as pending
The complete framework (13 sections) with D+L+A triad, component mapping, write-back protocol, memory sovereignty, N5 validation, and declared empirical gaps: fstech.digital/framework
Brazilian consultancy. Executable ontologies and AI agents in production. The product is the operation, not the document.
Site · Newsletter · X @fs_tech_ · Contact
Operational Ontology Framework v2.1 · April 2026 · CC BY 4.0