One prompt. Many agents. One deliverable. Running on your laptop.
Live Demo · Quick Start · Features · Architecture · Roadmap
- What is Nexus
- Why I Built This
- Live Demo
- Core Features
- Quick Start
- Providers
- Architecture
- Project Layout
- Commands
- Troubleshooting
- Roadmap
- Security Notice
- Contributing
- License
- Acknowledgments
Nexus is an open-source multi-agent harness that takes a single prompt, routes it through a classifier, hands it to an orchestrator that plans with a todo list, and fans work out to research, code, and creative sub-agents. Those agents share a sandboxed filesystem with shell, browser, code execution, Jupyter, and a catalog of 60 MCP tools they reach as files on disk. At the end you get a written report, runnable code, or a generated image — assembled from whatever the agents produced along the way.
Built on LangGraph, DeepAgents, and AIO Sandbox. Runs entirely on your machine. Swap providers by editing .env — or skip API billing entirely by logging in with your existing Claude Max or ChatGPT Plus/Pro subscription.
I like open source because I can pull it apart. Perplexity Computer showed me a shape of product I wanted to exist, and ByteDance's deer-flow showed me it could be built in the open. I wanted my own take on it, running locally, in a stack I actually know: LangChain, LangGraph, and DeepAgents. Nexus is the result — the Docker container and the agents live on your machine, and you swap providers by editing .env.
A static preview of the execution view is deployed at nexus-web-snowy.vercel.app. It runs on mocked data with no backend, so you can explore the UI without any setup.
The full experience requires a LangGraph server and the AIO Sandbox container running locally. See Quick Start.
Skills are structured capability modules — Markdown files that define workflows, best practices, and templates. Nexus ships with five built-in skills: deep research, build app, generate image, data analysis, and write report. Skills are not embedded in the system prompt. They're loaded into the orchestrator's filesystem at startup and read on demand, keeping the context window lean.
Tools follow a two-layer architecture:
- Hot layer (~20 tools) — bound to every sub-agent on every turn. Web search, browser automation, code execution, Jupyter, image generation, and document conversion.
- Cold layer (60 MCP tools) — TypeScript wrapper files under `/home/gem/workspace/servers/` in the sandbox. An agent discovers them via `mcp_tool_search`, reads the wrapper for the schema, and runs it through `sandbox_nodejs_execute`.
Why the indirection? Token cost (60 schemas in the system prompt costs ~55K tokens before the conversation starts) and tool selection accuracy (models degrade past 30-50 tools). The whole thing is provider-agnostic — same code path on Google, Anthropic, OpenAI, and Z.AI.
```
HOT — bound to sub-agents every turn      COLD — files in /home/gem/workspace/servers/
research / code sub-agents                60 MCP tools as TypeScript wrapper files
        |
        v
mcp_tool_search   -> wrapper paths
read wrapper file -> schema + example
write Node script -> sandbox_nodejs_execute
```
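The cold path above can be sketched in a few lines. This is an illustrative sketch only — `ToolWrapper` and `searchWrappers` are hypothetical names standing in for the real `mcp_tool_search` tool, and the example wrapper paths are invented for the demo:

```typescript
// Hypothetical shape of cold-layer wrapper metadata. Not the actual Nexus API.
interface ToolWrapper {
  path: string; // location under /home/gem/workspace/servers/
  name: string;
  description: string;
}

// Stand-in for mcp_tool_search: keyword match over wrapper metadata, returning
// file paths the agent can then read for full schemas and examples.
function searchWrappers(catalog: ToolWrapper[], query: string): string[] {
  const q = query.toLowerCase();
  return catalog
    .filter((t) => t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q))
    .map((t) => t.path);
}

const catalog: ToolWrapper[] = [
  { path: "/home/gem/workspace/servers/github/create_issue.ts", name: "github_create_issue", description: "Create a GitHub issue" },
  { path: "/home/gem/workspace/servers/slack/post_message.ts", name: "slack_post_message", description: "Post a message to Slack" },
];

console.log(searchWrappers(catalog, "github"));
// -> ["/home/gem/workspace/servers/github/create_issue.ts"]
```

The agent then reads the returned wrapper file for its schema and executes it through `sandbox_nodejs_execute`, so only a search tool — not 60 schemas — sits in context.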
Complex tasks rarely fit in a single pass. The orchestrator decomposes them into sub-tasks and delegates to specialised agents, each with its own scoped context, tools, and tier.
| Sub-agent | Tier | Tools |
|---|---|---|
| `research` | `deep-research` | tavily search/extract/map, browser, util-convert, MCP cold catalog |
| `code` | `code` | code/nodejs/jupyter execution, MCP cold catalog |
| `creative` | `image` | `generate_image` |
| `general-purpose` | `default` | none — defers back to the orchestrator |
Sub-agents are self-contained — they do not inherit tools, prompts, or skills from the orchestrator.
Every task gets its own execution environment with a full filesystem. The agent reads, writes, and edits files. It executes shell commands, runs code, launches a browser, and operates Jupyter notebooks — all inside an isolated Docker container.
```
/home/gem/workspace/
├── research/task_{id}/   # research agent workspace
├── code/task_{id}/       # code agent workspace
├── creative/task_{id}/   # creative agent workspace
├── orchestrator/         # orchestrator scratch space
├── shared/               # final deliverables
└── servers/              # cold MCP tool wrappers
```
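The per-task layout above is mechanical enough to sketch. `taskWorkspace` is a hypothetical helper, not part of the Nexus codebase — it just shows how agent name and task id map onto the tree:

```typescript
// Minimal sketch: each sub-agent task gets its own directory under the
// workspace root. taskWorkspace is illustrative, not the real implementation.
const WORKSPACE_ROOT = "/home/gem/workspace";

type AgentName = "research" | "code" | "creative";

function taskWorkspace(agent: AgentName, taskId: string): string {
  return `${WORKSPACE_ROOT}/${agent}/task_${taskId}`;
}

console.log(taskWorkspace("research", "a1b2"));
// -> /home/gem/workspace/research/task_a1b2
```

Final deliverables land in `/home/gem/workspace/shared/`, which is what the workspace outputs panel in the UI surfaces.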
Agents ask for a tier, not a specific model. Five tiers cover every role:
| Tier | Purpose | Example models |
|---|---|---|
| `classifier` | Fast routing | Flash Lite, Haiku, nano, GLM-4.7 |
| `default` | General reasoning | Flash, Sonnet, GPT-5.4, GLM-5 Turbo |
| `code` | Code generation | Sonnet, Opus, GPT-5.4, GLM-5.1 |
| `deep-research` | Frontier / long tasks | Gemini 3.1 Pro, Claude Opus 4.6, GPT-5.4, GLM-5.1 |
| `image` | Image generation | Gemini 3.1 Flash Image |
Set one provider and you're good. Set several and the tier router picks a sensible model per role. The priority order lives in `apps/agents/src/nexus/models/registry.ts`.
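The tier-routing idea can be sketched as a priority scan. The table and model names below are illustrative placeholders — the real order and model IDs live in `apps/agents/src/nexus/models/registry.ts`, and `resolveTier` is not the actual function name:

```typescript
// Sketch of tier resolution: first configured provider that can serve the
// tier wins; if none can, fail fast (no silent fallback).
type Provider = "google" | "anthropic" | "openai" | "zai";
type Candidate = { provider: Provider; model: string };

// Illustrative priority table for one tier; real values live in registry.ts.
const priority: Record<string, Candidate[]> = {
  code: [
    { provider: "anthropic", model: "sonnet" },
    { provider: "openai", model: "gpt" },
    { provider: "zai", model: "glm" },
  ],
};

function resolveTier(tier: string, configured: Set<Provider>): string {
  const hit = (priority[tier] ?? []).find((c) => configured.has(c.provider));
  if (!hit) throw new Error(`No provider can satisfy the '${tier}' tier`);
  return `${hit.provider}:${hit.model}`;
}

console.log(resolveTier("code", new Set<Provider>(["openai", "zai"])));
// -> openai:gpt
```

Because agents request tiers rather than model IDs, swapping providers never touches agent code — only the registry's priority lists.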
Nexus supports Claude OAuth and Codex CLI as model providers — meaning you can run agents against your existing Claude Max or ChatGPT Plus/Pro subscription instead of paying per-token through the API.
| Provider | What it reuses | How to configure |
|---|---|---|
| Claude OAuth | Claude Max subscription | Set `CLAUDE_CODE_OAUTH_TOKEN` or drop credentials at `~/.claude/.credentials.json` |
| Codex CLI | ChatGPT Plus/Pro subscription | Set `CODEX_ACCESS_TOKEN` + `CODEX_ACCOUNT_ID`, or log in via the `codex` CLI |
When present, Claude OAuth takes priority over ANTHROPIC_API_KEY in all tier resolutions. Codex CLI is wired into the code tier only. Both are reported in preflight diagnostics at startup.
Note: Prompt caching is disabled on the Claude OAuth path due to the 4-block `cache_control` cap. Use the API-key path if you need caching.
- Node.js 20+
- Docker (for the AIO Sandbox container)
- At least one model provider (see Providers)
- A Tavily API key for search, extract, and map: tavily.com
1. Clone and install

   ```bash
   git clone https://github.com/Berkay2002/nexus.git
   cd nexus
   npm install
   ```

2. Set up environment variables

   ```bash
   cp .env.example .env
   ```

   Fill in at least one provider key plus `TAVILY_API_KEY`. Alternatives to API keys:

   - Claude OAuth: set `CLAUDE_CODE_OAUTH_TOKEN` or drop `~/.claude/.credentials.json` to reuse a Claude Max subscription
   - Codex CLI: set `CODEX_ACCESS_TOKEN` + `CODEX_ACCOUNT_ID` or log in via the `codex` CLI to reuse ChatGPT Plus/Pro
   - Vertex AI: run `gcloud auth application-default login` (no API key needed)

   If you're on the GLM Coding Plan, set `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`.

3. Start the AIO Sandbox (in its own terminal)

   ```bash
   docker run --security-opt seccomp=unconfined --rm -it -p 8080:8080 \
     ghcr.io/agent-infra/sandbox:latest
   ```

4. Start Nexus

   ```bash
   npm run dev
   ```

   This runs the LangGraph server on `:2024` and Next.js on `:3000`. The startup log shows which providers were detected and how each tier resolved:

   ```
   [Nexus] Preflight
   [Nexus] Providers:
     google    [OK] (vertex-adc)
     anthropic [OK] (claude-oauth)
     openai    [OK] (codex-cli)
     zai       [--] (ZAI_API_KEY not set)
   [Nexus] Tier resolution:
     classifier    -> google:gemini-3.1-flash-lite-preview
     default       -> anthropic:claude-sonnet-4-6
     code          -> openai:gpt-5.4 (codex)
     deep-research -> anthropic:claude-opus-4-6
     image         -> google:gemini-3.1-flash-image-preview
   ```

   Nexus fails fast if no provider can satisfy the `default` tier. No silent fallbacks.
Nexus auto-detects providers from environment variables.
| Provider | Env vars | Tiers covered |
|---|---|---|
| Google (Vertex) | `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` + ADC login | classifier, default, code, deep-research, image |
| Google (AI Studio) | `GEMINI_API_KEY` | classifier, default, code, deep-research, image |
| Anthropic (API) | `ANTHROPIC_API_KEY` | classifier, default, code, deep-research |
| Anthropic (OAuth) | `CLAUDE_CODE_OAUTH_TOKEN` or `~/.claude/.credentials.json` | classifier, default, code, deep-research |
| OpenAI (API) | `OPENAI_API_KEY` | classifier, default, code, deep-research |
| OpenAI (Codex CLI) | `CODEX_ACCESS_TOKEN` + `CODEX_ACCOUNT_ID` or `~/.codex/auth.json` | code |
| Z.AI (GLM) | `ZAI_API_KEY` (+ optional `ZAI_BASE_URL`) | classifier, default, code, deep-research |
Image generation is Google-only for now. The creative sub-agent disables itself if no Google credentials are present. Claude OAuth takes priority over ANTHROPIC_API_KEY when both are present.
Agents ask for a tier, not a specific model. That's how you swap providers without touching agent code.
The priority order per tier lives in `apps/agents/src/nexus/models/registry.ts`. Tweak it if you want a different default.
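Auto-detection boils down to scanning the environment. The sketch below is hedged: `detectProviders` is a hypothetical function, and the real preflight also checks credential files like `~/.claude/.credentials.json` and `~/.codex/auth.json`, which this sketch omits:

```typescript
// Illustrative provider auto-detection from environment variables.
type Env = Record<string, string | undefined>;

function detectProviders(env: Env): string[] {
  const found: string[] = [];
  if (env.GEMINI_API_KEY || env.GOOGLE_CLOUD_PROJECT) found.push("google");
  // Claude OAuth takes priority over the Anthropic API key when both are set.
  if (env.CLAUDE_CODE_OAUTH_TOKEN) found.push("anthropic (claude-oauth)");
  else if (env.ANTHROPIC_API_KEY) found.push("anthropic (api-key)");
  if (env.OPENAI_API_KEY) found.push("openai");
  if (env.ZAI_API_KEY) found.push("zai");
  return found;
}

console.log(detectProviders({ ANTHROPIC_API_KEY: "sk-...", CLAUDE_CODE_OAUTH_TOKEN: "tok" }));
// -> ["anthropic (claude-oauth)"]
```

Note the `else if`: that is the OAuth-over-API-key priority described above, applied at detection time rather than per request.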
The settings gear in the top-right of the UI opens a panel listing every model the server detected (via /api/models) and lets you override the model per role: orchestrator, router, research, code, creative. Overrides are session-scoped — a reload resets to defaults.
Three processes, talking only over HTTP.
AIO Sandbox (Docker :8080) <--> LangGraph dev server (:2024) <--> Next.js (:3000)
- AIO Sandbox — one Docker container shared by all agents: shell, browser, filesystem, Jupyter. Workspace root is `/home/gem/workspace/`.
- LangGraph server — hosts the meta-router, orchestrator, and sub-agents. The orchestrator is a DeepAgent with a `CompositeBackend` that routes `/memories/` and `/skills/` to SQLite (via Drizzle) and everything else to the sandbox.
- Next.js frontend — streams subagent messages, todos, and tool calls via `useStream` from `@langchain/react`. The execution view renders a todo panel, agent status, live subagent cards, a workspace outputs panel, and dedicated artifact renderers for filesystem ops, code execution, and image generation.
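The backend split is essentially a path-prefix router. `routeBackend` below is a stand-in sketch, not the DeepAgents `CompositeBackend` API — it only shows the routing rule, under the assumption that the prefix check is the whole decision:

```typescript
// Illustrative CompositeBackend split: /memories/ and /skills/ persist in
// SQLite; everything else lives in the sandbox filesystem.
function routeBackend(path: string): "sqlite" | "sandbox" {
  return path.startsWith("/memories/") || path.startsWith("/skills/")
    ? "sqlite"
    : "sandbox";
}

console.log(routeBackend("/skills/deep-research/SKILL.md")); // -> sqlite
console.log(routeBackend("/research/task_42/notes.md"));     // -> sandbox
```

This is why skills and memories survive container restarts while task workspaces are disposable: only the SQLite-backed prefixes persist outside the sandbox.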
Full design spec: `docs/superpowers/specs/2026-04-10-nexus-design.md`.
```
nexus/
├── apps/
│   ├── agents/                  # LangGraph server (Node 20, DeepAgents)
│   │   └── src/nexus/
│   │       ├── graph.ts         # Meta-router + orchestrator wiring
│   │       ├── models/          # Tier-based provider registry
│   │       ├── agents/          # Sub-agent definitions
│   │       │   ├── research/
│   │       │   ├── code/
│   │       │   ├── creative/
│   │       │   └── general-purpose/
│   │       ├── tools/           # LangChain tool wrappers
│   │       │   ├── search/
│   │       │   ├── extract/
│   │       │   ├── map/
│   │       │   ├── generate-image/
│   │       │   ├── browser-*/
│   │       │   ├── code-*/
│   │       │   ├── nodejs-*/
│   │       │   ├── jupyter-*/
│   │       │   └── util-convert-to-markdown/
│   │       ├── skills/          # Orchestrator skills (SKILL.md + templates)
│   │       │   ├── deep-research/
│   │       │   ├── build-app/
│   │       │   ├── generate-image/
│   │       │   ├── data-analysis/
│   │       │   └── write-report/
│   │       ├── backend/         # AIO Sandbox + Composite + Store
│   │       ├── middleware/      # Per-role model swap, runtime instructions
│   │       └── db/              # SQLite schema (Drizzle ORM)
│   │
│   └── web/                     # Next.js 16 / React 19 frontend
│       └── src/
│           ├── app/
│           │   ├── page.tsx     # Landing <-> execution switch
│           │   └── demo/page.tsx # Mocked demo (Vercel-deployable)
│           ├── components/
│           │   ├── execution/   # Todo panel, agent cards, prompt bar,
│           │   │                # workspace outputs, artifact renderers
│           │   ├── landing/     # Logo, tagline, prompt input
│           │   └── settings/    # Runtime model override panel
│           ├── hooks/           # useNexusStream, etc.
│           └── providers/       # LangGraph client + Stream provider
│
└── docs/                        # Design specs and plans
```
| Command | What it does |
|---|---|
| `npm run dev` | Start both servers (LangGraph `:2024` + Next.js `:3000`) |
| `npm run build` | Build all workspaces via Turbo |
| `npm run lint` | Lint everything |
| `npm run lint:fix` | Lint with auto-fix |
| `npm run format` | Prettier format |
| `cd apps/agents && npm test` | Agent unit tests (no credentials needed) |
| Problem | Fix |
|---|---|
| `No provider can satisfy the 'default' tier` | No provider env vars detected. Set at least one of `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `ZAI_API_KEY`, Vertex ADC, Claude OAuth, or Codex CLI credentials. |
| Creative sub-agent disabled | Image generation needs Google. Add a Google credential. |
| Vertex AI auth errors | Re-run `gcloud auth application-default login` and check `GOOGLE_CLOUD_PROJECT`. |
| Z.AI returns 404 / model-not-found | You're on the GLM Coding Plan. Set `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`. |
| "Cannot reach LangGraph server" | `npm run dev` isn't running, or it crashed during preflight. Check the terminal. |
| "AIO Sandbox unreachable" | Start the Docker container (step 3 above). |
| "TAVILY_API_KEY is not set" | Fill in `.env` and restart. |
| Claude OAuth not detected | Check that `CLAUDE_CODE_OAUTH_TOKEN` is set or `~/.claude/.credentials.json` exists. Token may have expired — re-export from Claude Code. |
| Codex CLI not detected | Ensure both `CODEX_ACCESS_TOKEN` and `CODEX_ACCOUNT_ID` are set, or run `codex` to populate `~/.codex/auth.json`. |
MVP is done. What's next is less about shipping features and more about making the thing feel good to use. Full descriptions in ROADMAP.md.
Now
- `docker compose up` for the whole stack
- Cost and token meter per run
- Async / resumable runs (survive page reloads)
Next
- Interruptible agents with a redirect input
- "Why did you do that" inspector on every tool call
- Editable `AGENTS.md` for project-level instructions
- Critic sub-agent that reviews drafts before synthesis
- LangSmith trace integration in the UI
- Context caching across providers
Later
- Nexus exposes itself as an MCP server
- Import skills from a Git URL
Nexus is designed to run in a local trusted environment — your laptop, accessible only via 127.0.0.1. If you expose it to a LAN, public cloud, or the internet without strict security measures, you risk:
- Unauthorized execution — the sandbox runs shell commands, writes files, and browses the web. An unauthenticated endpoint becomes an open RCE vector.
- Data exposure — agent conversations, workspace files, and API keys could be accessed by anyone who can reach the ports.
Recommendations:
- Keep Nexus behind `localhost`. If you need remote access, put it behind an authenticated reverse proxy.
- Never expose the AIO Sandbox port (`:8080`) to untrusted networks.
- Treat `.env` as secrets — it contains API keys.
- Review the AIO Sandbox's `--security-opt seccomp=unconfined` flag and tighten it for production use.
Contributions are welcome. Nexus is a solo project right now, but if you want to help:
- Fork the repo and create a feature branch.
- Follow existing patterns — read `CLAUDE.md` and `.claude/rules/` for conventions.
- Run `npm run lint` and `cd apps/agents && npm test` before opening a PR.
- Keep PRs focused — one feature or fix per PR.
If you're not sure where to start, check the Roadmap for ideas or open an issue to discuss.
MIT. See LICENSE.
Inspired by Perplexity Computer and ByteDance's deer-flow.
Built on:
- DeepAgents — orchestrator and sub-agent framework
- LangGraph — agent runtime and streaming
- LangChain — LLM abstractions and tool definitions
- AIO Sandbox — isolated execution environment
- Tavily — web search, extract, and map APIs
