One prompt. Many agents. One deliverable. Running on your laptop.
Live Demo · Quick Start · Features · Architecture · Roadmap
- What is Nexus
- Why I Built This
- Live Demo
- Core Features
- Quick Start
- Providers
- Architecture
- Project Layout
- Commands
- Troubleshooting
- Roadmap
- Security Notice
- Contributing
- License
- Acknowledgments
Nexus is an open-source multi-agent harness that takes a single prompt, routes it through a classifier, hands it to an orchestrator that plans with a todo list, and fans work out to research, code, and creative sub-agents. Those agents share a sandboxed filesystem with shell, browser, code execution, Jupyter, and a catalog of 60 MCP tools they reach as files on disk. At the end you get a written report, runnable code, or a generated image — assembled from whatever the agents produced along the way.
Built on LangGraph, DeepAgents, and AIO Sandbox. Runs entirely on your machine. Swap providers by editing .env — or skip API billing entirely by logging in with your existing Claude Max or ChatGPT Plus/Pro subscription.
I like open source because I can pull it apart. Perplexity Computer showed me a shape of product I wanted to exist, and ByteDance's deer-flow showed me it could be built in the open. I wanted my own take on it, running locally, in a stack I actually know: LangChain, LangGraph, and DeepAgents. Nexus is the result — the Docker container and the agents live on your machine, and you swap providers by editing .env.
A static preview of the execution view is deployed at nexus-web-snowy.vercel.app. It runs on mocked data with no backend, so you can explore the UI without any setup.
The full experience requires a LangGraph server and the AIO Sandbox container running locally. See Quick Start.
Skills are structured capability modules — Markdown files that define workflows, best practices, and templates. Nexus ships with five built-in skills: deep research, build app, generate image, data analysis, and write report. Skills are not embedded in the system prompt. They're loaded into the orchestrator's filesystem at startup and read on demand, keeping the context window lean.
Tools follow a two-layer architecture:
- Hot layer (~20 tools) — bound to every sub-agent on every turn. Web search, browser automation, code execution, Jupyter, image generation, and document conversion.
- Cold layer (60 MCP tools) — TypeScript wrapper files under `/home/gem/workspace/servers/` in the sandbox. An agent discovers them via `mcp_tool_search`, reads the wrapper for the schema, and runs it through `sandbox_nodejs_execute`.
Why the indirection? Token cost (60 schemas in the system prompt costs ~55K tokens before the conversation starts) and tool selection accuracy (models degrade past 30-50 tools). The whole thing is provider-agnostic — same code path on Google, Anthropic, OpenAI, and Z.AI.
```
HOT — bound to sub-agents every turn      COLD — files in /home/gem/workspace/servers/
research / code sub-agents                60 MCP tools as TypeScript wrapper files
        |
        v
mcp_tool_search   -> wrapper paths
read wrapper file -> schema + example
write Node script -> sandbox_nodejs_execute
```
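The cold path above can be sketched in a few lines. This is an illustrative sketch only — `ToolWrapper` and `searchWrappers` are hypothetical names standing in for the real `mcp_tool_search` tool, and the example wrapper paths are invented for the demo:

```typescript
// Hypothetical shape of cold-layer wrapper metadata. Not the actual Nexus API.
interface ToolWrapper {
  path: string; // location under /home/gem/workspace/servers/
  name: string;
  description: string;
}

// Stand-in for mcp_tool_search: keyword match over wrapper metadata, returning
// file paths the agent can then read for full schemas and examples.
function searchWrappers(catalog: ToolWrapper[], query: string): string[] {
  const q = query.toLowerCase();
  return catalog
    .filter((t) => t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q))
    .map((t) => t.path);
}

const catalog: ToolWrapper[] = [
  { path: "/home/gem/workspace/servers/github/create_issue.ts", name: "github_create_issue", description: "Create a GitHub issue" },
  { path: "/home/gem/workspace/servers/slack/post_message.ts", name: "slack_post_message", description: "Post a message to Slack" },
];

console.log(searchWrappers(catalog, "github"));
// -> ["/home/gem/workspace/servers/github/create_issue.ts"]
```

The agent then reads the returned wrapper file for its schema and executes it through `sandbox_nodejs_execute`, so only a search tool — not 60 schemas — sits in context.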
Complex tasks rarely fit in a single pass. The orchestrator decomposes them into sub-tasks and delegates to specialised agents, each with its own scoped context, tools, and tier.
| Sub-agent | Tier | Tools |
|---|---|---|
| `research` | `deep-research` | tavily search/extract/map, browser, util-convert, MCP cold catalog |
| `code` | `code` | code/nodejs/jupyter execution, MCP cold catalog |
| `creative` | `image` | `generate_image` |
| `general-purpose` | `default` | none — defers back to the orchestrator |
Sub-agents are self-contained — they do not inherit tools, prompts, or skills from the orchestrator.
Every task gets its own execution environment with a full filesystem. The agent reads, writes, and edits files. It executes shell commands, runs code, launches a browser, and operates Jupyter notebooks — all inside an isolated Docker container.
```
/home/gem/workspace/
├── research/task_{id}/   # research agent workspace
├── code/task_{id}/       # code agent workspace
├── creative/task_{id}/   # creative agent workspace
├── orchestrator/         # orchestrator scratch space
├── shared/               # final deliverables
└── servers/              # cold MCP tool wrappers
```
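The per-task layout above is mechanical enough to sketch. `taskWorkspace` is a hypothetical helper, not part of the Nexus codebase — it just shows how agent name and task id map onto the tree:

```typescript
// Minimal sketch: each sub-agent task gets its own directory under the
// workspace root. taskWorkspace is illustrative, not the real implementation.
const WORKSPACE_ROOT = "/home/gem/workspace";

type AgentName = "research" | "code" | "creative";

function taskWorkspace(agent: AgentName, taskId: string): string {
  return `${WORKSPACE_ROOT}/${agent}/task_${taskId}`;
}

console.log(taskWorkspace("research", "a1b2"));
// -> /home/gem/workspace/research/task_a1b2
```

Final deliverables land in `/home/gem/workspace/shared/`, which is what the workspace outputs panel in the UI surfaces.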
Agents ask for a tier, not a specific model. Five tiers cover every role:
| Tier | Purpose | Example models |
|---|---|---|
| `classifier` | Fast routing | Flash Lite, Haiku, nano, GLM-4.7 |
| `default` | General reasoning | Flash, Sonnet, GPT-5.4, GLM-5 Turbo |
| `code` | Code generation | Sonnet, Opus, GPT-5.4, GLM-5.1 |
| `deep-research` | Frontier / long tasks | Gemini 3.1 Pro, Claude Opus 4.6, GPT-5.4, GLM-5.1 |
| `image` | Image generation | Gemini 3.1 Flash Image |
Set one provider and you're good. Set several and the tier router picks a sensible model per role. The priority order lives in `apps/agents/src/nexus/models/registry.ts`.
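The tier-routing idea can be sketched as a priority scan. The table and model names below are illustrative placeholders — the real order and model IDs live in `apps/agents/src/nexus/models/registry.ts`, and `resolveTier` is not the actual function name:

```typescript
// Sketch of tier resolution: first configured provider that can serve the
// tier wins; if none can, fail fast (no silent fallback).
type Provider = "google" | "anthropic" | "openai" | "zai";
type Candidate = { provider: Provider; model: string };

// Illustrative priority table for one tier; real values live in registry.ts.
const priority: Record<string, Candidate[]> = {
  code: [
    { provider: "anthropic", model: "sonnet" },
    { provider: "openai", model: "gpt" },
    { provider: "zai", model: "glm" },
  ],
};

function resolveTier(tier: string, configured: Set<Provider>): string {
  const hit = (priority[tier] ?? []).find((c) => configured.has(c.provider));
  if (!hit) throw new Error(`No provider can satisfy the '${tier}' tier`);
  return `${hit.provider}:${hit.model}`;
}

console.log(resolveTier("code", new Set<Provider>(["openai", "zai"])));
// -> openai:gpt
```

Because agents request tiers rather than model IDs, swapping providers never touches agent code — only the registry's priority lists.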
Nexus supports Claude OAuth and Codex CLI as model providers — meaning you can run agents against your existing Claude Max or ChatGPT Plus/Pro subscription instead of paying per-token through the API.
| Provider | What it reuses | How to configure |
|---|---|---|
| Claude OAuth | Claude Max subscription | Set `CLAUDE_CODE_OAUTH_TOKEN` or drop credentials at `~/.claude/.credentials.json` |
| Codex CLI | ChatGPT Plus/Pro subscription | Set `CODEX_ACCESS_TOKEN` + `CODEX_ACCOUNT_ID`, or log in via the `codex` CLI |
When present, Claude OAuth takes priority over ANTHROPIC_API_KEY in all tier resolutions. Codex CLI is wired into the code tier only. Both are reported in preflight diagnostics at startup.
Note: Prompt caching is disabled on the Claude OAuth path due to the 4-block `cache_control` cap. Use the API-key path if you need caching.
- Node.js 20+
- Docker (for the AIO Sandbox container)
- At least one model provider (see Providers)
- A Tavily API key for search, extract, and map: tavily.com
1. Clone and install

   ```bash
   git clone https://github.com/Berkay2002/nexus.git
   cd nexus
   npm install
   ```

2. Set up environment variables

   ```bash
   cp .env.example .env
   ```

   Fill in at least one provider key plus `TAVILY_API_KEY`. Alternatives to API keys:

   - Claude OAuth: set `CLAUDE_CODE_OAUTH_TOKEN` or drop `~/.claude/.credentials.json` to reuse a Claude Max subscription
   - Codex CLI: set `CODEX_ACCESS_TOKEN` + `CODEX_ACCOUNT_ID` or log in via the `codex` CLI to reuse ChatGPT Plus/Pro
   - Vertex AI: run `gcloud auth application-default login` (no API key needed)

   If you're on the GLM Coding Plan, set `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`.

3. Start the AIO Sandbox (in its own terminal)

   ```bash
   docker run --security-opt seccomp=unconfined --rm -it -p 8080:8080 \
     ghcr.io/agent-infra/sandbox:latest
   ```

4. Start Nexus

   ```bash
   npm run dev
   ```

   This runs the LangGraph server on `:2024` and Next.js on `:3000`. The startup log shows which providers were detected and how each tier resolved:

   ```
   [Nexus] Preflight
   [Nexus] Providers:
     google    [OK] (vertex-adc)
     anthropic [OK] (claude-oauth)
     openai    [OK] (codex-cli)
     zai       [--] (ZAI_API_KEY not set)
   [Nexus] Tier resolution:
     classifier    -> google:gemini-3.1-flash-lite-preview
     default       -> anthropic:claude-sonnet-4-6
     code          -> openai:gpt-5.4 (codex)
     deep-research -> anthropic:claude-opus-4-6
     image         -> google:gemini-3.1-flash-image-preview
   ```

   Nexus fails fast if no provider can satisfy the `default` tier. No silent fallbacks.
Nexus auto-detects providers from environment variables.
| Provider | Env vars | Tiers covered |
|---|---|---|
| Google (Vertex) | `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` + ADC login | classifier, default, code, deep-research, image |
| Google (AI Studio) | `GEMINI_API_KEY` | classifier, default, code, deep-research, image |
| Anthropic (API) | `ANTHROPIC_API_KEY` | classifier, default, code, deep-research |
| Anthropic (OAuth) | `CLAUDE_CODE_OAUTH_TOKEN` or `~/.claude/.credentials.json` | classifier, default, code, deep-research |
| OpenAI (API) | `OPENAI_API_KEY` | classifier, default, code, deep-research |
| OpenAI (Codex CLI) | `CODEX_ACCESS_TOKEN` + `CODEX_ACCOUNT_ID` or `~/.codex/auth.json` | code |
| Z.AI (GLM) | `ZAI_API_KEY` (+ optional `ZAI_BASE_URL`) | classifier, default, code, deep-research |
Image generation is Google-only for now. The creative sub-agent disables itself if no Google credentials are present. Claude OAuth takes priority over ANTHROPIC_API_KEY when both are present.
Agents ask for a tier, not a specific model. That's how you swap providers without touching agent code.
The priority order per tier lives in `apps/agents/src/nexus/models/registry.ts`. Tweak it if you want a different default.
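Auto-detection boils down to scanning the environment. The sketch below is hedged: `detectProviders` is a hypothetical function, and the real preflight also checks credential files like `~/.claude/.credentials.json` and `~/.codex/auth.json`, which this sketch omits:

```typescript
// Illustrative provider auto-detection from environment variables.
type Env = Record<string, string | undefined>;

function detectProviders(env: Env): string[] {
  const found: string[] = [];
  if (env.GEMINI_API_KEY || env.GOOGLE_CLOUD_PROJECT) found.push("google");
  // Claude OAuth takes priority over the Anthropic API key when both are set.
  if (env.CLAUDE_CODE_OAUTH_TOKEN) found.push("anthropic (claude-oauth)");
  else if (env.ANTHROPIC_API_KEY) found.push("anthropic (api-key)");
  if (env.OPENAI_API_KEY) found.push("openai");
  if (env.ZAI_API_KEY) found.push("zai");
  return found;
}

console.log(detectProviders({ ANTHROPIC_API_KEY: "sk-...", CLAUDE_CODE_OAUTH_TOKEN: "tok" }));
// -> ["anthropic (claude-oauth)"]
```

Note the `else if`: that is the OAuth-over-API-key priority described above, applied at detection time rather than per request.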
The settings gear in the top-right of the UI opens a panel listing every model the server detected (via /api/models) and lets you override the model per role: orchestrator, router, research, code, creative. Overrides are session-scoped — a reload resets to defaults.
Three processes, talking only over HTTP.
AIO Sandbox (Docker :8080) <--> LangGraph dev server (:2024) <--> Next.js (:3000)
- AIO Sandbox — one Docker container shared by all agents: shell, browser, filesystem, Jupyter. Workspace root is `/home/gem/workspace/`.
- LangGraph server — hosts the meta-router, orchestrator, and sub-agents. The orchestrator is a DeepAgent with a `CompositeBackend` that routes `/memories/` and `/skills/` to SQLite (via Drizzle) and everything else to the sandbox.
- Next.js frontend — streams subagent messages, todos, and tool calls via `useStream` from `@langchain/react`. The execution view renders a todo panel, agent status, live subagent cards, a workspace outputs panel, and dedicated artifact renderers for filesystem ops, code execution, and image generation.
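The backend split is essentially a path-prefix router. `routeBackend` below is a stand-in sketch, not the DeepAgents `CompositeBackend` API — it only shows the routing rule, under the assumption that the prefix check is the whole decision:

```typescript
// Illustrative CompositeBackend split: /memories/ and /skills/ persist in
// SQLite; everything else lives in the sandbox filesystem.
function routeBackend(path: string): "sqlite" | "sandbox" {
  return path.startsWith("/memories/") || path.startsWith("/skills/")
    ? "sqlite"
    : "sandbox";
}

console.log(routeBackend("/skills/deep-research/SKILL.md")); // -> sqlite
console.log(routeBackend("/research/task_42/notes.md"));     // -> sandbox
```

This is why skills and memories survive container restarts while task workspaces are disposable: only the SQLite-backed prefixes persist outside the sandbox.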
Full design spec: `docs/superpowers/specs/2026-04-10-nexus-design.md`.
```
nexus/
├── apps/
│   ├── agents/                  # LangGraph server (Node 20, DeepAgents)
│   │   └── src/nexus/
│   │       ├── graph.ts         # Meta-router + orchestrator wiring
│   │       ├── models/          # Tier-based provider registry
│   │       ├── agents/          # Sub-agent definitions
│   │       │   ├── research/
│   │       │   ├── code/
│   │       │   ├── creative/
│   │       │   └── general-purpose/
│   │       ├── tools/           # LangChain tool wrappers
│   │       │   ├── search/
│   │       │   ├── extract/
│   │       │   ├── map/
│   │       │   ├── generate-image/
│   │       │   ├── browser-*/
│   │       │   ├── code-*/
│   │       │   ├── nodejs-*/
│   │       │   ├── jupyter-*/
│   │       │   └── util-convert-to-markdown/
│   │       ├── skills/          # Orchestrator skills (SKILL.md + templates)
│   │       │   ├── deep-research/
│   │       │   ├── build-app/
│   │       │   ├── generate-image/
│   │       │   ├── data-analysis/
│   │       │   └── write-report/
│   │       ├── backend/         # AIO Sandbox + Composite + Store
│   │       ├── middleware/      # Per-role model swap, runtime instructions
│   │       └── db/              # SQLite schema (Drizzle ORM)
│   │
│   └── web/                     # Next.js 16 / React 19 frontend
│       └── src/
│           ├── app/
│           │   ├── page.tsx     # Landing <-> execution switch
│           │   └── demo/page.tsx # Mocked demo (Vercel-deployable)
│           ├── components/
│           │   ├── execution/   # Todo panel, agent cards, prompt bar,
│           │   │                # workspace outputs, artifact renderers
│           │   ├── landing/     # Logo, tagline, prompt input
│           │   └── settings/    # Runtime model override panel
│           ├── hooks/           # useNexusStream, etc.
│           └── providers/       # LangGraph client + Stream provider
│
└── docs/                        # Design specs and plans
```
| Command | What it does |
|---|---|
| `npm run dev` | Start both servers (LangGraph `:2024` + Next.js `:3000`) |
| `npm run build` | Build all workspaces via Turbo |
| `npm run lint` | Lint everything |
| `npm run lint:fix` | Lint with auto-fix |
| `npm run format` | Prettier format |
| `cd apps/agents && npm test` | Agent unit tests (no credentials needed) |
| Problem | Fix |
|---|---|
| `No provider can satisfy the 'default' tier` | No provider env vars detected. Set at least one of `GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `ZAI_API_KEY`, Vertex ADC, Claude OAuth, or Codex CLI credentials. |
| Creative sub-agent disabled | Image generation needs Google. Add a Google credential. |
| Vertex AI auth errors | Re-run `gcloud auth application-default login` and check `GOOGLE_CLOUD_PROJECT`. |
| Z.AI returns 404 / model-not-found | You're on the GLM Coding Plan. Set `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`. |
| "Cannot reach LangGraph server" | `npm run dev` isn't running, or it crashed during preflight. Check the terminal. |
| "AIO Sandbox unreachable" | Start the Docker container (step 3 above). |
| "TAVILY_API_KEY is not set" | Fill in `.env` and restart. |
| Claude OAuth not detected | Check that `CLAUDE_CODE_OAUTH_TOKEN` is set or `~/.claude/.credentials.json` exists. Token may have expired — re-export from Claude Code. |
| Codex CLI not detected | Ensure both `CODEX_ACCESS_TOKEN` and `CODEX_ACCOUNT_ID` are set, or run `codex` to populate `~/.codex/auth.json`. |
MVP is done. What's next is less about shipping features and more about making the thing feel good to use. Full descriptions in ROADMAP.md.
Now
- `docker compose up` for the whole stack
- Cost and token meter per run
- Async / resumable runs (survive page reloads)
Next
- Interruptible agents with a redirect input
- "Why did you do that" inspector on every tool call
- Editable `AGENTS.md` for project-level instructions
- Critic sub-agent that reviews drafts before synthesis
- LangSmith trace integration in the UI
- Context caching across providers
Later
- Nexus exposes itself as an MCP server
- Import skills from a Git URL
Nexus is designed to run in a local trusted environment — your laptop, accessible only via 127.0.0.1. If you expose it to a LAN, public cloud, or the internet without strict security measures, you risk:
- Unauthorized execution — the sandbox runs shell commands, writes files, and browses the web. An unauthenticated endpoint becomes an open RCE vector.
- Data exposure — agent conversations, workspace files, and API keys could be accessed by anyone who can reach the ports.
Recommendations:
- Keep Nexus behind `localhost`. If you need remote access, put it behind an authenticated reverse proxy.
- Never expose the AIO Sandbox port (`:8080`) to untrusted networks.
- Treat `.env` as secrets — it contains API keys.
- Review the AIO Sandbox's `--security-opt seccomp=unconfined` flag and tighten it for production use.
Contributions are welcome. Nexus is a solo project right now, but if you want to help:
- Fork the repo and create a feature branch.
- Follow existing patterns — read `CLAUDE.md` and `.claude/rules/` for conventions.
- Run `npm run lint` and `cd apps/agents && npm test` before opening a PR.
- Keep PRs focused — one feature or fix per PR.
If you're not sure where to start, check the Roadmap for ideas or open an issue to discuss.
MIT. See LICENSE.
Inspired by Perplexity Computer and ByteDance's deer-flow.
Built on:
- DeepAgents — orchestrator and sub-agent framework
- LangGraph — agent runtime and streaming
- LangChain — LLM abstractions and tool definitions
- AIO Sandbox — isolated execution environment
- Tavily — web search, extract, and map APIs
