DeskPilot is an accessibility-first agent that uses local OCR and multi-step planning to automate desktop tasks. It consists of a high-performance Python backend and a modern React frontend dashboard.
The project is divided into two symmetric sub-projects:
- src/cua_backend/: The Python world. Managed by uv. Contains the agent core, planner, and FastAPI server.
- src/cua_frontend/: The React world. Managed by Bun. Contains the web-based dashboard and live execution viewer.
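Because the two sub-projects use different toolchains, a fresh checkout needs one install command in each directory (uv sync for the backend, bun install for the frontend). The following is a minimal Python sketch that runs both from the project root. It assumes uv and Bun are on your PATH. `bootstrap` and `SETUP_STEPS` are hypothetical names and are not part of DeskPilot:

```python
import subprocess
from pathlib import Path

# Install command for each sub-project, taken from the setup steps in this README.
SETUP_STEPS: list[tuple[str, list[str]]] = [
    ("src/cua_backend", ["uv", "sync"]),       # Python dependencies via uv
    ("src/cua_frontend", ["bun", "install"]),  # JS dependencies via Bun
]


def bootstrap(root: Path, dry_run: bool = False) -> list[tuple[Path, list[str]]]:
    """Run each sub-project's install command inside its own directory.

    Returns the (working directory, command) pairs in execution order.
    With dry_run=True nothing is executed; the plan is only returned.
    """
    planned = [(root / subdir, cmd) for subdir, cmd in SETUP_STEPS]
    if not dry_run:
        for cwd, cmd in planned:
            # check=True stops at the first failing install instead of continuing.
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned
```

Call `bootstrap(Path.cwd())` from the repository root, or pass `dry_run=True` first to print what would run.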
The backend is managed by uv. All commands should be run from the src/cua_backend directory.
cd src/cua_backend
uv sync

Start the FastAPI server, which the frontend dashboard connects to:

cd src/cua_backend
uv run python api_server.py

The server runs at: http://localhost:8000
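When you script DeskPilot (for example, to start the dashboard only after the backend is up), it can help to wait until the API server accepts connections. The following is a minimal standard-library sketch. It only checks that something is listening on port 8000 at the TCP level and does not call any DeskPilot endpoint. `wait_for_server` is a hypothetical helper:

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 8000,
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, a launcher script could call `wait_for_server(timeout=60)` before starting `bun run dev`.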
Execute tasks directly from your command line. The command runs inside the deskpilot-desktop container, so the Docker sandbox described below must be running first:

docker exec -it deskpilot-desktop python3 /app/src/cua_backend/run.py "Open Chrome and search for lo-fi music" --model "openrouter/google/gemini-2.0-flash-001"

The frontend is managed by Bun. All commands should be run from the src/cua_frontend directory.
cd src/cua_frontend
bun install

Start the development server:

cd src/cua_frontend
bun run dev

The UI runs at: http://localhost:5173 (requests to /api are proxied to the backend)
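The dev server forwards any request whose path starts with /api to the backend on port 8000 and serves everything else itself. The routing rule can be modeled in a few lines. This is a simplified illustration, not Vite's implementation. It assumes the /api prefix is forwarded unchanged, but the real behavior depends on the proxy settings in vite.config.js. `resolve_target` and the /api/tasks path are hypothetical:

```python
FRONTEND = "http://localhost:5173"  # Vite dev server (from this README)
BACKEND = "http://localhost:8000"   # FastAPI server (from this README)
PROXY_PREFIX = "/api"               # prefix the dev server proxies


def resolve_target(path: str) -> str:
    """Return the URL that ultimately serves a request made to the dev server."""
    if path.startswith(PROXY_PREFIX):
        # Proxied: the path is kept as-is (assumption) and sent to the backend.
        return BACKEND + path
    # Everything else (HTML, JS bundles, assets) is served by Vite directly.
    return FRONTEND + path
```

With this setup, the browser only talks to port 5173. A frontend that uses relative /api URLs therefore makes no cross-origin requests during development.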
DeskPilot uses Docker to provide a sandbox environment for the agent to interact with.
# From the project root
docker-compose -f docker/docker-compose.yml up --build -d

Once the container is up, connect to the virtual desktop:
- Browser: http://localhost:6080/vnc.html
- VNC client: localhost:5900
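The CLI command shown earlier uses docker exec against the deskpilot-desktop container, so it only works while this sandbox is running. The following minimal sketch checks the container state with docker container inspect and then submits a task. It assumes the Compose file names the container deskpilot-desktop, as that command implies. `is_container_running`, `build_task_command`, and `run_task` are hypothetical helpers. The sketch omits -it because a script usually has no terminal attached:

```python
import subprocess

CONTAINER = "deskpilot-desktop"                     # name used by the CLI command above
RUNNER = "/app/src/cua_backend/run.py"              # runner path inside the container
DEFAULT_MODEL = "openrouter/google/gemini-2.0-flash-001"


def is_container_running(name: str = CONTAINER) -> bool:
    """Return True if Docker reports the named container as running."""
    try:
        result = subprocess.run(
            ["docker", "container", "inspect", "--format", "{{.State.Running}}", name],
            capture_output=True, text=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # docker CLI missing from PATH, or the daemon did not answer in time.
        return False
    # Nonzero exit: no such container, or the daemon is unreachable.
    return result.returncode == 0 and result.stdout.strip() == "true"


def build_task_command(task: str, model: str = DEFAULT_MODEL,
                       container: str = CONTAINER) -> list[str]:
    """Build the docker exec argv for one task (no -it, since scripts lack a TTY)."""
    return ["docker", "exec", container, "python3", RUNNER, task, "--model", model]


def run_task(task: str, model: str = DEFAULT_MODEL) -> int:
    """Submit a task to the sandbox and return the exit code reported by docker exec."""
    if not is_container_running():
        raise RuntimeError(
            f"container {CONTAINER!r} is not running; start it with docker-compose first"
        )
    return subprocess.run(build_task_command(task, model)).returncode
```

For example, `run_task("Open Chrome and search for lo-fi music")` fails fast with a clear error if the sandbox is down, instead of surfacing a raw docker exec error.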
DeskPilot/
│
├── src/
│ ├── cua_frontend/ # Web-based UI (React + Bun)
│ │ ├── src/ # React source
│ │ ├── package.json
│ │ └── vite.config.js # Proxy setup for backend
│ │
│ └── cua_backend/ # Agent Logic & API (Python + uv)
│ ├── api/ # FastAPI routes & schemas
│ ├── agent/ # Core state machine & logic
│ ├── app/ # CLI entry points
│ ├── pyproject.toml # uv configuration
│ ├── run.py # CLI runner
│ └── api_server.py # Web server runner
│
├── configs/ # Shared configurations
├── docker/ # Virtual desktop environment (X11, VNC, noVNC)
├── docs/ # Extensive implementation plans
└── runs/ # Agent execution logs & screenshots
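The runs/ directory collects execution logs and screenshots. To open the most recent run, a helper like the following can pick the newest entry. It assumes each run writes its own subdirectory under runs/. The exact layout is not specified here, so treat this as a sketch. `latest_run` is a hypothetical name:

```python
from pathlib import Path
from typing import Optional


def latest_run(runs_dir: Path = Path("runs")) -> Optional[Path]:
    """Return the most recently modified subdirectory of runs_dir, or None."""
    if not runs_dir.is_dir():
        return None
    # Only directories count as runs; loose files in runs/ are ignored.
    run_dirs = [p for p in runs_dir.iterdir() if p.is_dir()]
    # Modification time stands in for recency; names are not assumed to sort by date.
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)
```

Sorting by modification time rather than by name avoids assuming any particular naming scheme for run folders.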