Built for the Gemini 3 Hackathon
GenTutor turns any learning material into an animated video lesson and lets you have a conversation with an AI tutor while you watch.
Demonstration video: YouTube, bilibili
- You describe what you want to learn — type a prompt or upload documents (PDF, MD, TXT, DOCX).
- A multi-agent pipeline generates a video — the backend coordinates 8 specialized agents that ingest content, write a script, synthesize narration, plan visuals, generate Manim animations, and merge everything into a final video with subtitles.
- You watch and interact — pause the video at any point, highlight a region on screen, and ask questions. The AI tutor sees the current frame, subtitles, and scene context to give relevant answers. You can also start a live voice session (via Gemini Live API) to talk with the tutor hands-free.
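The third step depends on knowing which subtitle is on screen at the moment the video is paused. As a rough sketch of how that context could be assembled, assuming subtitles are available as sorted, non-overlapping cues: the `Cue` type, `active_cue`, `build_tutor_context`, and the payload field names are all hypothetical and are not taken from the GenTutor codebase.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One subtitle cue (hypothetical shape; times are in seconds)."""
    start: float
    end: float
    text: str


def active_cue(cues: list[Cue], t: float) -> Cue | None:
    """Return the cue whose [start, end) interval contains t, if any.

    Assumes cues are sorted by start time and do not overlap.
    """
    starts = [c.start for c in cues]
    i = bisect_right(starts, t) - 1  # last cue starting at or before t
    if i >= 0 and cues[i].start <= t < cues[i].end:
        return cues[i]
    return None


def build_tutor_context(cues: list[Cue], t: float, scene_title: str,
                        region: tuple[float, float, float, float] | None = None) -> dict:
    """Bundle what the tutor would need alongside the captured frame."""
    cue = active_cue(cues, t)
    return {
        "timestamp": t,
        "subtitle": cue.text if cue else "",
        "scene": scene_title,
        # Highlighted region as (x, y, width, height) fractions of the frame.
        "region": region,
    }
```

The binary search keeps the lookup cheap even for long lessons; the frame image itself would be attached separately by whatever captures it in the player.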
IngestAgent → ScriptAgent → NarrationAgent → TTSAgent
→ VisualPlannerAgent → MotionDesignerAgent → MergeAgent → FrontEndAgent
Each agent reads the previous agent's output and writes structured JSON. The pipeline is orchestrated by a coordinator and runs asynchronously via Celery + Redis.
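The hand-off between agents can be illustrated with a minimal, synchronous sketch. It is not the project's coordinator: the two toy agents, the `run_pipeline` function, and the JSON keys are assumptions made for illustration, and the Celery and Redis layer is left out entirely.

```python
import json
from typing import Callable

# An agent takes the previous stage's decoded JSON and returns a new dict.
Agent = Callable[[dict], dict]


def ingest_agent(payload: dict) -> dict:
    """Toy stand-in: split the prompt into sentence-sized chunks."""
    chunks = [s.strip() for s in payload["prompt"].split(".") if s.strip()]
    return {"chunks": chunks}


def script_agent(payload: dict) -> dict:
    """Toy stand-in: turn each chunk into a numbered scene."""
    scenes = [{"scene": i, "text": c}
              for i, c in enumerate(payload["chunks"], start=1)]
    return {"scenes": scenes}


def run_pipeline(stages: list[tuple[str, Agent]], payload: str) -> str:
    """Run agents in order; each one reads the JSON the previous one wrote."""
    for name, agent in stages:
        result = agent(json.loads(payload))
        # Serializing between stages forces every output to be valid JSON.
        payload = json.dumps({"stage": name, **result})
    return payload


STAGES = [("ingest", ingest_agent), ("script", script_agent)]
```

Calling `run_pipeline(STAGES, json.dumps({"prompt": "Vectors add. Vectors scale."}))` returns a JSON string whose `scenes` list holds two entries. Passing serialized JSON rather than live objects is what would let each stage run in a separate worker process.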
| Layer | Technologies |
|---|---|
| AI Model | Gemini 3 Pro/Flash |
| Frontend | React, TypeScript, Vite, React Markdown, KaTeX |
| Backend | FastAPI, Celery, Redis, SQLAlchemy, FastAPI Users |
| Video | Manim (animation), Google Cloud TTS (narration), FFmpeg (merge) |
| AI Orchestration | LangChain, LangGraph, custom Manim Agent |
| Real-time | Gemini Live API (voice), WebSocket |
| Storage | Google Cloud Storage, SQLite |
genTutorFrontEnd/              # React SPA
  pages/
    Home.tsx                   # Project creation, file upload
    Learn.tsx                  # Video player, annotations, AI chat, live voice
  services/
    geminiService.ts           # SSE streaming chat with backend
    liveService.ts             # WebSocket voice session (Gemini Live API)

genTutorBackEnd/               # Python backend
  src/insight_stream/
    api/                       # FastAPI routes, auth, WebSocket live proxy
    pipeline/
      coordinator.py           # Orchestrates the 8-agent pipeline
      agents/                  # Ingest, Script, Narration, TTS, VisualPlanner,
                               # MotionDesigner, Merge, FrontEnd
  packages/
    manim-agent/               # LangGraph agent that writes & validates Manim code
    manim-layouts/             # Layout templates (Grid, TwoColumn, 3D, etc.)
    tts-manim-time/            # TTS timing synchronization
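
The tree lists `tts-manim-time/` as handling TTS timing synchronization. One plausible way to frame that problem, shown here only as a sketch and not as that package's actual API, is to give each scene the longer of its narration and animation durations and record how much padding the shorter track needs. The function name and the output keys are hypothetical.

```python
from itertools import accumulate


def scene_timeline(narration_secs: list[float],
                   animation_secs: list[float]) -> list[dict]:
    """Align per-scene narration and animation durations.

    Each scene lasts as long as the longer of its two tracks:
    - "hold" is how long the final animation frame must be held,
    - "silence" is how much quiet follows the narration clip.
    """
    if len(narration_secs) != len(animation_secs):
        raise ValueError("one narration and one animation duration per scene")

    lengths = [max(n, a) for n, a in zip(narration_secs, animation_secs)]
    # Running total of scene lengths gives each scene's start offset.
    starts = [0.0, *accumulate(lengths)][:-1]

    return [
        {"start": s, "length": length, "hold": length - a, "silence": length - n}
        for s, length, n, a in zip(starts, lengths, narration_secs, animation_secs)
    ]
```

For example, `scene_timeline([4.0, 3.0], [2.5, 5.0])` places the second scene at 4.0 seconds, holds the first animation for 1.5 seconds, and leaves 2.0 seconds of silence after the second narration clip.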
- Node.js 18+
- Python 3.12+
- Redis
- `GEMINI_API_KEY` environment variable
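
Two of these prerequisites can be checked from Python before starting the backend. The helper below is a hypothetical convenience script, not something shipped with the project; it does not check Node.js or Redis.

```python
import os
import sys
from collections.abc import Mapping


def missing_prerequisites(env: Mapping[str, str],
                          python_version: tuple[int, int]) -> list[str]:
    """Return a human-readable problem for each unmet prerequisite."""
    problems = []
    if python_version < (3, 12):
        problems.append("Python 3.12+ is required")
    if not env.get("GEMINI_API_KEY", "").strip():
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, sys.version_info[:2]):
        print(f"missing: {problem}")
```

Taking the environment and version as arguments, instead of reading them inside the function, keeps the check easy to test.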
cd genTutorFrontEnd
npm install
npm run dev

cd genTutorBackEnd
uv sync
# Start API server
uvicorn insight_stream.api.app:app --reload
# Start Celery worker (separate terminal)
celery -A insight_stream.api.worker worker --loglevel=info

Built with the Google Gemini API for the Gemini 3 Hackathon.