GenTutor

Built for the Gemini 3 Hackathon

GenTutor turns any learning material into an animated video lesson and lets you have a conversation with an AI tutor while you watch.

Demonstration video: YouTube, bilibili

What it does

1. You describe what you want to learn: type a prompt or upload documents (PDF, MD, TXT, DOCX).
2. A multi-agent pipeline generates a video: the backend coordinates 8 specialized agents that ingest content, write a script, synthesize narration, plan visuals, generate Manim animations, and merge everything into a final video with subtitles.
3. You watch and interact: pause the video at any point, highlight a region on screen, and ask questions. The AI tutor receives the current frame, subtitles, and scene context so its answers stay relevant to what is on screen. You can also start a live voice session (via the Gemini Live API) to talk with the tutor hands-free.
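
To make the "current frame, subtitles, and scene context" idea concrete, here is a minimal sketch of how the context for a paused moment might be assembled. It is not taken from the GenTutor codebase: the Cue class, active_cue, build_tutor_context, and the shape of the returned dictionary are all hypothetical names chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cue:
    """One subtitle cue (hypothetical shape, not GenTutor's real schema)."""
    start: float  # seconds from the start of the video
    end: float    # seconds; the cue is visible for start <= t < end
    text: str


def active_cue(cues: list[Cue], t: float) -> Optional[Cue]:
    """Return the cue visible at time t, or None if no cue is showing.

    Assumes cues are sorted by start time and do not overlap.
    """
    starts = [c.start for c in cues]
    i = bisect_right(starts, t) - 1  # last cue that starts at or before t
    if i >= 0 and t < cues[i].end:
        return cues[i]
    return None


def build_tutor_context(cues: list[Cue], t: float, scene: str,
                        region: Optional[tuple] = None) -> dict:
    """Bundle what the tutor would need for a question asked at time t."""
    cue = active_cue(cues, t)
    return {
        "timestamp": t,
        "subtitle": cue.text if cue else "",
        "scene": scene,
        "region": region,  # e.g. (x, y, width, height) of the highlight
    }
```

For example, build_tutor_context(cues, 3.5, "Chain rule", region=(10, 20, 200, 80)) would return the subtitle showing at 3.5 seconds together with the scene name and the highlighted rectangle. A captured frame image would travel alongside this dictionary in a real request.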

Pipeline

IngestAgent → ScriptAgent → NarrationAgent → TTSAgent
  → VisualPlannerAgent → MotionDesignerAgent → MergeAgent → FrontEndAgent

Each agent reads the previous agent's output and writes structured JSON. The pipeline is orchestrated by a coordinator and runs asynchronously via Celery + Redis.
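
The hand-off pattern described above can be sketched in a few lines. This is an illustration only, not the project's coordinator: the two stand-in agents, their JSON fields, and run_pipeline are hypothetical, and the sketch runs synchronously instead of through Celery and Redis.

```python
import json
from typing import Callable

# An agent takes the previous agent's output and returns its own (both dicts).
Agent = Callable[[dict], dict]


def ingest_agent(prev: dict) -> dict:
    """Stand-in for IngestAgent: normalize the raw prompt text."""
    return {"text": prev["prompt"].strip()}


def script_agent(prev: dict) -> dict:
    """Stand-in for ScriptAgent: split the text into one scene per sentence."""
    sentences = [s.strip() for s in prev["text"].split(".") if s.strip()]
    return {"scenes": [{"id": i, "narration": s} for i, s in enumerate(sentences)]}


def run_pipeline(agents: list[tuple[str, Agent]], initial: dict) -> dict:
    """Run agents in order, feeding each one the previous agent's output."""
    outputs = {}
    current = initial
    for name, agent in agents:
        current = agent(current)
        # Round-trip through JSON so a non-serializable output fails here,
        # at the agent that produced it, and not in a later stage.
        current = json.loads(json.dumps(current))
        outputs[name] = current
    return outputs
```

Calling run_pipeline([("ingest", ingest_agent), ("script", script_agent)], {"prompt": "Limits. Derivatives."}) returns a dictionary keyed by agent name, which also makes each intermediate result easy to inspect when a later stage fails.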

Tech Stack

Layer	Technologies
AI Model	Gemini 3 Pro/Flash
Frontend	React, TypeScript, Vite, React Markdown, KaTeX
Backend	FastAPI, Celery, Redis, SQLAlchemy, FastAPI Users
Video	Manim (animation), Google Cloud TTS (narration), FFmpeg (merge)
AI Orchestration	LangChain, LangGraph, custom Manim Agent
Real-time	Gemini Live API (voice), WebSocket
Storage	Google Cloud Storage, SQLite

Project Structure

genTutorFrontEnd/          # React SPA
  pages/
    Home.tsx               # Project creation, file upload
    Learn.tsx              # Video player, annotations, AI chat, live voice
  services/
    geminiService.ts       # SSE streaming chat with backend
    liveService.ts         # WebSocket voice session (Gemini Live API)

genTutorBackEnd/           # Python backend
  src/insight_stream/
    api/                   # FastAPI routes, auth, WebSocket live proxy
    pipeline/
      coordinator.py       # Orchestrates the 8-agent pipeline
      agents/              # Ingest, Script, Narration, TTS, VisualPlanner,
                           # MotionDesigner, Merge, FrontEnd
  packages/
    manim-agent/           # LangGraph agent that writes & validates Manim code
    manim-layouts/         # Layout templates (Grid, TwoColumn, 3D, etc.)
    tts-manim-time/        # TTS timing synchronization

Getting Started

Prerequisites

Node.js 18+
Python 3.12+
Redis
uv (Python package manager; the backend setup below runs uv sync)
GEMINI_API_KEY environment variable
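
Before starting the backend, a quick preflight check can catch a missing key or an old interpreter. The script below is a hypothetical helper, not part of the repository; check_prerequisites is an illustrative name, and it only covers the Python version and GEMINI_API_KEY items from the list above.

```python
import os
import sys
from typing import Mapping, Optional


def check_prerequisites(env: Optional[Mapping[str, str]] = None,
                        version: Optional[tuple] = None) -> list[str]:
    """Return a list of human-readable problems; empty means all good.

    env and version are injectable so the check is easy to test.
    """
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print(f"missing prerequisite: {issue}")
    sys.exit(1 if issues else 0)
```

Saved as a standalone file and run with python, it exits with status 1 and prints one line per missing prerequisite.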

Frontend

cd genTutorFrontEnd
npm install
npm run dev

Backend

cd genTutorBackEnd
uv sync
# Start API server
uvicorn insight_stream.api.app:app --reload
# Start Celery worker (separate terminal)
celery -A insight_stream.api.worker worker --loglevel=info

Acknowledgments

Built with the Google Gemini API for the Gemini 3 Hackathon.
