GenTutorByGemini/GenTutor

GenTutor

Built for the Gemini 3 Hackathon

GenTutor turns any learning material into an animated video lesson and lets you have a conversation with an AI tutor while you watch.

Demonstration video: YouTube, bilibili

What it does

  1. You describe what you want to learn — type a prompt or upload documents (PDF, MD, TXT, DOCX).
  2. A multi-agent pipeline generates a video — the backend coordinates 8 specialized agents that ingest content, write a script, synthesize narration, plan visuals, generate Manim animations, and merge everything into a final video with subtitles.
  3. You watch and interact — pause the video at any point, highlight a region on screen, and ask questions. The AI tutor sees the current frame, subtitles, and scene context to give relevant answers. You can also start a live voice session (via Gemini Live API) to talk with the tutor hands-free.
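As a rough illustration of step 3, the context sent with a question could be bundled like this. This is a minimal sketch, not the project's actual request format: the names `TutorQuestion`, `Region`, `build_payload`, and every field name are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Region:
    """Highlighted rectangle, in coordinates normalized to the 0.0-1.0 range (assumed convention)."""
    x: float
    y: float
    width: float
    height: float


@dataclass
class TutorQuestion:
    """Hypothetical bundle of everything the tutor needs to answer in context."""
    question: str
    timestamp_s: float           # position at which the video was paused
    subtitle: str                # subtitle text visible at that moment
    scene_id: str                # hypothetical identifier of the current scene
    frame_png_b64: str           # current frame, base64-encoded
    region: Optional[Region] = None  # None when nothing was highlighted


def build_payload(q: TutorQuestion) -> str:
    """Validate the optional region and serialize the question to JSON."""
    if q.region is not None:
        r = q.region
        if min(r.x, r.y, r.width, r.height) < 0 or r.x + r.width > 1 or r.y + r.height > 1:
            raise ValueError("region must lie inside the frame (normalized coordinates)")
    return json.dumps(asdict(q))
```

Normalized coordinates are used here so the highlighted region stays meaningful regardless of the player's on-screen size; the real frontend may encode it differently.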

Pipeline

IngestAgent → ScriptAgent → NarrationAgent → TTSAgent
  → VisualPlannerAgent → MotionDesignerAgent → MergeAgent → FrontEndAgent

Each agent reads the previous agent's output and writes structured JSON. The pipeline is orchestrated by a coordinator and runs asynchronously via Celery + Redis.
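The hand-off between agents can be sketched as a simple chain in which each stage receives the previous stage's JSON output. This is an illustrative sketch only: the two stand-in agents, their field names (`prompt`, `text`, `script`), and `run_pipeline` are hypothetical, and the Celery + Redis layer is left out so the example stays self-contained.

```python
import json
from typing import Callable

# An agent is modeled as a function from one JSON-compatible dict to another.
Agent = Callable[[dict], dict]


def ingest_agent(state: dict) -> dict:
    """Stand-in for IngestAgent: normalizes the raw prompt."""
    return {"text": state["prompt"].strip()}


def script_agent(state: dict) -> dict:
    """Stand-in for ScriptAgent: splits the text into script lines."""
    sentences = [s.strip() for s in state["text"].split(".") if s.strip()]
    return {"script": sentences}


def run_pipeline(agents: list[tuple[str, Agent]], initial: dict) -> dict[str, dict]:
    """Run agents in order; each one reads the previous agent's output."""
    outputs: dict[str, dict] = {}
    current = initial
    for name, agent in agents:
        result = agent(current)
        # Round-trip through JSON to mimic agents exchanging serialized output.
        current = json.loads(json.dumps(result))
        outputs[name] = current
    return outputs


outputs = run_pipeline(
    [("IngestAgent", ingest_agent), ("ScriptAgent", script_agent)],
    {"prompt": "  A limit is a value. It is approached.  "},
)
```

Keeping every intermediate output keyed by agent name makes it easy to inspect or resume a run at a given stage, which is one plausible reason to have agents communicate through structured JSON.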

Tech Stack

| Layer | Technologies |
| --- | --- |
| AI Model | Gemini 3 Pro/Flash |
| Frontend | React, TypeScript, Vite, React Markdown, KaTeX |
| Backend | FastAPI, Celery, Redis, SQLAlchemy, FastAPI Users |
| Video | Manim (animation), Google Cloud TTS (narration), FFmpeg (merge) |
| AI Orchestration | LangChain, LangGraph, custom Manim Agent |
| Real-time | Gemini Live API (voice), WebSocket |
| Storage | Google Cloud Storage, SQLite |
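To make the FFmpeg merge step concrete, here is one way a command combining a rendered animation with a narration track might be assembled. It is a sketch under assumptions: the helper `build_merge_command` and the file names are hypothetical, and the project's MergeAgent may use different options. Burning in subtitles with the `subtitles` filter requires an FFmpeg build with libass and forces the video to be re-encoded.

```python
from typing import Optional


def build_merge_command(video: str, audio: str, output: str,
                        subtitles: Optional[str] = None) -> list[str]:
    """Build an ffmpeg argument list that muxes one video and one audio input."""
    cmd = [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video,             # input 0: silent animation
        "-i", audio,             # input 1: narration
        "-map", "0:v:0",         # take video from input 0
        "-map", "1:a:0",         # take audio from input 1
    ]
    if subtitles:
        # Burning subtitles in means the video stream must be re-encoded.
        cmd += ["-vf", f"subtitles={subtitles}", "-c:v", "libx264"]
    else:
        # No filtering needed, so the video stream can be copied as-is.
        cmd += ["-c:v", "copy"]
    cmd += ["-c:a", "aac", "-shortest", output]
    return cmd


command = build_merge_command("scene.mp4", "narration.mp3", "lesson.mp4")
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where `ffmpeg` is installed. Paths containing characters that are special to FFmpeg filters (such as `:` or `'`) would need escaping before being placed in the `subtitles=` argument.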

Project Structure

genTutorFrontEnd/          # React SPA
  pages/
    Home.tsx               # Project creation, file upload
    Learn.tsx              # Video player, annotations, AI chat, live voice
  services/
    geminiService.ts       # SSE streaming chat with backend
    liveService.ts         # WebSocket voice session (Gemini Live API)

genTutorBackEnd/           # Python backend
  src/insight_stream/
    api/                   # FastAPI routes, auth, WebSocket live proxy
    pipeline/
      coordinator.py       # Orchestrates the 8-agent pipeline
      agents/              # Ingest, Script, Narration, TTS, VisualPlanner,
                           # MotionDesigner, Merge, FrontEnd
  packages/
    manim-agent/           # LangGraph agent that writes & validates Manim code
    manim-layouts/         # Layout templates (Grid, TwoColumn, 3D, etc.)
    tts-manim-time/        # TTS timing synchronization
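The contents of tts-manim-time are not shown here, but the general idea of deriving timings from narration audio can be sketched as follows: given the duration of each narration segment, accumulate start and end times and emit subtitle cues in SRT format. The function names and the (text, duration) input shape are assumptions made for illustration, not the package's API.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments: list[tuple[str, float]]) -> str:
    """Turn (text, duration_in_seconds) pairs into back-to-back SRT cues."""
    blocks = []
    start = 0.0
    for index, (text, duration) in enumerate(segments, start=1):
        end = start + duration
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
        start = end  # the next cue begins where this one ends
    return "\n".join(blocks)


srt_text = build_srt([("Hello", 1.5), ("World", 2.0)])
```

The same cumulative start times could also drive how long each animation scene waits, so that visuals and narration stay aligned.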

Getting Started

Prerequisites

  • Node.js 18+
  • Python 3.12+
  • Redis
  • uv (Python package manager; the backend setup runs uv sync)
  • GEMINI_API_KEY environment variable
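A quick way to confirm these prerequisites before starting is a small check script. The sketch below is an assumption-laden convenience, not part of the repository: `missing_prerequisites` is a hypothetical helper, it only tests that `node` and `redis-server` are on the PATH (it does not verify the Node.js version, and Redis may legitimately run on another host), and the lookup function is injectable so the logic can be exercised without those tools installed.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str],
    python_version: tuple[int, int],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a human-readable list of prerequisites that appear to be missing."""
    problems = []
    if python_version < (3, 12):
        problems.append("Python 3.12+")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY environment variable")
    if which("node") is None:
        problems.append("Node.js (node not found on PATH)")
    if which("redis-server") is None:
        problems.append("Redis (redis-server not found on PATH)")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, sys.version_info[:2]):
        print(f"Missing: {problem}")
```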

Frontend

cd genTutorFrontEnd
npm install
npm run dev

Backend

cd genTutorBackEnd
uv sync
# Start API server
uvicorn insight_stream.api.app:app --reload
# Start Celery worker (separate terminal)
celery -A insight_stream.api.worker worker --loglevel=info

Acknowledgments

Built with the Google Gemini API for the Gemini 3 Hackathon.

About

GenTutor turns documents into interactive, code-driven explainer videos. Learners can pause, select any visual element, and ask questions in context. Built with Gemini 3 for reasoning and refinement.

License

MIT. The repository contains three license files, all MIT: LICENSE, LICENSE.manim-community, and LICENSE.manimgl.
