- Introduction
- Key Features
- Environment Setup Guide
- Configuration
- Usage Guide
- RAG Capabilities
- Architecture & Deep Dive
- Testing
## Introduction

This tool exports your Perplexity.ai conversation history into organized, semantically searchable Markdown files. It helps you build a personal knowledge base powered by local AI, bridging the gap between ephemeral questions and structured knowledge.
## Key Features

- Parallelized Extraction: Uses Playwright to extract multiple conversation threads simultaneously for fast data retrieval.
- Architectural Resilience: Automatically restores browser contexts and retries failed operations, so extraction continues through transient failures.
- Advanced RAG (Retrieval-Augmented Generation): Ask questions of your history in natural language. The system uses intent analysis to produce broad summaries or pinpoint specific technical insights.
- Semantic Vector Search: Move beyond keyword matching and locate information by conceptual and semantic relevance.
- Persistent State Tracking: Frequent checkpoints let the system resume progress after any interruption.
- Interactive Synthesis (REPL): A streamlined command-line interface for searching and querying your archive.
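The persistent-state idea can be sketched as follows. The `Checkpoint` shape and helper names here are hypothetical illustrations, not the tool's actual schema:

```typescript
// Illustrative checkpoint shape for resumable scraping (hypothetical schema).
interface Checkpoint {
  completedThreadIds: string[];
  lastRunAt: string; // ISO timestamp of the last successful write
}

// Record a finished thread; skip duplicates so a retried thread is counted once.
function markDone(cp: Checkpoint, threadId: string): Checkpoint {
  if (cp.completedThreadIds.includes(threadId)) return cp;
  return {
    completedThreadIds: [...cp.completedThreadIds, threadId],
    lastRunAt: new Date().toISOString(),
  };
}

// On restart, the scraper can consult this to skip already-extracted threads.
function isDone(cp: Checkpoint, threadId: string): boolean {
  return cp.completedThreadIds.includes(threadId);
}
```

Persisting this object to disk after each thread is what makes interruption-safe resumption possible.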
## Environment Setup Guide

If you are new to development or don't have the necessary tools installed, follow these steps to set up your environment.

### Install Node.js

We recommend using a version manager to install Node.js. This allows you to easily switch versions and avoids permission issues.
- Windows:
  - Download and run the latest installer from nvm-windows.
  - Open a new Command Prompt or PowerShell and run:

    ```shell
    nvm install 20
    nvm use 20
    ```
- macOS / Linux:
  - Install `nvm` by following the instructions at nvm.sh.
  - Run:

    ```shell
    nvm install 20
    nvm use 20
    ```
### Install Ollama

- Download and install Ollama from ollama.ai.
- Open your terminal and pull the required models:

  ```shell
  ollama pull nomic-embed-text
  ollama pull deepseek-r1
  ```
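Once the models are pulled, embeddings can be requested over Ollama's documented REST endpoint (`POST /api/embeddings`). The sketch below only builds the request; `buildEmbeddingRequest` is an illustrative helper, not part of this project:

```typescript
// Build a fetch-ready request for Ollama's embeddings endpoint.
// Illustrative helper; assumes the documented /api/embeddings API shape.
function buildEmbeddingRequest(baseUrl: string, model: string, text: string) {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/api/embeddings`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt: text }),
    },
  };
}

// Usage (requires a running Ollama instance):
// const { url, init } = buildEmbeddingRequest("http://localhost:11434", "nomic-embed-text", "hello");
// const { embedding } = await (await fetch(url, init)).json();
```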
If you don't have the git command installed, you can simply download this project as a ZIP file from GitHub and extract it.
Once extracted, open your terminal in the project folder and run:
```shell
npm install
npx playwright install chromium
```

## Configuration

Establish your environment by duplicating the template:
```shell
cp .env.example .env
```

- OLLAMA_URL: Access point for your local AI engine (default: http://localhost:11434).
- OLLAMA_MODEL: Cognitive model for RAG synthesis (e.g., deepseek-r1).
- OLLAMA_EMBED_MODEL: Model for generating vector representations (e.g., nomic-embed-text).
- ENABLE_VECTOR_SEARCH: Set to `true` to activate the semantic and RAG layers.
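Putting the variables together, a complete `.env` might look like this; the values shown are the defaults and examples mentioned above:

```env
# Example .env (defaults and example models from this guide)
OLLAMA_URL=http://localhost:11434
OLLAMA_MODEL=deepseek-r1
OLLAMA_EMBED_MODEL=nomic-embed-text
ENABLE_VECTOR_SEARCH=true
```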
## Usage Guide

Launch the system:

```shell
# Start the development environment
npm run dev
```

- Start scraper (Library): Initiates extraction. Authenticate manually if required.
- Search conversations: Interface with your history using various modes:
- Auto: Heuristic selection between semantic and exact search.
- Semantic: Fuzzy matching via high-dimensional vector space.
- RAG: Direct inquiry—e.g., "What did I learn about emergent intelligence?"
- Exact: Rapid string matching via ripgrep (bundled).
- Build vector index: Processes Markdown exports into a local vector store.
- Reset all data: Purges checkpoints, authentication data, and the vector index.
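The ranking step behind semantic mode can be illustrated with plain cosine similarity over embedding vectors. This is a conceptual sketch, not the tool's actual Vectra-backed implementation:

```typescript
// Cosine similarity: 1.0 means identical direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored chunk against the query embedding, best match first.
function rankBySimilarity(query: number[], docs: { id: string; vec: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vec) }))
    .sort((x, y) => y.score - x.score);
}
```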
## RAG Capabilities

The RAG mode is designed for several levels of inquiry:
- Broad Synthesis: "Summarize all threads regarding distributed systems."
- Granular Retrieval: "Locate the specific TypeScript pattern I used for the worker pool."
- Cross-Thread Integration: "How has my conceptual understanding of React hooks shifted?"
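The intent-analysis step that routes between these modes can be approximated with a keyword heuristic. This sketch is purely illustrative; the cue list and function names are assumptions, not the project's actual classifier:

```typescript
// Route a question to broad summarization or narrow retrieval (illustrative).
type Intent = "broad_synthesis" | "granular_retrieval";

function classifyIntent(question: string): Intent {
  const q = question.toLowerCase();
  // Cues that suggest the user wants a synthesis across many threads.
  const broadCues = ["summarize", "overview", "all threads", "how has"];
  return broadCues.some((cue) => q.includes(cue))
    ? "broad_synthesis"
    : "granular_retrieval";
}
```

A real implementation would more likely ask the LLM itself to classify the query, but the routing idea is the same.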
For a detailed look at our RAG implementation, hybrid search strategy, and theoretical foundations, please refer to:
👉 ARCH.md
## Architecture & Deep Dive

- src/ai/: Ollama interaction and advanced RAG orchestration layers.
- src/scraper/: Playwright-based extraction logic and parallel worker pool management.
- src/search/: Vector storage (Vectra) and ripgrep search implementation.
- src/repl/: Interactive CLI components.
- src/utils/: Shared utility functions for data chunking and logging.
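As one example of the chunking utilities mentioned for src/utils/, a fixed-size chunker with overlap might look like the following. The function name and parameters are illustrative, not the project's actual API:

```typescript
// Split text into fixed-size chunks with overlap, so context spanning a
// boundary still appears intact in at least one chunk. Illustrative sketch.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```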
## Testing

We follow the "Testing Trophy" model, emphasizing integration tests.

```shell
# Run unit tests
npm run test:unit

# Run integration tests
npm run test:integration
```