Skip to content

WISCLab/speech-mine

 
 

Repository files navigation

speech-mine

PyPI Docs

A Python toolkit for speaker-diarized transcription and transcript analysis. Built on WhisperX; extract word-level, forced-aligned, speaker-labeled CSVs from audio, then search, format, and chunk them.

Modules

Module Description Docs
extract Transcribe audio with speaker diarization
format Format CSV transcripts into readable scripts
chunk Split audio into segments via YAML config
search Fuzzy search transcripts by word or phrase

Installation

curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/beckettfrey/speech-mine
cd speech-mine
uv sync

See docs/installation.md for library dependency setup and HuggingFace token configuration.

Quick Start

Note

speech-mine is flexible and adapts to your use case. The commands below show a generalized example workflow. For more granular control, use the Python API directly.

# 1. (Optional) Chunk a long recording into segments
uv run speech-mine chunk recording.wav chunks.yaml chunks/

# 2. Extract a transcript
uv run speech-mine extract interview.mp3 output.csv \
  --hf-token YOUR_TOKEN \
  --num-speakers 2 \
  --compute-type float32

# 3. Format into a readable script
uv run speech-mine format output.csv script.txt

# 4. Search it
uv run speech-mine search "topic of interest" output.csv --pretty

# 5. (Optional) Chunk the recording again around segments of interest
uv run speech-mine chunk recording.wav segments.yaml clips/

MCP Server

speech-mine includes an MCP server that exposes all tools to Claude Code and other MCP clients.

Install globally (no clone needed):

claude mcp add speech-mine -- uvx --from speech-mine speech-mine-mcp

This pulls the latest published version from PyPI via uvx. After running it, restart Claude Code — the search_transcript, extract_audio, chunk_audio, and other tools will be available in your session.

If you cloned the repo: the included .mcp.json configures the server automatically when you open the project in Claude Code.

Documentation

# Serve docs locally
uv run mkdocs serve

Or browse the docs/ folder directly.

License

MIT

About

Bridges faster-whisper with pyannote/speaker-diarization-3.1 to extract contextual data from audio files locally.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%