AbaoFromCUG/pipecat-dashscope
pipecat-dashscope

Chinese documentation (中文文档)

pipecat-dashscope provides native DashScope service integrations for Pipecat. This package uses DashScope native SDK integrations and does not rely on OpenAI-compatible endpoints.

Why Native DashScope APIs

  • Pipecat pipelines are latency-sensitive and depend on realtime streaming/event semantics.
  • Chat Completions and Responses APIs are typically not sufficient for low-latency turn handling in voice agents.
  • Native DashScope SDK integrations keep behavior aligned with DashScope protocol families (Generation, MultiModalConversation, tts_v2, and qwen_tts_realtime).
  • Use examples/realtime_api_check.py to verify that your endpoint supports the Realtime API before running voice pipelines.

Features

  • Native DashScope Generation LLM integration (DashScopeGenerationLLMService)
  • Native DashScope MultiModalConversation LLM integration (DashScopeMultiModalLLMService)
  • Native DashScope ASR integration for segmented STT
  • Native DashScope tts_v2 TTS integration
  • Native DashScope qwen_tts_realtime integration
  • Native DashScope MultiModalConversation TTS integration
  • Runtime-updatable Pipecat service settings
  • Compatible with Pipecat LLMContext and pipeline processors
  • Environment-variable based configuration for DashScope credentials

Installation

uv add pipecat-dashscope

Usage

The recommended usage is the end-to-end voice bot in examples/bot.py, which wires this pipeline:

transport.input() -> DashScopeSTTService -> user_aggregator -> DashScope LLM -> DashScope TTS -> transport.output() -> assistant_aggregator

Set your API key and run a preset:

export DASHSCOPE_API_KEY="your_api_key"
uv run --dev examples/bot.py --preset default

Available presets in examples/bot.py:

  • default: STT fun-asr-flash-8k-realtime, LLM generation/qwen3-max, TTS v2/cosyvoice-v3-flash, voice longanyang
  • fast: STT fun-asr-flash-8k-realtime, LLM generation/qwen-plus, TTS v2/cosyvoice-v2, voice longxiaochun_v2
  • quality: STT fun-asr-flash-8k-realtime, LLM generation/qwen3-max, TTS multimodal/qwen-tts, voice Cherry
  • realtime: STT fun-asr-flash-8k-realtime, LLM multimodal/qwen3.6-flash-2026-04-16, TTS qwen-realtime/qwen-tts-realtime, voice Cherry
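Conceptually, each preset is just a bundle of STT/LLM/TTS settings. The table above can be sketched as a plain mapping (illustrative only; the field names and structure here are assumptions, not the actual layout of examples/bot.py):

```python
# Illustrative preset table mirroring the README's preset descriptions.
# The real examples/bot.py may organize these settings differently.
PRESETS = {
    "default": {
        "stt_model": "fun-asr-flash-8k-realtime",
        "llm_service": "generation", "llm_model": "qwen3-max",
        "tts_service": "v2", "tts_model": "cosyvoice-v3-flash",
        "tts_voice": "longanyang",
    },
    "fast": {
        "stt_model": "fun-asr-flash-8k-realtime",
        "llm_service": "generation", "llm_model": "qwen-plus",
        "tts_service": "v2", "tts_model": "cosyvoice-v2",
        "tts_voice": "longxiaochun_v2",
    },
    "quality": {
        "stt_model": "fun-asr-flash-8k-realtime",
        "llm_service": "generation", "llm_model": "qwen3-max",
        "tts_service": "multimodal", "tts_model": "qwen-tts",
        "tts_voice": "Cherry",
    },
    "realtime": {
        "stt_model": "fun-asr-flash-8k-realtime",
        "llm_service": "multimodal", "llm_model": "qwen3.6-flash-2026-04-16",
        "tts_service": "qwen-realtime", "tts_model": "qwen-tts-realtime",
        "tts_voice": "Cherry",
    },
}
```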

Override any preset setting with CLI options:

uv run --dev examples/bot.py \
  --preset realtime \
  --llm-service multimodal \
  --llm-model qwen3.6-flash-2026-04-16 \
  --tts-service qwen-realtime \
  --tts-model qwen-tts-realtime \
  --tts-voice Cherry

Supported override flags:

  • --stt-model
  • --llm-model
  • --llm-service (generation, multimodal)
  • --tts-service (v2, qwen-realtime, multimodal)
  • --tts-model
  • --tts-voice
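Override resolution is simple in concept: an explicit CLI flag replaces the corresponding preset field. A minimal sketch with a hypothetical resolve_settings helper (not the actual bot.py code; the preset table is abbreviated):

```python
import argparse

def resolve_settings(argv):
    """Merge the chosen preset with any explicit CLI overrides (sketch)."""
    presets = {  # abbreviated; see the preset list above
        "default": {"llm_service": "generation", "llm_model": "qwen3-max",
                    "tts_service": "v2", "tts_voice": "longanyang"},
        "realtime": {"llm_service": "multimodal",
                     "llm_model": "qwen3.6-flash-2026-04-16",
                     "tts_service": "qwen-realtime", "tts_voice": "Cherry"},
    }
    parser = argparse.ArgumentParser()
    parser.add_argument("--preset", default="default", choices=presets)
    parser.add_argument("--stt-model")
    parser.add_argument("--llm-service", choices=["generation", "multimodal"])
    parser.add_argument("--llm-model")
    parser.add_argument("--tts-service",
                        choices=["v2", "qwen-realtime", "multimodal"])
    parser.add_argument("--tts-model")
    parser.add_argument("--tts-voice")
    args = parser.parse_args(argv)
    settings = dict(presets[args.preset])
    for key in ("stt_model", "llm_service", "llm_model",
                "tts_service", "tts_model", "tts_voice"):
        value = getattr(args, key)
        if value is not None:  # an explicit flag wins over the preset value
            settings[key] = value
    return settings
```

For example, `resolve_settings(["--preset", "realtime", "--llm-model", "qwen-plus"])` keeps the realtime TTS settings but swaps in qwen-plus as the LLM model.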

Configuration

  • DASHSCOPE_API_KEY: required unless api_key= is passed explicitly to each service
  • DASHSCOPE_BASE_URL: optional override for both DashScopeGenerationLLMService and DashScopeMultiModalLLMService

Default LLM API base URL:

https://dashscope.aliyuncs.com/api/v1
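The usual resolution order is: explicit constructor argument first, then the environment variable, then the default base URL. A hedged sketch of that logic (resolve_credentials is a hypothetical helper; the actual services may differ in detail):

```python
import os

# Default LLM API base URL documented above.
DEFAULT_BASE_URL = "https://dashscope.aliyuncs.com/api/v1"

def resolve_credentials(api_key=None, base_url=None):
    """Prefer explicit arguments, then fall back to environment variables."""
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise ValueError("Set DASHSCOPE_API_KEY or pass api_key= explicitly")
    url = base_url or os.environ.get("DASHSCOPE_BASE_URL", DEFAULT_BASE_URL)
    return key, url
```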

Notes:

  • DashScopeGenerationLLMService uses DashScope native async AioGeneration.
  • DashScopeMultiModalLLMService uses DashScope native async AioMultiModalConversation.
  • DashScopeSTTService is a segmented STT service and expects VAD in the Pipecat pipeline.
  • DashScopeTTSV2Service uses dashscope.audio.tts_v2.SpeechSynthesizer.
  • DashScopeQwenRealtimeTTSService uses dashscope.audio.qwen_tts_realtime.
  • DashScopeMultiModalTTSService uses dashscope.MultiModalConversation with TTS-capable Qwen models.
  • All DashScope TTS services require explicit model and voice values (no built-in runtime defaults).
  • Keep these three TTS API families separate when extending the package; do not merge them into a single service unless DashScope unifies the underlying protocol.

Example: examples/bot.py

examples/bot.py is an end-to-end Pipecat voice-agent demo that wires:

  • DashScopeSTTService (speech to text)
  • DashScopeGenerationLLMService or DashScopeMultiModalLLMService
  • DashScopeTTSV2Service, DashScopeQwenRealtimeTTSService, or DashScopeMultiModalTTSService

The script provides preset pipeline profiles (default, fast, quality, realtime) and supports overriding STT/LLM/TTS model, service family, and voice via CLI options.

Run it from this package directory:

uv run --dev examples/bot.py --preset quality

The example always uses SmallWebRTC transport and forwards other Pipecat runner options as needed.

Requirements:

  • Set DASHSCOPE_API_KEY in your environment.
  • Ensure pipecat-ai runner extras are installed (the package dev dependency group includes them).

Testing

  • Prefer unit tests around request shaping, settings translation, and audio payload decoding.
  • Avoid live DashScope network tests in the default test path.
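As an example of the audio-payload-decoding style of test, a roundtrip check might look like this (decode_audio_chunk is a hypothetical helper, assuming TTS events carry base64-encoded PCM; it is not part of this package's public API):

```python
import base64

def decode_audio_chunk(payload: str) -> bytes:
    """Hypothetical helper: decode a base64-encoded PCM chunk from a TTS event."""
    return base64.b64decode(payload)

def test_decode_audio_chunk_roundtrip():
    # Fake little-endian 16-bit PCM samples; no network or SDK involved.
    pcm = b"\x00\x01\x02\x03" * 4
    encoded = base64.b64encode(pcm).decode("ascii")
    assert decode_audio_chunk(encoded) == pcm
```

Keeping the decode step in a small pure function like this is what makes it testable without touching the live DashScope endpoints.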
