Conversation
Just played around with this. Seems to be working!
```python
    name="AgentActivity.tts_say",
)
```
```python
task.add_done_callback(self._on_pipeline_reply_done)
```
🟡 _on_pipeline_reply_done callback causes duplicate state transitions for realtime say path
In _generate_reply() (agent_activity.py:1054-1066), the RealtimeModel path intentionally does NOT add the _on_pipeline_reply_done callback, because _realtime_generation_task_impl already handles state transitions internally (setting agent state to "listening", calling on_end_of_agent_speech, and _restore_interruption_by_audio_activity at agent_activity.py:2962-2972).

However, in the new say() method, task.add_done_callback(self._on_pipeline_reply_done) at line 988 is applied unconditionally to both the realtime say path and the TTS path. When the realtime say path is taken, _realtime_say_task calls _realtime_generation_task → _realtime_generation_task_impl, which performs state transitions. Then when the task completes, _on_pipeline_reply_done (agent_activity.py:1938-1946) fires and calls on_end_of_agent_speech(ignore_user_transcript_until=time.time()) a second time with a later timestamp, briefly extending the window during which user transcripts are suppressed.
```diff
-task.add_done_callback(self._on_pipeline_reply_done)
+if self._rt_session is None or is_given(audio) or self.tts:
+    task.add_done_callback(self._on_pipeline_reply_done)
```
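A minimal asyncio sketch of the double-fire described above (the helper names here are illustrative stand-ins, not the actual agent_activity code): the realtime path runs its own end-of-speech handling, and an unconditionally attached done callback then fires a second one after the task completes.

```python
import asyncio

events: list[str] = []

def on_end_of_agent_speech(source: str) -> None:
    # In the real code each call resets ignore_user_transcript_until,
    # so a second call extends the transcript-suppression window.
    events.append(source)

async def realtime_generation() -> None:
    # The realtime path already performs its own end-of-speech handling.
    on_end_of_agent_speech("internal")

async def main() -> None:
    task = asyncio.ensure_future(realtime_generation())
    # Attaching the callback unconditionally layers a second transition
    # on top of the internal one once the task completes.
    task.add_done_callback(lambda _: on_end_of_agent_speech("done_callback"))
    await task
    await asyncio.sleep(0)  # let the scheduled done callback run

asyncio.run(main())
```

With the suggested guard, the callback would only be attached on the TTS path, so only the internal handling would run for realtime say.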
```python
self,
text: str | AsyncIterable[str],
*,
allow_interruptions: NotGivenOr[bool] = NOT_GIVEN,
```
should we really expose allow_interruptions in this API? we don't have it in generate_reply, and we disallow allow_interruptions=False for a realtime session with server-side VAD.
even if the API supported it with server-side VAD, how would the caller know when the agent's audio playout has finished so interruptions can be re-allowed?
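One way to frame the concern is that re-allowing interruptions should be driven by a playout-finished signal, not by generation finishing. A hypothetical sketch (class and method names are made up for illustration, not the actual session API):

```python
import asyncio

class PlayoutGate:
    # Hypothetical sketch: interruptions are re-allowed only when the
    # audio output reports playback completion, not when generation ends.
    def __init__(self) -> None:
        self.allow_interruptions = False
        self._playout_done = asyncio.Event()

    def on_playout_finished(self) -> None:
        # Would be driven by a playback-finished notification from the
        # audio sink; without such a signal the caller cannot know when
        # to flip allow_interruptions back on.
        self._playout_done.set()

    async def speak(self) -> None:
        self.allow_interruptions = False
        await self._playout_done.wait()
        self.allow_interruptions = True

async def main() -> bool:
    gate = PlayoutGate()
    speaking = asyncio.ensure_future(gate.speak())
    await asyncio.sleep(0)      # speak() is now waiting on the event
    gate.on_playout_finished()  # audio sink signals completion
    await speaking
    return gate.allow_interruptions

interruptions_allowed = asyncio.run(main())
```

Without that kind of completion signal from the agent, exposing allow_interruptions on say() leaves the caller with no reliable point at which to re-enable interruption.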
```python
):
model_info = (
    "a RealtimeSession that implements say()"
    if isinstance(self.llm, llm.RealtimeModel)
```
llm cannot be a RealtimeModel when rt_session is None, can it? so maybe you need to add a capability to the RealtimeModel for say instead.
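A sketch of what a capability-based check could look like (the supports_say field and resolve_say_path helper are hypothetical, not part of the existing capabilities object):

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class RealtimeCapabilities:
    # Hypothetical capability flag; the field name is illustrative only.
    supports_say: bool = False

def resolve_say_path(
    capabilities: RealtimeCapabilities | None, has_tts: bool
) -> str:
    # Route say() based on an advertised capability rather than on
    # isinstance checks against the llm instance.
    if capabilities is not None and capabilities.supports_say:
        return "realtime"
    if has_tts:
        return "tts"
    raise RuntimeError(
        "say() requires a realtime model that supports say, or a TTS instance"
    )
```

This keeps the error message accurate even when a RealtimeModel is configured but its session does not implement say().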
```python
self._session._tool_items_added(tool_messages)

@utils.log_exceptions(logger=logger)
async def _realtime_say_task(
```
question: should we reuse _realtime_reply_task?
```python
async def _realtime_reply_task(
    self,
    *,
    speech_handle: SpeechHandle,
    model_settings: ModelSettings,
    user_input: str | None = None,
    instructions: str | None = None,
    text: str | AsyncIterable[str] | None = None,
) -> None:
```

and we check that only one of user_input and text can be set.
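The mutual-exclusivity check for a merged task could be as simple as the following sketch (standalone helper name is made up; the parameters follow the signature quoted above):

```python
from __future__ import annotations

from typing import AsyncIterable

def validate_reply_inputs(
    user_input: str | None = None,
    text: str | AsyncIterable[str] | None = None,
) -> None:
    # Guard for a merged _realtime_reply_task: the reply path consumes
    # user_input, the say path consumes text, and mixing the two modes
    # in one call is rejected up front.
    if user_input is not None and text is not None:
        raise ValueError("only one of user_input or text may be set")
```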
```python
        return
    except llm.RealtimeError as e:
        logger.error("failed to say text: %s", str(e))
        self._session._update_agent_state("listening")
```
I'm not sure about this. Isn't Phonic just using a TTS underneath? This seems like a very specialized method for them.