Solving the Nova Sonic Timeout: A Guide to Finalizing Voice Bot Conversations
Problem
When a Nova Sonic voice bot uses the system microphone, it can successfully capture customer details, update the database, and send confirmation emails—but then times out before speaking the final confirmation to the user.
Logs confirm:
- Tool calls and database writes succeed
- Email dispatch completes
- Multiple Timeout errors fire → no final spoken message
Clarifying the Issue
This isn’t a database or email bug. The problem lies in the handshake between tool use and voice output.
The tool call returns locally, but Nova Sonic never receives a proper tool_result + end-of-input signal, so it doesn’t generate the closing utterance. Meanwhile, the mic stream remains open, keeping the session “listening” instead of finalizing.
Why It Matters
If the bot doesn’t voice its closing line, users assume the entire workflow failed—even though the backend succeeded. In production voice apps, this mismatch breaks trust and leads to unnecessary support calls.
Key Terms
- Tool call / tool_result: Round-trip where the model requests a tool and expects a result with the same name.
- End-of-input (commit/flush): Signal that no more mic audio is coming this turn.
- response.completed: Terminal event signaling the model has finished generating output.
- Modalities: Declaring both ["audio","text"] so Nova Sonic returns speech and transcript.
Steps at a Glance
- Close the mic after user input.
- Return the tool result with the correct name and payload.
- Stream until response.completed, not just the first text delta.
- Request audio modality to ensure the assistant speaks.
- Tune timeouts (≥60s) and don’t stop early.
Detailed Steps
1. Close mic. After the user provides details, call the SDK’s commit method:
client.input_audio_buffer.commit()
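A minimal sketch of the ordering this step implies: stop capturing audio first, then commit exactly once per turn. FakeClient, FakeAudioBuffer, FakeMic, and TurnCloser are hypothetical stand-ins written only to make the example runnable; the one call taken from this guide is input_audio_buffer.commit(), and a real SDK may name things differently.

```python
from dataclasses import dataclass, field


@dataclass
class FakeAudioBuffer:
    """Hypothetical stand-in for the SDK's input audio buffer."""
    commits: int = 0

    def commit(self) -> None:
        # A real SDK would signal end-of-input for this turn here.
        self.commits += 1


@dataclass
class FakeClient:
    """Hypothetical client exposing only what this sketch needs."""
    input_audio_buffer: FakeAudioBuffer = field(default_factory=FakeAudioBuffer)


@dataclass
class FakeMic:
    """Hypothetical microphone wrapper with an open/closed state."""
    is_open: bool = True

    def stop(self) -> None:
        self.is_open = False


class TurnCloser:
    """Stops the mic and commits the audio buffer at most once per turn."""

    def __init__(self, client, mic):
        self.client = client
        self.mic = mic
        self.committed = False

    def end_user_turn(self) -> bool:
        if self.committed:
            return False  # already finalized; avoid a duplicate commit
        self.mic.stop()  # stop sending audio before signaling end-of-input
        self.client.input_audio_buffer.commit()
        self.committed = True
        return True
```

Guarding with a committed flag keeps a late callback (for example, a second silence detection) from committing the same turn twice.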
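2. Return the tool result. Send the result back under the same name the model requested, with the payload it expects:
client.responses.events.create({
  "type": "tool_result",
  "name": "create_or_update_customer",
  "content": [
    {"type": "json", "data": {"name": name, "email": email, "phone": phone, "status": "created"}}
  ],
  "response_id": current_response_id
})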
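One way to avoid a malformed result is to build and validate the event before sending it. The sketch below assumes the event shape shown in this guide, which is not a verified schema; build_tool_result is a hypothetical helper, not part of any SDK.

```python
import json


def build_tool_result(tool_name, response_id, customer, status="created"):
    """Build a tool_result event shaped like the example in this guide.

    The field names mirror the article's payload and are assumptions.
    """
    required = ("name", "email", "phone")
    missing = [key for key in required if not customer.get(key)]
    if missing:
        raise ValueError(f"missing customer fields: {', '.join(missing)}")

    data = {key: customer[key] for key in required}
    data["status"] = status

    event = {
        "type": "tool_result",
        "name": tool_name,  # must match the name the model requested
        "content": [{"type": "json", "data": data}],
        "response_id": response_id,
    }
    json.dumps(event)  # raises TypeError if the payload is not JSON-serializable
    return event
```

The returned dict can then be passed to the events call shown above. Failing fast on a missing field is a design choice here: an incomplete result sent to the model is harder to diagnose than a local ValueError.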
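3. Stream to completion. Keep reading events until response.completed:
- Capture output_text.delta for text
- Capture response.audio.delta for speech
4. Request audio output. Declare both modalities in the request:
"modalities": ["audio", "text"]
If no audio arrives, you can still synthesize the text yourself, but ideally let Nova Sonic handle it.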
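The streaming and audio-fallback logic above can be sketched as a small event loop. This assumes events arrive as dicts with a "type" key and, for deltas, a "delta" key holding a str (text) or bytes (audio); those shapes, the ResponseFailed exception, and the synthesize hook are hypothetical. The 90-second default follows the 60 to 90 second range this guide recommends.

```python
import time


class ResponseFailed(Exception):
    """Raised when the stream reports response.failed."""


def collect_response(events, timeout_s=90.0, clock=time.monotonic):
    """Read events until response.completed; return (text, audio_bytes)."""
    deadline = clock() + timeout_s
    text_parts, audio_parts = [], []

    for event in events:
        kind = event.get("type")
        if kind == "output_text.delta":
            text_parts.append(event["delta"])
        elif kind == "response.audio.delta":
            audio_parts.append(event["delta"])
        elif kind == "response.completed":
            return "".join(text_parts), b"".join(audio_parts)
        elif kind == "response.failed":
            raise ResponseFailed(event.get("error", "unknown error"))
        # Checked between events only; a blocking read needs its own timeout.
        if clock() > deadline:
            raise TimeoutError(f"no response.completed within {timeout_s}s")

    # The iterator ended without a terminal event: treat it as a disconnect.
    raise ConnectionError("stream closed before response.completed")


def choose_speech_source(text, audio, synthesize):
    """Prefer model audio; fall back to a caller-supplied TTS function."""
    if audio:
        return "model", audio
    if text.strip():
        return "local_tts", synthesize(text)
    return "none", b""
```

Returning only on response.completed, rather than on the first text delta, is what keeps the closing utterance from being cut off.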
5. Timeout discipline. Bump read timeouts to 60–90 seconds. Fail only on response.failed or a disconnect, not after a few short retries.
Conclusion
This isn’t a data-layer failure—it’s a conversation lifecycle issue. By:
- Closing the mic,
- Returning the tool_result,
- Waiting for response.completed, and
- Requesting audio output,
you’ll resolve the timeout and let Nova Sonic deliver the final spoken confirmation.
Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of The Rose Theory series on math and physics.
