Solving the Nova Sonic Timeout: A Guide to Finalizing Voice Bot Conversations

Problem

When a Nova Sonic voice bot takes input from the system microphone, it can capture customer details, update the database, and send the confirmation email, and then still time out before speaking the final confirmation to the user.

Logs confirm:

Tool calls and database writes succeed
Email dispatch completes
Multiple Timeout errors fire, and no final spoken message is produced

Clarifying the Issue

This isn’t a database or email bug. The problem lies in the handshake between tool use and voice output.

The tool call returns locally, but Nova Sonic never receives both a proper tool_result and an end-of-input signal, so it doesn’t generate the closing utterance. Meanwhile, the mic stream remains open, keeping the session “listening” instead of finalizing.
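To make the stuck state concrete, here is a minimal sketch that models the turn as three flags and reports what is still blocking the closing utterance. The `TurnState` class and `blockers` function are hypothetical names used only for illustration; they are not part of any Nova Sonic SDK.

```python
from dataclasses import dataclass


@dataclass
class TurnState:
    """What the client has done so far in this turn (illustrative only)."""
    tool_ran_locally: bool = False   # DB write and email succeeded
    tool_result_sent: bool = False   # outcome forwarded to the model
    mic_closed: bool = False         # end-of-input signalled


def blockers(state: TurnState) -> list[str]:
    """Return the reasons the model cannot yet produce its closing utterance."""
    reasons = []
    if state.tool_ran_locally and not state.tool_result_sent:
        reasons.append("tool succeeded locally but no tool_result was sent")
    if not state.mic_closed:
        reasons.append("mic stream still open, session is still listening")
    return reasons


# The failure mode described above: backend work done, handshake unfinished.
stuck = TurnState(tool_ran_locally=True)
for reason in blockers(stuck):
    print(reason)
```

Logging something like this at the moment a timeout fires can show quickly whether the problem is on the handshake side rather than in the database or email code.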

Why It Matters

If the bot doesn’t voice its closing line, users assume the entire workflow failed—even though the backend succeeded. In production voice apps, this mismatch breaks trust and leads to unnecessary support calls.

Key Terms

Tool call / tool_result: Round-trip where the model requests a tool and expects a result with the same name.
End-of-input (commit/flush): Signal that no more mic audio is coming this turn.
response.completed: Terminal event signaling the model has finished generating output.
Modalities: The output types requested for a response. Declaring both ["audio","text"] makes Nova Sonic return speech and a transcript.

Steps at a Glance

1. Close the mic after user input.
2. Return the tool result with the correct name and payload.
3. Stream until response.completed, not just the first text delta.
4. Request audio modality to ensure the assistant speaks.
5. Tune timeouts (≥60s) and don’t stop early.

Detailed Steps

1. Close mic. After the user provides details, call the SDK’s commit method:

   client.input_audio_buffer.commit()
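One way to keep the mic from lingering open is to route audio through a small gate that commits exactly once and drops anything captured afterwards. This is a sketch under assumptions: `MicGate` is a hypothetical helper, `FakeAudioBuffer` stands in for the SDK object, and the `append` method is assumed for illustration (only `commit` appears in this guide).

```python
class FakeAudioBuffer:
    """Stand-in for the SDK's input audio buffer (hypothetical)."""

    def __init__(self):
        self.chunks = []
        self.commits = 0

    def append(self, chunk: bytes) -> None:  # assumed method name
        self.chunks.append(chunk)

    def commit(self) -> None:
        self.commits += 1


class MicGate:
    """Forwards mic audio until closed, then commits exactly once."""

    def __init__(self, buffer):
        self._buffer = buffer
        self._open = True

    def feed(self, chunk: bytes) -> bool:
        """Forward a chunk; return False if the gate is already closed."""
        if not self._open:
            return False
        self._buffer.append(chunk)
        return True

    def close(self) -> None:
        """Signal end-of-input. Safe to call more than once."""
        if self._open:
            self._open = False
            self._buffer.commit()


buffer = FakeAudioBuffer()
gate = MicGate(buffer)
gate.feed(b"\x00\x01")   # user is still speaking
gate.close()             # user finished: commit the turn
gate.feed(b"\x02\x03")   # late audio is dropped, not streamed
```

Making `close` idempotent means it can be called from both the "user finished" path and an error handler without sending a duplicate commit.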

2. Send tool result. Forward the outcome of the DB/email operations as a tool_result event:

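   # name, email, phone, and current_response_id come from the tool call
   # that just completed.
   client.responses.events.create({
     "type": "tool_result",
     "name": "create_or_update_customer",  # same name the model requested
     "content": [
       {"type":"json", "data":{"name": name, "email": email, "phone": phone, "status":"created"}}
     ],
     "response_id": current_response_id  # ties the result to the in-flight response
   })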
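A small builder can reduce the chance of a name mismatch by copying the tool name from the model's request instead of retyping it. The sketch below assumes the tool call arrives as a dict with a `"name"` key, which is an assumption about your event shape; `build_tool_result` is a hypothetical helper that only mirrors the payload shown above.

```python
import json


def build_tool_result(tool_call: dict, outcome: dict, response_id: str) -> dict:
    """Build a tool_result event that echoes the requested tool's name."""
    event = {
        "type": "tool_result",
        "name": tool_call["name"],  # copied, never retyped
        "content": [{"type": "json", "data": outcome}],
        "response_id": response_id,
    }
    # Fail early, before sending, if the outcome is not JSON-serializable.
    json.dumps(event)
    return event


# Hypothetical values for illustration only.
requested = {"name": "create_or_update_customer"}
event = build_tool_result(requested, {"status": "created"}, "resp_123")
print(event["name"])
```

Serializing once inside the builder surfaces problems such as a datetime or database row object in the outcome at the point where they are easiest to debug.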

3. Drain events until completion. Keep reading events until response.completed arrives:

Capture output_text.delta for text
Capture response.audio.delta for speech
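The drain loop can be sketched as a function that accumulates both kinds of delta and returns only when the terminal event arrives. The event shape here is an assumption: each event is a dict with a `"type"` key and, for deltas, a `"delta"` key holding a string (text) or raw bytes (audio). Real streams may deliver audio base64-encoded, in which case a decode step would be needed.

```python
def drain(events):
    """Consume events until response.completed; return (text, audio)."""
    text_parts = []
    audio = bytearray()
    for event in events:
        kind = event["type"]
        if kind == "output_text.delta":
            text_parts.append(event["delta"])
        elif kind == "response.audio.delta":
            audio.extend(event["delta"])
        elif kind == "response.failed":
            raise RuntimeError("response failed")
        elif kind == "response.completed":
            return "".join(text_parts), bytes(audio)
        # Unknown event types are ignored rather than treated as the end.
    raise RuntimeError("stream ended before response.completed")


# Simulated stream: stopping at the first text delta would lose the rest.
stream = [
    {"type": "output_text.delta", "delta": "Thanks, "},
    {"type": "response.audio.delta", "delta": b"\x00\x01"},
    {"type": "output_text.delta", "delta": "Ana."},
    {"type": "response.audio.delta", "delta": b"\x02"},
    {"type": "response.completed"},
]
text, audio = drain(stream)
```

Raising when the stream ends early keeps a dropped connection from being mistaken for a finished response.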

4. Guarantee audio output. Include both modalities when requesting the response:

   "modalities": ["audio", "text"]

If no audio arrives, you can still synthesize the text yourself—but ideally let Nova Sonic handle it.
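Both halves of this step can be sketched as two small helpers: one that makes sure the request config asks for audio and text, and one that falls back to local synthesis only when no audio came back. The helper names are hypothetical, and `synthesize` stands for whatever text-to-speech function you already have; none of this is a Nova Sonic API.

```python
def with_audio_modality(config: dict) -> dict:
    """Return a copy of config whose modalities include audio and text."""
    merged = dict(config)
    modalities = list(merged.get("modalities", []))
    for required in ("audio", "text"):
        if required not in modalities:
            modalities.append(required)
    merged["modalities"] = modalities
    return merged


def final_audio(text: str, audio: bytes, synthesize) -> bytes:
    """Prefer the model's audio; synthesize locally only if none arrived."""
    if audio:
        return audio
    return synthesize(text)


config = with_audio_modality({"modalities": ["text"]})

# Placeholder TTS for the example: a real one would return encoded speech.
fallback = final_audio("Your details are saved.", b"", lambda t: t.encode("utf-8"))
```

Copying the config instead of mutating it keeps a shared default configuration from changing underneath other callers.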

5. Timeout discipline. Raise read timeouts to 60–90 seconds. Fail only on response.failed or a disconnect, not after a few short retries.
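As a rough illustration of this policy, the sketch below applies one generous timeout to each event read and treats only response.failed as a hard failure. It assumes events arrive on an `asyncio.Queue`, which is a stand-in for however your SDK delivers them; `read_until_completed` is a hypothetical helper, and the demo uses short delays so it runs quickly.

```python
import asyncio


async def read_until_completed(queue: asyncio.Queue, read_timeout: float = 60.0):
    """Read events until response.completed, with a timeout per read."""
    seen = []
    while True:
        # Raises asyncio.TimeoutError only if a single read exceeds the limit.
        event = await asyncio.wait_for(queue.get(), timeout=read_timeout)
        seen.append(event["type"])
        if event["type"] == "response.failed":
            raise RuntimeError("response failed")
        if event["type"] == "response.completed":
            return seen


async def demo():
    queue = asyncio.Queue()

    async def producer():
        # Gaps between events are fine as long as each is under the timeout.
        await asyncio.sleep(0.01)
        await queue.put({"type": "output_text.delta"})
        await asyncio.sleep(0.01)
        await queue.put({"type": "response.completed"})

    task = asyncio.create_task(producer())
    result = await read_until_completed(queue, read_timeout=1.0)
    await task
    return result


print(asyncio.run(demo()))
```

Because the timeout applies to each read, it restarts after every event, so a long response with steady output is not cut off partway through.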

Conclusion

This isn’t a data-layer failure—it’s a conversation lifecycle issue. By:

Closing the mic,
Returning the tool_result,
Waiting for response.completed, and
Requesting audio output,

you’ll resolve the timeout and let Nova Sonic deliver the final spoken confirmation.

Aaron Rose is a software engineer and technology writer at tech-reader.blog and the author of The Rose Theory series on math and physics.
