Build an Emergency Helpline Voice Agent with LangChain
Learn how to build a real-time AI voice agent for emergency helplines using LangChain, AssemblyAI, and OpenAI. The agent listens to caller distress, triages the situation, dispatches emergency services, and keeps the caller calm—all without typing or menus.
-->
Build an Emergency Helpline Voice Agent with LangChain
Riya Bansal Last Updated : 08 Jun, 2026
7 min read
We have all faced emergencies where every second matters. Someone’s life is at risk and you are panicking. Now imagine that, in this moment of distress, the helpline asks you to press numbers on your keypad to reach the right agent. Pure chaos, right? What you need is someone who listens and acts immediately, without passing you along and without dropping the call.
In this blog, we’ll tackle this challenge by building our own AI emergency helpline voice agent. The agent listens to a caller’s spoken distress, triages the situation, dispatches the right emergency service, and keeps the caller calm, all in real time and all over voice.
No typing. No menus. Just talk.
Table of contents
Why an Emergency Helpline?
How the Pipeline Works?
Getting Started with the Voice Agent
Stage 1: Speech-to-Text with AssemblyAI
Stage 2: The Emergency Triage Agent
Stage 3: Text-to-Speech with OpenAI TTS
Wiring the Full Pipeline
Testing the Voice Agent
Conclusion
Why an Emergency Helpline?
The most common voice assistants in use today handle tasks such as food ordering or music streaming. These “functional” use cases are low-stakes from a user-experience perspective, but also easily forgettable. An emergency helpline is an entirely different matter.
Here, latency is critical, the tone of the voice assistant can affect who receives help first, and there is no fallback channel for dispatching an emergency vehicle such as an ambulance. Every design decision in this pipeline can therefore have real consequences, which makes it an especially valuable use case to learn from.
How the Pipeline Works?
The pipeline follows a sandwich architecture made of three independent stages (speech-to-text, a reasoning agent, and text-to-speech) that are designed to run concurrently. Each stage starts working on partial output from the stage before it instead of waiting for that stage to finish:
transcription begins while the caller is still mid-sentence,
the reasoning agent works on the earlier transcripts while the caller finishes their sentence,
text-to-speech begins synthesizing the response to that sentence while the reasoning agent is still generating it.
If everything is implemented correctly, the entire process completes in less than ten seconds, and the audio streams continuously with no interruptions in delivery.
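The staged data flow described above can be sketched with chained async generators. This is only an illustration: the four `fake_*` functions are hypothetical stand-ins rather than real STT, agent, or TTS clients, and in a real pipeline the stages overlap in time because each one waits on network I/O.

```python
import asyncio
from typing import AsyncIterator


async def fake_audio() -> AsyncIterator[bytes]:
    # Stand-in for microphone chunks arriving over a WebSocket.
    for chunk in (b"help", b"me"):
        yield chunk


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    # Emits one "transcript" per audio chunk as soon as it arrives.
    async for chunk in audio:
        yield chunk.decode()


async def fake_agent(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    # Responds to each transcript without waiting for the stream to end.
    async for text in transcripts:
        yield f"reply to {text}"


async def fake_tts(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Turns each reply into placeholder "audio" bytes.
    async for reply in replies:
        yield reply.encode()


async def run_pipeline() -> list[bytes]:
    # Each stage wraps the previous one, so data flows through chunk by chunk.
    return [out async for out in fake_tts(fake_agent(fake_stt(fake_audio())))]


result = asyncio.run(run_pipeline())
```

Because every stage is an async generator that consumes the one before it, swapping a stand-in for a real client should not require changing how the stages are composed.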
Getting Started with the Voice Agent
You’ll need API keys for AssemblyAI (real-time STT) and OpenAI (both the agent brain and TTS). Using OpenAI for TTS as well keeps the agent and speech synthesis under a single provider and a single key.
Install the required libraries with the following command:
!pip install langchain langgraph assemblyai websockets fastapi uvicorn openai
Set the environment variables:
export ASSEMBLYAI_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your_key"
You should enable LangSmith so that every conversation between your agent and a caller is traced. The trace serves as an audit record, supporting compliance and debugging by documenting what your agent said and when, and it can also back a support ticket.
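Before starting the agent, it can help to fail fast when a key is missing. The following is a minimal sketch; the helper name `missing_keys` and the choice to treat only the two provider keys as required are assumptions, not part of the original setup.

```python
import os

# Keys the pipeline cannot run without (the LangSmith variables are treated as optional here).
REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

At startup you could call `missing_keys()` and stop with a clear error message if the returned list is not empty.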
Stage 1: Speech-to-Text with AssemblyAI
At the STT stage, we transcribe the caller’s voice live. We use AssemblyAI’s WebSocket API in a producer-consumer model: audio chunks go in and transcripts come out at the same time.
from typing import AsyncIterator
import asyncio
import contextlib

# VoiceAgentEvent and AssemblyAISTT are assumed to come from the project's own helper modules.


async def stt_stream(
    audio_stream: AsyncIterator[bytes],
) -> AsyncIterator[VoiceAgentEvent]:
    stt = AssemblyAISTT(sample_rate=16000)

    async def send_audio():
        # Producer: forward audio chunks to AssemblyAI as they arrive.
        try:
            async for chunk in audio_stream:
                await stt.send_audio(chunk)
        finally:
            await stt.close()

    send_task = asyncio.create_task(send_audio())

    try:
        # Consumer: yield transcript events while audio is still being sent.
        async for event in stt.receive_events():
            yield event
    finally:
        send_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await send_task
        await stt.close()
The two key event types are STT Chunk and STT Output. STT Chunk contains partial transcripts generated while the caller is speaking, allowing a human supervisor to monitor the conversation in real time. STT Output is the final punctuated transcript used by the agent to trigger actions.
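The article does not show how these events are defined, so the following is a hypothetical sketch of the two STT event types. The `"stt_output"` type string matches the check used later in the agent stage; the `"stt_chunk"` string, the class names, and the `final_transcripts` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class STTChunkEvent:
    """Partial transcript emitted while the caller is still speaking."""
    transcript: str
    type: str = "stt_chunk"  # assumed type string

    @classmethod
    def create(cls, transcript: str) -> "STTChunkEvent":
        return cls(transcript=transcript)


@dataclass
class STTOutputEvent:
    """Final, punctuated transcript for one utterance."""
    transcript: str
    type: str = "stt_output"

    @classmethod
    def create(cls, transcript: str) -> "STTOutputEvent":
        return cls(transcript=transcript)


STTEvent = Union[STTChunkEvent, STTOutputEvent]


def final_transcripts(events: list[STTEvent]) -> list[str]:
    # Only final transcripts should trigger agent actions.
    return [e.transcript for e in events if e.type == "stt_output"]
```

Keeping a `type` field on every event lets downstream stages pass through events they do not handle and react only to the ones they care about.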
When using AssemblyAI for a helpline, consider enabling its content safety detection where your transcription mode supports it. It surfaces early warnings of distress signals in the transcript metadata before the agent processes the text, giving the agent more time to determine an appropriate response.
Stage 2: The Emergency Triage Agent
The second stage is the Emergency Triage Agent. It analyzes the transcript received from the caller, evaluates whether assistance is needed, decides which tool to use, and speaks to the caller in a calm manner.
The agent has four tools available for these tasks: location lookup, emergency dispatch, escalation to a live operator, and a calming protocol that de-escalates non-life-threatening distress.
from uuid import uuid4

from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver

# Active call registry
active_calls = {}


def get_caller_location(caller_id: str) -> str:
    """Look up the caller's registered address or last known GPS location."""
    locations = {
        "caller_001": "12 MG Road, Bengaluru, Karnataka 560001",
        "caller_002": "45 Park Street, Kolkata, West Bengal 700016",
    }
    return locations.get(
        caller_id,
        "Location not found. Ask caller to confirm address.",
    )


def dispatch_emergency(service: str, location: str, severity: str) -> str:
    """Dispatch police, ambulance, or fire services to a location."""
    valid_services = ["ambulance", "police", "fire"]
    if service.lower() not in valid_services:
        return f"Unknown service: {service}. Use ambulance, police, or fire."
    return (
        f"{service.capitalize()} dispatched to {location}. "
        f"Severity: {severity}. ETA: 8-12 minutes. "
        f"Reference: EM-{uuid4().hex[:6].upper()}"
    )


def escalate_to_human(caller_id: str, reason: str) -> str:
    """Escalate the call to a human operator when the situation exceeds AI capability."""
    active_calls[caller_id] = {
        "status": "escalated",
        "reason": reason,
    }
    return (
        f"Escalating call {caller_id} to human operator. "
        f"Reason: {reason}. Hold time: under 2 minutes."
    )


def calming_protocol(situation: str) -> str:
    """Return guided breathing or grounding instructions for distressed callers."""
    return (
        "I hear you. You are safe right now. "
        "Take a slow breath in for 4 counts, hold for 4, out for 4. "
        "I am here with you."
    )
agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[
        get_caller_location,
        dispatch_emergency,
        escalate_to_human,
        calming_protocol,
    ],
    system_prompt="""You are ARIA, an AI emergency response assistant for a 24/7 helpline.
Your job is to stay calm, assess the situation quickly, and take the right action.
Rules you must always follow:
- Always acknowledge the caller's distress before asking questions.
- Ask only one question at a time. Never overwhelm a panicking caller.
- If someone mentions chest pain, difficulty breathing, or unconsciousness, dispatch ambulance immediately.
- If someone mentions violence, threats, or break-in, dispatch police immediately.
- If the situation is unclear or emotional crisis, use calming protocol first.
- Escalate to a human operator if the caller is unresponsive or the situation is ambiguous.
- Keep every response under 3 sentences. Short and clear saves lives.
- Do NOT use emojis, asterisks, bullet points, or markdown. You are speaking aloud.""",
    checkpointer=InMemorySaver(),
)
The InMemorySaver checkpointer plays a crucial role here: it lets ARIA remember the entire history of the call, including:
what the caller said three turns ago,
what has already been dispatched,
whether the caller has confirmed their location.
Without memory, every response would start from a blank state, which is dangerous in an urgent situation.
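To make the idea of per-call memory concrete, here is a toy stand-in that keeps one message list per thread ID. It is not how InMemorySaver is implemented; the class `ToyThreadMemory` and its methods are hypothetical and only illustrate why two calls with different thread IDs do not see each other's history.

```python
class ToyThreadMemory:
    """Toy stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = {}

    def append(self, thread_id: str, message: str) -> None:
        # Create the thread's history on first use, then add the message.
        self._threads.setdefault(thread_id, []).append(message)

    def history(self, thread_id: str) -> list[str]:
        # Return a copy so callers cannot mutate stored state.
        return list(self._threads.get(thread_id, []))


memory = ToyThreadMemory()
memory.append("call-a", "My father collapsed.")
memory.append("call-a", "We are at 12 MG Road.")
memory.append("call-b", "Someone is breaking in.")
```

Here `memory.history("call-a")` holds both turns of the first call, while `memory.history("call-b")` holds only the single turn of the second.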
Next, consider the streaming agent function.
async def agent_stream(
    event_stream: AsyncIterator[VoiceAgentEvent],
) -> AsyncIterator[VoiceAgentEvent]:
    thread_id = str(uuid4())  # Unique per call session

    async for event in event_stream:
        # Pass every upstream event through so later stages can see it.
        yield event

        if event.type == "stt_output":
            stream = agent.astream(
                {"messages": [HumanMessage(content=event.transcript)]},
                {"configurable": {"thread_id": thread_id}},
                stream_mode="messages",
            )
            async for message, _ in stream:
                if message.text:
                    yield AgentChunkEvent.create(message.text)
Setting stream_mode="messages" sends tokens to TTS as they are produced, so ARIA starts speaking before the agent has finished generating its full response. This is the difference between a 400-millisecond response and a 2-second response.
Stage 3: Text-to-Speech with OpenAI TTS
OpenAI TTS is the natural choice: you are already using an OpenAI API key for your agent, so speech synthesis goes through the same SDK and needs no extra accounts. The tts-1 model is built for real-time, streamed text-to-speech. The shimmer voice is calm, clear, and composed, which suits a helpline.
import asyncio

from openai import AsyncOpenAI
from utils import merge_async_iters  # project-local helper module

client = AsyncOpenAI()


async def tts_stream(
    event_stream: AsyncIterator[VoiceAgentEvent],
) -> AsyncIterator[VoiceAgentEvent]:
    text_buffer = []
    upstream_done = asyncio.Event()

    async def process_upstream() -> AsyncIterator[VoiceAgentEvent]:
        try:
            async for event in event_stream:
                yield event
                if event.type == "agent_chunk":
                    text_buffer.append(event.text)
        finally:
            upstream_done.set()

    async def synthesize_audio() -> AsyncIterator[VoiceAgentEvent]:
        # Wait until the agent text has arrived; reading the buffer right away
        # would find it empty. This assumes event_stream ends when the reply ends.
        await upstream_done.wait()
        full_text = "".join(text_buffer)
        if not full_text.strip():
            return

        async with client.audio.speech.with_streaming_response.create(
            model="tts-1",
            voice="shimmer",  # Calm, composed: right for emergencies
            input=full_text,
            response_format="pcm",  # Raw PCM for lowest latency playback
        ) as response:
            async for chunk in response.iter_bytes(chunk_size=4096):
                yield TTSChunkEvent.create(chunk)

    async for event in merge_async_iters(
        process_upstream(),
        synthesize_audio(),
    ):
        yield event
tts-1 begins streaming audio chunks as soon as the first audio has been synthesized, rather than waiting until the entire input has been rendered. Setting response_format="pcm" skips the overhead of a container format, so the audio can go straight into the WebSocket byte stream. The tts-1-hd model offers higher quality but adds approximately 200 ms of latency compared to tts-1. For an emergency helpline, tts-1 is the better choice.
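Because raw PCM has no header, the receiving side has to know the audio format in advance. The sketch below assumes the format OpenAI documents for PCM output (24 kHz, 16-bit signed samples) and additionally assumes a single mono channel; the helper names are hypothetical. It shows how long one chunk plays for and how to wrap buffered PCM in a WAV container for debugging.

```python
import io
import wave

SAMPLE_RATE_HZ = 24_000  # documented sample rate for OpenAI PCM output
BYTES_PER_SAMPLE = 2     # 16-bit signed samples
CHANNELS = 1             # assumption: mono


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback time represented by a raw PCM payload of the given size."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV container so audio tools can open them."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Under these assumptions, a 4096-byte chunk holds 2048 samples, or roughly 85 ms of audio, so the client needs a steady supply of chunks to avoid gaps in playback.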
The
[truncated for AI cost control]