
Build an Emergency Helpline Voice Agent with LangChain

Learn how to build a real-time AI voice agent for emergency helplines using LangChain, AssemblyAI, and OpenAI. The agent listens to caller distress, triages the situation, dispatches emergency services, and keeps the caller calm—all without typing or menus.

Source: Analytics Vidhya | Author: Riya Bansal


Riya Bansal | Last Updated: 08 Jun, 2026

We have all been in an emergency where every second matters. Someone’s life is at risk, and you are panicking. Now imagine that, in this moment of distress, the helpline asks you to press numbers on your keypad to reach the right agent. Pure chaos, right? What we need is someone who listens and acts immediately, without passing us along and without dropping the call.

In this blog, we’ll take on this challenge by building our own AI emergency helpline voice agent. The agent listens to a caller’s spoken distress, triages the situation, dispatches the right emergency service, and keeps the caller calm, all in real time and all over voice.

No typing. No menus. Just talk.

Table of contents

Why an Emergency Helpline?

How the Pipeline Works?

Getting Started with the Voice Agent

Stage 1: Speech-to-Text with AssemblyAI

Stage 2: The Emergency Triage Agent

Stage 3: Text-to-Speech with OpenAI TTS

Wiring the Full Pipeline

Testing the Voice Agent

Conclusion

Why an Emergency Helpline?

The most common voice assistants in use today handle tasks like food ordering or music streaming. These “functional” use cases are low-stakes from a user-experience perspective, and easily forgettable. An emergency helpline is an entirely different case.

Here, latency is critical, the tone of the voice assistant can affect who receives help first, and there is no fallback method for dispatching an emergency vehicle such as an ambulance. Every design decision in this pipeline can have real consequences, which makes it one of the most valuable use cases to gain experience from.

How the Pipeline Works?

The pipeline follows the sandwich architecture: three independent components (speech-to-text, the reasoning agent, and text-to-speech) designed to work concurrently. Each stage starts processing as soon as the stage before it produces output, rather than waiting for that stage to finish:

transcription begins while the caller is still mid-sentence,

the reasoning agent works on the earlier turns while the caller finishes their sentence,

text-to-speech begins synthesizing the response while the reasoning agent is still generating it.

If everything is implemented correctly, the entire process completes in less than ten seconds, and the audio is streamed continuously with no interruptions in delivery.
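The staged flow described above can be sketched with plain async generators. This is only an illustration under stated assumptions, not the article's implementation: `fake_stt`, `fake_agent`, `fake_tts`, and `microphone` are hypothetical stand-ins, and the strings and bytes they yield stand in for real events.

```python
import asyncio
from typing import AsyncIterator


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Hypothetical STT stage: emits one 'transcript' per audio chunk."""
    async for chunk in audio:
        yield chunk.decode()


async def fake_agent(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Hypothetical agent stage: replies to each transcript as it arrives."""
    async for text in transcripts:
        yield f"reply to: {text}"


async def fake_tts(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Hypothetical TTS stage: turns each reply into 'audio' bytes."""
    async for reply in replies:
        yield reply.encode()


async def microphone() -> AsyncIterator[bytes]:
    """Stand-in for the caller's audio stream."""
    for word in (b"help", b"fire"):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield word


async def run_pipeline() -> list[bytes]:
    # Each stage wraps the previous one, so every item is passed downstream
    # as soon as it is produced instead of after the whole input is consumed.
    pipeline = fake_tts(fake_agent(fake_stt(microphone())))
    return [audio async for audio in pipeline]


result = asyncio.run(run_pipeline())
print(result)
```

Chaining generators like this gives item-by-item streaming between stages. Work that must overlap inside a single stage, such as sending audio while receiving transcripts, still needs its own task.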

Getting Started with the Voice Agent

You’ll need API keys for AssemblyAI (real-time STT) and OpenAI (both the agent brain and TTS). Using OpenAI for TTS as well keeps the agent and the voice under a single provider and a single key.

Install the required libraries (the leading ! is for notebook cells; drop it in a terminal):

!pip install langchain langgraph assemblyai websockets fastapi uvicorn openai

Set the environment variables:

export ASSEMBLYAI_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="your_key"

Enable LangSmith so that every conversation between your agent and a caller is traced. Each trace can serve as an audit record and, if needed, as the basis for a support ticket. The audit trail helps with compliance and debugging because it documents what your agent said and when.
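Before starting the server, it can help to fail fast when a key is missing. The following is a minimal sketch; `REQUIRED_KEYS` and `require_keys` are hypothetical names, and only the two keys the pipeline cannot run without are treated as mandatory here.

```python
import os
from typing import Mapping

# Assumption: LangSmith tracing is optional, so only these two are mandatory.
REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY")


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


def require_keys(env: Mapping[str, str]) -> None:
    """Raise a clear error listing every missing key at once."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# Typical use, called once at startup:
# require_keys(os.environ)
```

Reporting all missing keys in one error avoids the fix-one-rerun-fix-another loop during setup.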

Stage 1: Speech-to-Text with AssemblyAI

At the STT stage, we transcribe the caller’s voice live. We use AssemblyAI’s WebSocket API in a producer-consumer pattern: audio chunks go in and transcripts come out at the same time.

import asyncio
import contextlib
from typing import AsyncIterator

# AssemblyAISTT (a WebSocket client wrapper) and VoiceAgentEvent are project
# helpers that are assumed to be defined elsewhere in the codebase.


async def stt_stream(
    audio_stream: AsyncIterator[bytes],
) -> AsyncIterator[VoiceAgentEvent]:
    stt = AssemblyAISTT(sample_rate=16000)

    async def send_audio():
        # Producer: forward raw audio chunks to AssemblyAI as they arrive.
        try:
            async for chunk in audio_stream:
                await stt.send_audio(chunk)
        finally:
            await stt.close()

    send_task = asyncio.create_task(send_audio())

    try:
        # Consumer: yield transcript events while audio is still being sent.
        async for event in stt.receive_events():
            yield event
    finally:
        # Stop the producer and wait for it to unwind before closing.
        send_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await send_task
        await stt.close()

The two key event types are STT Chunk and STT Output. STT Chunk contains partial transcripts generated while the caller is speaking, allowing a human supervisor to monitor the conversation in real time. STT Output is the final punctuated transcript used by the agent to trigger actions.
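The snippets in this article reference `VoiceAgentEvent` and call `.create(...)` on event classes without showing their definitions. The following is a minimal sketch of how such events might be modeled, assuming plain dataclasses tagged with a `type` string. The `"stt_output"` and `"agent_chunk"` tags and the `transcript` and `text` fields are read by the article's code; the `"stt_chunk"` and `"tts_chunk"` tags and the `audio` field are assumptions.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class STTChunkEvent:
    """Partial transcript emitted while the caller is still speaking."""
    transcript: str
    type: Literal["stt_chunk"] = "stt_chunk"

    @classmethod
    def create(cls, transcript: str) -> "STTChunkEvent":
        return cls(transcript=transcript)


@dataclass
class STTOutputEvent:
    """Final, punctuated transcript that triggers the agent."""
    transcript: str
    type: Literal["stt_output"] = "stt_output"

    @classmethod
    def create(cls, transcript: str) -> "STTOutputEvent":
        return cls(transcript=transcript)


@dataclass
class AgentChunkEvent:
    """A piece of the agent's reply, streamed token by token."""
    text: str
    type: Literal["agent_chunk"] = "agent_chunk"

    @classmethod
    def create(cls, text: str) -> "AgentChunkEvent":
        return cls(text=text)


@dataclass
class TTSChunkEvent:
    """A chunk of synthesized audio bytes ready for playback."""
    audio: bytes
    type: Literal["tts_chunk"] = "tts_chunk"

    @classmethod
    def create(cls, audio: bytes) -> "TTSChunkEvent":
        return cls(audio=audio)


# Every stage accepts and yields this union, so events can pass through
# stages that do not care about them.
VoiceAgentEvent = Union[STTChunkEvent, STTOutputEvent, AgentChunkEvent, TTSChunkEvent]
```

Because every event carries a `type` tag, a stage can forward everything it receives and react only to the tags it understands.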

When using AssemblyAI for a helpline, the content safety detection flag should be enabled. It provides early warnings of distress signals through transcript metadata before the agent processes the text, giving the agent more time to determine an appropriate response.

Stage 2: The Emergency Triage Agent

Stage two is the Emergency Triage Agent. It analyzes the caller’s transcript, evaluates whether assistance is needed, decides which tool to use, and speaks to the caller calmly.

The agent has four tools for these tasks: location lookup, emergency dispatch, escalation to a live operator, and a calming protocol that de-escalates non-life-threatening distress.

from uuid import uuid4

from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langgraph.checkpoint.memory import InMemorySaver

# Active call registry
active_calls = {}


def get_caller_location(caller_id: str) -> str:
    """Look up the caller's registered address or last known GPS location."""
    locations = {
        "caller_001": "12 MG Road, Bengaluru, Karnataka 560001",
        "caller_002": "45 Park Street, Kolkata, West Bengal 700016",
    }

    return locations.get(
        caller_id,
        "Location not found. Ask caller to confirm address.",
    )


def dispatch_emergency(service: str, location: str, severity: str) -> str:
    """Dispatch police, ambulance, or fire services to a location."""
    valid_services = ["ambulance", "police", "fire"]

    if service.lower() not in valid_services:
        return f"Unknown service: {service}. Use ambulance, police, or fire."

    return (
        f"{service.capitalize()} dispatched to {location}. "
        f"Severity: {severity}. ETA: 8-12 minutes. "
        f"Reference: EM-{uuid4().hex[:6].upper()}"
    )


def escalate_to_human(caller_id: str, reason: str) -> str:
    """Escalate the call to a human operator when the situation exceeds AI capability."""
    active_calls[caller_id] = {
        "status": "escalated",
        "reason": reason,
    }

    return (
        f"Escalating call {caller_id} to human operator. "
        f"Reason: {reason}. Hold time: under 2 minutes."
    )


def calming_protocol(situation: str) -> str:
    """Return guided breathing or grounding instructions for distressed callers."""
    return (
        "I hear you. You are safe right now. "
        "Take a slow breath in for 4 counts, hold for 4, out for 4. "
        "I am here with you."
    )


agent = create_agent(
    model="openai:gpt-4o-mini",
    tools=[
        get_caller_location,
        dispatch_emergency,
        escalate_to_human,
        calming_protocol,
    ],
    system_prompt="""You are ARIA, an AI emergency response assistant for a 24/7 helpline.

Your job is to stay calm, assess the situation quickly, and take the right action.

Rules you must always follow:

- Always acknowledge the caller's distress before asking questions.
- Ask only one question at a time. Never overwhelm a panicking caller.
- If someone mentions chest pain, difficulty breathing, or unconsciousness, dispatch ambulance immediately.
- If someone mentions violence, threats, or break-in, dispatch police immediately.
- If the situation is unclear or emotional crisis, use calming protocol first.
- Escalate to a human operator if the caller is unresponsive or the situation is ambiguous.
- Keep every response under 3 sentences. Short and clear saves lives.
- Do NOT use emojis, asterisks, bullet points, or markdown. You are speaking aloud.""",
    checkpointer=InMemorySaver(),
)

The InMemorySaver checkpointer plays a crucial role here: it lets ARIA remember the entire call history, including:

what the caller said three turns ago,

which services have already been dispatched,

whether the caller has confirmed their location, and so on.

Without memory, every response would start from a blank state, which is a serious problem in an urgent situation.
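To make the role of the checkpointer concrete, here is a toy stand-in written in plain Python. It is not LangGraph's `InMemorySaver`; `CallMemory` is a hypothetical class that only illustrates the idea of keeping one message history per `thread_id`.

```python
from collections import defaultdict
from uuid import uuid4


class CallMemory:
    """Toy stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self) -> None:
        self._threads: dict[str, list[str]] = defaultdict(list)

    def append(self, thread_id: str, message: str) -> None:
        self._threads[thread_id].append(message)

    def history(self, thread_id: str) -> list[str]:
        # Return a copy so callers cannot mutate stored state by accident.
        return list(self._threads[thread_id])


memory = CallMemory()
call_a, call_b = str(uuid4()), str(uuid4())  # one thread id per call session

memory.append(call_a, "caller: my father collapsed")
memory.append(call_a, "agent: ambulance dispatched")
memory.append(call_b, "caller: someone is breaking in")

# A later turn on call A sees both earlier messages; call B stays separate.
history_a = memory.history(call_a)
history_b = memory.history(call_b)
```

Keying the history by a per-call identifier is what keeps one caller's dispatch status from leaking into another caller's conversation.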

Next, consider the streaming agent function.

async def agent_stream(
    event_stream: AsyncIterator[VoiceAgentEvent],
) -> AsyncIterator[VoiceAgentEvent]:
    thread_id = str(uuid4())  # Unique per call session

    async for event in event_stream:
        yield event  # Pass upstream events through unchanged

        if event.type == "stt_output":
            stream = agent.astream(
                {"messages": [HumanMessage(content=event.transcript)]},
                {"configurable": {"thread_id": thread_id}},
                stream_mode="messages",
            )

            async for message, _ in stream:
                if message.text:
                    yield AgentChunkEvent.create(message.text)

stream_mode="messages" sends tokens to TTS as they are produced, so ARIA starts speaking before the model has finished generating the full reply. That is the difference between a roughly 400-millisecond response and a 2-second one.

Stage 3: Text-to-Speech with OpenAI TTS

OpenAI TTS is the natural choice because you are already using an OpenAI API key for your agent: one API key, one SDK, and no extra accounts. The tts-1 model was built for real-time, streamed text-to-speech rendering. The shimmer voice is calm, clear, and measured, all appropriate tones for a helpline.

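import asyncio
from typing import AsyncIterator

from openai import AsyncOpenAI
from utils import merge_async_iters  # project helper that interleaves async iterators

# VoiceAgentEvent and TTSChunkEvent are project event classes assumed to be
# defined elsewhere in the codebase.

client = AsyncOpenAI()

SENTENCE_END = (".", "!", "?")


async def tts_stream(
    event_stream: AsyncIterator[VoiceAgentEvent],
) -> AsyncIterator[VoiceAgentEvent]:
    # Completed sentences wait here for synthesis; None marks the end of the stream.
    # A queue is needed because both coroutines below run at the same time:
    # reading a shared buffer once at startup would always find it empty.
    sentences: asyncio.Queue = asyncio.Queue()

    async def process_upstream() -> AsyncIterator[VoiceAgentEvent]:
        text_buffer = []
        try:
            async for event in event_stream:
                yield event

                if event.type == "agent_chunk":
                    text_buffer.append(event.text)
                    # Simple heuristic: flush whenever a chunk ends a sentence.
                    if event.text.rstrip().endswith(SENTENCE_END):
                        await sentences.put("".join(text_buffer))
                        text_buffer.clear()
        finally:
            if text_buffer:
                sentences.put_nowait("".join(text_buffer))
            sentences.put_nowait(None)

    async def synthesize_audio() -> AsyncIterator[VoiceAgentEvent]:
        while (full_text := await sentences.get()) is not None:
            if not full_text.strip():
                continue

            async with client.audio.speech.with_streaming_response.create(
                model="tts-1",
                voice="shimmer",  # Calm, composed: right for emergencies
                input=full_text,
                response_format="pcm",  # Raw PCM for lowest latency playback
            ) as response:
                async for chunk in response.iter_bytes(chunk_size=4096):
                    yield TTSChunkEvent.create(chunk)

    async for event in merge_async_iters(
        process_upstream(),
        synthesize_audio(),
    ):
        yield event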

tts-1 begins streaming audio chunks as soon as the first audio has been synthesized, rather than waiting until the entire response has been rendered. Setting response_format="pcm" skips the overhead of a container format, so the audio can go straight into the WebSocket byte stream. tts-1-hd improves quality but adds approximately 200 ms of latency compared to tts-1, so for an emergency helpline tts-1 is the better choice.
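Raw PCM carries no header, so the receiving side has to know the sample format in advance. OpenAI's documentation describes the pcm output as 24 kHz, 16-bit signed little-endian samples; the sketch below additionally assumes mono audio. The helper names `chunk_duration_ms` and `pcm_to_wav` are hypothetical, and the WAV wrapper is meant only for saving a clip to listen to while debugging.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, per OpenAI's description of the pcm format
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1          # assumption: mono output


def chunk_duration_ms(num_bytes: int) -> float:
    """Playback time, in milliseconds, represented by num_bytes of raw PCM."""
    frames = num_bytes / (SAMPLE_WIDTH * CHANNELS)
    return 1000 * frames / SAMPLE_RATE


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV container using the standard library."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


# One 4096-byte chunk, the size used in the streaming loop, is about 85 ms of audio.
print(round(chunk_duration_ms(4096), 1))
```

Knowing the duration per chunk helps when sizing playback buffers: too small and the audio stutters, too large and the caller waits longer to hear the first word.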
