This post guides you through building a server for real-time PDF text extraction from Amazon S3 using the Model Context Protocol (MCP). It compares this interactive approach with Amazon Textract, detailing architecture, implementation steps, cost analysis, and security considerations.
Build an MCP server for on-demand text extraction from PDFs in Amazon S3.
Ideal for text-based PDFs in development and proof-of-concept environments.
Cara is an AI-native solution on AWS that automates back-office processes for insurance brokerages, saving agents ~10 hours per week. It uses Amazon EKS and Amazon Bedrock for scalable, secure, domain-specific AI workflows.
Cara automates repetitive tasks in insurance brokerages, addressing the talent shortage. It is built on AWS, with Amazon EKS for orchestration and Amazon Bedrock for LLM inference.
Delivers tenant-isolated, elastic scaling for thousands of concurrent users.
Stripe processes $1.4 trillion in annual payment volume across 50 countries. Using a ReAct agent framework on Amazon Bedrock, they reduced review handling time by 26% while maintaining human oversight. This article covers the technical architecture, infrastructure decisions, and lessons learned, including task decomposition, orchestration patterns, and cost optimization via prompt caching.
Stripe decomposed compliance reviews into sub-tasks arranged as a directed acyclic graph, ensuring quality and auditability.
AI agents provide pre-researched information to human reviewers, who retain final decision authority, achieving over 96% helpfulness ratings.
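The DAG decomposition described above can be sketched in a few lines. This is a minimal illustration, not Stripe's implementation: the sub-task names are hypothetical, and the standard-library graphlib module stands in for whatever orchestration layer the article describes.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-tasks (names are assumptions, not from the article).
# Each key maps to the set of sub-tasks it depends on.
REVIEW_DAG = {
    "collect_business_profile": set(),
    "screen_sanctions_lists": {"collect_business_profile"},
    "check_website_content": {"collect_business_profile"},
    "summarize_findings": {"screen_sanctions_lists", "check_website_content"},
    "human_decision": {"summarize_findings"},
}


def execution_order(dag):
    """Return one valid order in which every sub-task follows its dependencies.

    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


def audit_trail(dag):
    """Record, for each sub-task in execution order, which upstream results it consumed."""
    return [{"task": task, "inputs": sorted(dag[task])} for task in execution_order(dag)]


if __name__ == "__main__":
    for entry in audit_trail(REVIEW_DAG):
        print(entry["task"], "<-", entry["inputs"])
```

Keeping the dependencies explicit is what makes the run auditable: every step can be traced back to the inputs it was given, and the human decision is always the final node.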
In this technical collaboration between AWS and the authors, we present a pragmatic solution: agentic overlays. Agentic overlays are thin wrapper layers that transform traditional REST-based services into agents capable of participating in A2A interactions. They also expose REST APIs as tools compatible with the Model Context Protocol (MCP). Together, they let enterprises add A2A capabilities to existing REST services without rewriting business logic, without duplicating code, and without running parallel infrastructures. This reduces agent sprawl in the infrastructure by reusing existing services as agents. We provide reference architectures and sample code that show how to build agentic overlays.
Agentic overlays are thin wrappers that turn REST services into A2A agents and expose MCP tools.
No need to rewrite business logic or maintain parallel infrastructure, reducing cost and complexity.
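One way to picture an overlay is as a small adapter that describes an existing REST endpoint as an MCP-style tool and translates tool calls back into REST requests. The sketch below is an assumption-laden illustration, not the reference code from the post: RestEndpoint, to_mcp_tool, and build_request are hypothetical names, and only the tool descriptor fields (name, description, inputSchema) follow the MCP tool definition.

```python
from dataclasses import dataclass, field


@dataclass
class RestEndpoint:
    """Hypothetical description of an existing REST operation."""
    name: str
    method: str
    path: str                                   # e.g. "/orders/{order_id}"
    description: str
    params: dict = field(default_factory=dict)  # parameter name -> JSON Schema type


def to_mcp_tool(endpoint):
    """Describe the endpoint as an MCP-style tool (name, description, inputSchema)."""
    return {
        "name": endpoint.name,
        "description": endpoint.description,
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": t} for p, t in endpoint.params.items()},
            "required": sorted(endpoint.params),
        },
    }


def build_request(endpoint, arguments):
    """Translate a tool call into the method and path the existing service already serves."""
    missing = set(endpoint.params) - set(arguments)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return endpoint.method, endpoint.path.format(**arguments)


if __name__ == "__main__":
    get_order = RestEndpoint(
        name="get_order",
        method="GET",
        path="/orders/{order_id}",
        description="Fetch a single order by ID.",
        params={"order_id": "string"},
    )
    print(to_mcp_tool(get_order))
    print(build_request(get_order, {"order_id": "42"}))
```

The business logic stays in the REST service; the overlay only holds metadata and a translation step, which is why no code needs to be duplicated.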
This post shows you how to configure training jobs on Amazon SageMaker AI to get the most out of Blackwell’s architecture on AWS. You learn how to select batch sizes and sequence lengths that take advantage of Blackwell’s expanded memory, choose the right precision format for your model size (1B to 64B parameters), and apply activation checkpointing strategically. By the end, you have a practical framework for tuning your training configuration and launching distributed training jobs on P6-B200 instances.
Blackwell's expanded memory supports larger batch sizes, longer sequence lengths, and simplified model sharding.
Activation checkpointing is a prerequisite for stable training with large models (~14B+ parameters).
This post demonstrates how to implement video upscaling using SeedVR2 on SageMaker AI. It covers solution architecture, deployment steps, and performance comparisons highlighting quality improvements and processing efficiency. By the end, you'll have practical knowledge to implement this super resolution solution.
SeedVR2 is an open-source video restoration model by ByteDance, combining diffusion models and GANs for efficient upscaling.
The solution uses a three-tier AWS architecture including security, storage, and SageMaker processing pipeline.
In this post, we show you how to build Chaplin (Customer Health and Planned Lifecycle Intelligence Nexus), an open source solution that uses AI agents exposed through the Model Context Protocol (MCP) to provide self-service health event analytics.
Chaplin is an open-source solution using AI agents via MCP for self-service AWS Health event analytics.
It overcomes the bottleneck of relying on Technical Account Managers (TAMs) for health event interpretation.
This post shows how to build a governed, serverless data mesh on AWS that provides the secure, scalable data foundation production agentic AI requires.
Agentic AI requires fine-grained access control at every step from tool discovery to query execution, which traditional RAG governance cannot address.
Amazon S3 Tables with built-in Iceberg support and AWS Lake Formation provide row/column/cell-level security with up to 10x higher transactions per second.
In this post, you will learn how to build a voice agent that handles appointment reminder conversations using Amazon Nova 2 Sonic and Amazon Bedrock AgentCore. The agent authenticates patients by voice, manages appointments (confirm, cancel, or reschedule), collects pre-visit health information, and escalates to human staff when needed. This lets you handle routine calls at scale, which can help reduce no-show rates. This sample focuses on the agentic side of the problem: voice conversation and tool orchestration. A browser-based interface is included for testing. To connect the agent to actual phone lines for outbound dialing, you would integrate a telephony service such as Amazon Connect Customer.
Uses Amazon Nova 2 Sonic for native speech-to-speech processing and Amazon Bedrock AgentCore as the serverless runtime.
Handles patient authentication, appointment management, health info collection, and escalation to humans.
This post explains how to build an end-to-end integration between Snowflake semantic views and Amazon QuickSight. Using movie review data, it demonstrates how to define a shared business logic layer, explore data with natural-language queries via Cortex Analyst, and generate consistent dashboards—reducing data reconciliation efforts and AI hallucinations.
Semantic views attach business definitions directly to the data layer, ensuring unified interpretation across AI and BI systems.
Natural-language queries through Cortex Analyst reduce AI hallucination risk.
Loka built a conversational AI agent using Amazon Nova 2 Sonic that addresses the latency and unnaturalness of traditional voice assistants, achieving high accuracy, low cost, and natural interactions through native speech-to-speech processing.
Traditional voice agents suffer from 3-5 second delays due to a three-step pipeline (STT, LLM, TTS), harming conversation flow and increasing costs.
Amazon Nova 2 Sonic uses end-to-end speech processing, scoring 87.0 on Big Bench Audio, with 1.39s TTFB and ~$0.27/hour cost.
This post shows you how to build a conversational protein research assistant that combines three capabilities: natural-language query parsing to extract structured search parameters, vector similarity search over protein embeddings using a specialized language model, and AI-generated scientific summaries of search results.
Use the Strands Agents SDK to orchestrate three specialized tools (parser, searcher, and summarizer) deployed to Amazon Bedrock AgentCore.
Leverage the ESM-C 300M protein language model for embeddings and pgvector on Amazon Aurora PostgreSQL for vector similarity search.
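The vector similarity search step can be illustrated without a database. The sketch below is a pure-Python stand-in for the ranking that pgvector performs (its <=> operator orders by cosine distance). The three-dimensional vectors and protein names are toy assumptions; real ESM-C embeddings are far larger.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, embeddings, k=2):
    """Return the names of the k stored embeddings most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Toy embeddings (assumed values, for illustration only).
    store = {
        "protein_a": [1.0, 0.0, 0.0],
        "protein_b": [0.9, 0.1, 0.0],
        "protein_c": [0.0, 1.0, 0.0],
    }
    print(top_k([1.0, 0.0, 0.0], store))
```

In the described architecture the parser tool would produce the query, the embedding model would turn it into a vector, and the database would do this ranking at scale with an index rather than a full scan.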
This post presents patterns for building production-ready multi-tenant AI systems using Amazon Bedrock AgentCore, demonstrated through healthcare AI agents serving multiple clinics and hospitals, covering tenant isolation, service tier differentiation, cost tracking, and observability.
Achieve complete tenant isolation using native Amazon Bedrock AgentCore capabilities.
Differentiate service tiers (e.g., Basic and Premium) with minimal custom code.
Ampersend built a pay-per-intelligence routing layer on Amazon Bedrock AgentCore Payments, enabling AI agents to autonomously pay for model services using the x402 protocol. The integration handles wallet custody, spending governance, and two-hop settlement, reducing development time from months to under two weeks.
Ampersend integrates with Amazon Bedrock AgentCore Payments to provide autonomous pay-per-intelligence for AI agents.
The solution uses a two-hop payment pattern: agent pays Ampersend, then Ampersend pays the model provider.
This post explores using multimodal embeddings, LLM captioning, and vector search on AWS to turn aerial imagery into a natural-language-searchable knowledge base. A five-stage pipeline built with Amazon Bedrock and Amazon OpenSearch Serverless evaluates different embedding models, fusion strategies, captioning approaches, and search methods. Experiments show Amazon Nova Multimodal Embeddings achieve the highest F1 scores on benchmark queries. The work evolved into Vexcel Intelligence, a searchable imagery product.
This post walks you through deploying ComfyUI workflows on Amazon SageMaker AI processing jobs to generate hundreds of high-quality images in a single batch. Learn to set up infrastructure using AWS CDK, configure GPU-accelerated processing, and automate image generation at scale. Adapt this solution to your own ComfyUI workflows.
Deploy ComfyUI on SageMaker processing jobs for batch image generation.
Use AWS CDK to create infrastructure including VPC, S3 buckets, and Lambda triggers.
Amazon Bedrock AgentCore now includes a fully managed web search capability that grounds AI agents in current web data. It uses Amazon's own web index, offers privacy by keeping queries within AWS, and integrates via MCP with minimal code. This solves the problem of stale knowledge in agents.
Web Search on Amazon Bedrock AgentCore is GA, providing agents with real-time web information.
Powered by Amazon's own web index with tens of billions of documents, refreshed within minutes.
This article explains how to integrate Adobe Marketing Agent with Amazon Quick using Model Context Protocol (MCP). It details the setup process, authentication, and validation with sample queries for audience rankings, loyalty segments, journey usage, and conflict analysis, enabling marketers to get campaign insights via natural language conversations.
Integrate Adobe Marketing Agent with Amazon Quick via MCP for natural language campaign insights.
Configure the branded connector, manage tool permissions, and publish the connection.
Amazon SageMaker AI now emits over 100 detailed inference metrics covering GPU health, token-level latency, KV cache pressure, traffic distribution across Availability Zones, and more. These metrics are displayed in a built-in SageMaker Insights dashboard in CloudWatch, which supports PromQL queries. This post explains how to enable detailed observability, navigate the dashboard, and connect metrics to external tools.
SageMaker inference endpoints now emit over 100 detailed OpenTelemetry metrics to CloudWatch by default.
The new SageMaker Insights dashboard provides Performance, Capacity, and Reliability views to quickly pinpoint latency and resource issues.
Amazon Bedrock AgentCore harness is now generally available, allowing developers to create and run a fully functional agent with just two API calls. It provides an isolated runtime environment, built-in memory, tool integration, skill libraries, and real-time tracing without writing orchestration code or building containers.
Two API calls (CreateHarness and InvokeHarness) create and run an agent quickly.
The agent runs in an isolated environment with a filesystem and shell for safe code execution.
Today, we’re announcing inline payload support for Amazon SageMaker AI Async Inference. Customers can now send inference payloads directly in the request body of the InvokeEndpointAsync API, removing the need to upload input data to Amazon Simple Storage Service (Amazon S3) before each invocation.
A new Body parameter allows sending inline payloads of up to 128 KB; it is mutually exclusive with InputLocation.
Simplifies client code: no S3 client, IAM permissions, or input bucket management needed.
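The two constraints in the announcement (a size cap on the inline payload, and mutual exclusivity with InputLocation) are easy to check on the client before calling the API. The sketch below only builds and validates a request dictionary; it does not call AWS. The helper name build_async_request is hypothetical, and reading "128KB" as 128 * 1024 bytes is an assumption.

```python
# Assumption: the announced 128KB limit is interpreted as 128 * 1024 bytes.
MAX_INLINE_BYTES = 128 * 1024


def build_async_request(endpoint_name, body=None, input_location=None):
    """Build InvokeEndpointAsync-style parameters, enforcing the inline payload rules.

    Exactly one of body (inline bytes) or input_location (an S3 URI) must be given.
    """
    if (body is None) == (input_location is None):
        raise ValueError("provide exactly one of body or input_location")

    request = {"EndpointName": endpoint_name}
    if body is not None:
        if len(body) > MAX_INLINE_BYTES:
            raise ValueError("inline payload exceeds the size limit; upload to S3 instead")
        request["Body"] = body
    else:
        request["InputLocation"] = input_location
    return request


if __name__ == "__main__":
    print(build_async_request("my-endpoint", body=b'{"prompt": "hello"}'))
```

A client could fall back to the S3 path automatically when the size check fails, keeping the inline path for small requests.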
Amazon Quick introduces autonomous agents that work continuously, a prioritized activity feed, and cross-data source insights from a single question, helping users reclaim hours daily.
New autonomous agents in Quick handle tasks continuously in the background.
Activity feed consolidates and prioritizes communications across apps.
At AWS Summit New York City, AWS announced a series of innovations including AWS Context (coming soon), AWS Glue Data Catalog Business Context and Semantic Search (preview), and Amazon S3 Annotations (generally available) to provide trusted context for AI agents. These services leverage knowledge graphs, identity-aware access, and open standards to enable organizations to build a shared, governed context layer that enhances the decision-making capabilities of AI agents.
AWS Context automatically maps data relationships into a knowledge graph, enabling agentic search for governed data and business rules at runtime.
AWS Glue Data Catalog adds business context and semantic search, enriching technical metadata with business descriptions and terms, and skill assets for agent guidance.
Amazon Bedrock AgentCore introduces new capabilities for connecting agents to organizational, web, and paid knowledge, along with optimization features for continuous improvement and enhanced policy controls.
Agents gain native access to organizational knowledge via Managed Knowledge Base, web search, and paid content via AgentCore payments.
Optimization capabilities include failure/intent/trajectory insights, recommendations, and A/B testing for continuous improvement.
Today, we’re announcing a new API with Amazon Bedrock Guardrails. With this API, you can apply individual safeguards, also referred to as safety checks, at any point in your agentic AI applications without creating guardrail resources. In this post, we walk through how the InvokeGuardrailChecks API works and how to use it to build safe, multi-turn agentic AI applications.
Amazon Bedrock Guardrails introduces InvokeGuardrailChecks API for applying safety checks without creating guardrail resources.
API operates in detect-only mode, returning numeric scores for content filters, prompt attack detection, and sensitive information filters.
Amazon SageMaker AI announces container image caching for inference, reducing end-to-end latency by up to 2x for generative AI models during scale-out events.
Container caching automatically activates for supported accelerator instance types with no modifications required.
Eliminates container image pull on new instance launches, reducing startup latency by up to 51%.
This post walks you through how to use P-EAGLE directly within Amazon SageMaker AI. It demonstrates how to select a compatible model from SageMaker JumpStart, configure parallel drafting, and deploy a highly optimized endpoint.
P-EAGLE eliminates sequential drafting by predicting all draft tokens in a single forward pass.
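To see why drafting in one pass matters, it helps to look at the generic speculative-decoding loop: a cheap drafter proposes several tokens, and the target model verifies them together. The toy below is not P-EAGLE's implementation. It shows a simplified greedy verification rule and counts drafter forward passes under the assumption that a sequential drafter needs one pass per token while a parallel drafter needs one pass in total.

```python
def accept_draft(draft_tokens, target_tokens):
    """Greedy verification (simplified).

    Keep draft tokens while they match what the target model would have produced.
    At the first mismatch, take the target's token instead and stop.
    """
    accepted = []
    for drafted, expected in zip(draft_tokens, target_tokens):
        if drafted != expected:
            accepted.append(expected)
            break
        accepted.append(drafted)
    return accepted


def drafter_passes(num_draft_tokens, parallel):
    """Drafter forward passes per step: one in total if parallel, else one per token."""
    return 1 if parallel else num_draft_tokens


if __name__ == "__main__":
    draft = [5, 7, 9, 2]    # tokens proposed by the drafter (toy values)
    target = [5, 7, 3, 2]   # tokens the target model would choose (toy values)
    print(accept_draft(draft, target))
    print(drafter_passes(len(draft), parallel=False), drafter_passes(len(draft), parallel=True))
```

The verification cost is the same in both cases; the saving comes entirely from the drafting side, which is the part the summary says P-EAGLE parallelizes.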
Today, we are announcing the availability of the Gemma 4 family on Amazon Bedrock. Built by Google DeepMind and released under the Apache 2.0 license, Gemma 4 is a family of open-weight models designed with a focus on intelligence-per-parameter across a broad range of deployment scenarios. The family includes three instruction-tuned variants: Gemma 4 31B, Gemma 4 26B-A4B, and Gemma 4 E2B. These cover dense and mixture-of-experts (MoE) architectures, where only a fraction of the model’s parameters activate per request. The variants offer built-in reasoning, native function calling, and multimodal input across text and image.
Gemma 4 family now available on Amazon Bedrock, featuring three variants: 31B dense, 26B-A4B MoE, and E2B PLE.
Supports built-in reasoning mode, native function calling, and multimodal input (text and image).
This post introduces detectors in the Strands Evals SDK that automatically identify failures in AI agent execution traces and perform root cause analysis, reducing diagnosis time from hours to minutes. You learn how to call detector functions, interpret structured output (categorized failures, confidence scores, causal chains, and fix recommendations), and integrate detection into your evaluation pipeline for automated diagnosis on every test run.
Detectors operate in two phases: failure detection (scanning spans against a 9-category taxonomy) and root cause analysis (linking causes to symptoms and recommending fixes).
Functions detect_failures and analyze_root_cause provide separate outputs, while diagnose_session offers a unified pipeline.
This post demonstrates building a competitive research agent using LangChain Deep Agents and Amazon Bedrock AgentCore. The agent delegates deep work to isolated subagents (browser and interpreter) to overcome context window limitations, enabling parallel research, data analysis, and cross-session memory.
Deep Agents orchestrates specialized ephemeral subagents, each in an isolated AgentCore MicroVM.
Three browser subagents research competitor websites in parallel, then an analyst subagent generates charts and reports.
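The fan-out and fan-in shape of this workflow can be sketched with asyncio. This is a minimal stand-in, not the Deep Agents or AgentCore API: research_site and analyze are hypothetical placeholders for the browser and analyst subagents, and the URLs are made up.

```python
import asyncio


async def research_site(url):
    """Placeholder for a browser subagent; a real one would browse the site in isolation."""
    await asyncio.sleep(0)  # yield control, standing in for network-bound work
    return {"url": url, "notes": f"findings for {url}"}


async def analyze(findings):
    """Placeholder for the analyst subagent that turns findings into a report."""
    return {"sites": [item["url"] for item in findings], "count": len(findings)}


async def run_research(urls):
    """Fan out one subagent per site, wait for all of them, then run the analyst once."""
    findings = await asyncio.gather(*(research_site(url) for url in urls))
    return await analyze(list(findings))


if __name__ == "__main__":
    report = asyncio.run(run_research(["a.example", "b.example", "c.example"]))
    print(report)
```

Because each subagent returns only its findings, the orchestrator's context holds three short results rather than three full browsing sessions, which is the context-window benefit the post describes.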