What Happens When AI Agents Refuse to Work Until They're Paid

The proliferation of local AI agents for software development creates governance and cost challenges at scale. Organizations need centralized platforms with intelligent routing, caching, and financial chargebacks to manage architectural drift and expenses. The author proposes using the Agent-to-Agent (A2A) protocol and the Agent Payments Protocol (AP2) as open standards for orchestration and internal economics.

Source: Hacker News AI. Author: owulveryck

Exposing the problem

Giving every developer a powerful, local AI agent feels like the ultimate productivity hack. But for organizations running at scale, it is a governance and cost trap waiting to spring.

Currently, the AI revolution in the Software Development Lifecycle (SDLC) is happening almost entirely on developers’ laptops. We are building isolated, monolithic agent loops. I’ve been advocating for a shift toward an agentic platform because I am convinced this local-first approach is only a transitional phase.

But before explaining why this model breaks down, let’s define what running SDLC “at scale” means in this context: bringing AI-powered development to N teams working on M products, with both N and M being greater than 10. We are not just talking about the internal dynamics of a single team, but true multi-product organizations.

Ensuring trust at the organizational level

Let’s consider a fundamental truth: LLMs are probabilistic, meaning AI directives are only followed a certain percentage of the time. Imagine you create a skill to enforce a critical business rule—let’s call it an “enterprise architecture decision.”

Because of the nature of AI, there is always a chance this skill is partially ignored or poorly applied.

If that failure rate is even 10%, and you scale this across N > 10 teams running thousands of iterations, it is all but certain that some teams will ship code that bypasses your global business rules. The result is massive architectural drift.
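To put a number on this: if each run ignores the rule with probability p, and runs are treated as independent (a simplifying assumption), the chance that at least one of n runs bypasses the rule is 1 - (1 - p)^n. A minimal sketch, with an illustrative function name:

```python
def prob_at_least_one_bypass(failure_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent runs ignores the rule."""
    return 1.0 - (1.0 - failure_rate) ** runs


# The 10% failure rate is the article's example; the run counts are illustrative.
for runs in (1, 10, 50, 1000):
    print(f"{runs:>5} runs -> {prob_at_least_one_bypass(0.10, runs):.4f}")
```

With a 10% failure rate, ten runs already put the odds of at least one bypass at roughly two in three, and the probability approaches 1 as the number of runs grows.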

We can, of course, build deterministic guardrails with hooks and programs to enforce validation. But if these are executed locally on developers’ laptops, we lose centralized observability.
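As a rough illustration of what such a guardrail could look like when its verdicts are meant to be collected centrally, here is a sketch of a deterministic check that produces a structured audit record. The rule, the provider name `internal-mailer`, and every function and field name are hypothetical; shipping the record to a central store is left out.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

# Hypothetical rule: only these email providers are approved organization-wide.
APPROVED_EMAIL_PROVIDERS = {"internal-mailer"}


@dataclass
class AuditRecord:
    team: str
    rule: str
    passed: bool
    detail: str
    checked_at: str


def check_email_provider(team: str, provider: str) -> AuditRecord:
    """Deterministic check: the verdict depends only on the inputs."""
    passed = provider in APPROVED_EMAIL_PROVIDERS
    detail = "ok" if passed else f"provider '{provider}' is not approved"
    return AuditRecord(
        team=team,
        rule="approved-email-provider",
        passed=passed,
        detail=detail,
        checked_at=datetime.now(timezone.utc).isoformat(),
    )


def to_audit_line(record: AuditRecord) -> str:
    """Serialize one record as a JSON line, ready to send to a central log."""
    return json.dumps(asdict(record), sort_keys=True)
```

The check itself is trivial. The point is that the same record format can be emitted wherever the check runs, so the verdict does not stay hidden on a laptop.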

The CTO or Principal Engineer is ultimately accountable for the brand’s software. They cannot simply rely on “trusting the team”; they need systemic guarantees. How can a CTO confidently certify what is shipped when the enforcement mechanisms are scattered and invisible?

Managing LLM Costs and Internal Economics

When AI directives are executed locally at the team level, the organization loses control over the execution model.

Developers are often locked into a one-size-fits-all approach. A specific skill might run perfectly on a mid-tier LLM but fail on a low-cost one, yet current local tools (like Copilot or Claude) offer no easy way to dynamically route requests to the most cost-effective model based on the task’s complexity.

Consequently, the organization pays a premium for every single call made by local agents. Without centralized caching or intelligent model routing, this cost scales linearly with the number of developers and iterations, quickly ballooning into a massive expense.
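The two missing pieces, routing and caching, can be sketched in a few lines. This is only an illustration under stated assumptions: the tier names, the prices in credits, the 1 to 10 complexity score, and the `call_model` callback are all hypothetical, and the cache only matches identical prompts.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical model tiers and per-call prices in credits.
PRICE_PER_CALL = {"low-cost": 1, "mid-tier": 5, "premium": 20}


def pick_model(complexity: int) -> str:
    """Route by a task complexity score (1..10), cheapest adequate tier first."""
    if complexity <= 3:
        return "low-cost"
    if complexity <= 7:
        return "mid-tier"
    return "premium"


class CachingRouter:
    """Routes each prompt to a tier and reuses answers for identical prompts."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # (model, prompt) -> answer
        self._cache: Dict[str, str] = {}
        self.spent = 0  # total credits spent on real model calls

    def ask(self, prompt: str, complexity: int) -> str:
        model = pick_model(complexity)
        key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]  # cache hit: no model call, no cost
        answer = self._call_model(model, prompt)
        self.spent += PRICE_PER_CALL[model]
        self._cache[key] = answer
        return answer
```

Because the router sits in one place, a repeated question from a second developer costs nothing, which is exactly what a fleet of isolated local agents cannot offer.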

This brings us to a final financial consideration: the internal economy. If a developer builds a highly effective AI skill that is later adopted by multiple teams, who absorbs the execution costs? A decentralized model provides no answer. We need a way to accurately track usage and manage chargebacks to compensate the teams building these shared organizational assets.
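One possible shape for that tracking is a simple usage ledger. The sketch below assumes a policy where the calling team is charged and the team owning the skill is credited; that policy, the credit unit, and all names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UsageEvent:
    skill: str
    owner_team: str   # team that built and maintains the skill
    caller_team: str  # team whose agent invoked it
    credits: int      # execution cost of this call


class Ledger:
    def __init__(self) -> None:
        self.events: List[UsageEvent] = []

    def record(self, event: UsageEvent) -> None:
        self.events.append(event)

    def balances(self) -> Dict[str, int]:
        """Net position per team: positive is owed to the team, negative is owed by it."""
        net: Dict[str, int] = defaultdict(int)
        for e in self.events:
            if e.caller_team == e.owner_team:
                continue  # a team using its own skill generates no chargeback
            net[e.caller_team] -= e.credits
            net[e.owner_team] += e.credits
        return dict(net)
```

Keeping the raw events rather than only the totals means the same data can later answer questions such as which skills are most reused.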

Building the Platform of the Future

To solve these challenges, we need to shift from local black boxes to centralized services. A true agentic platform should handle AI queries dynamically—optimizing models and utilizing caching to control costs at scale. It must also maintain a financial ledger for cross-team chargebacks and an audit logbook to ensure architectural compliance.

The rest of this post is a step-by-step demonstration of how this future could look, leveraging two open standards: the Agent-to-Agent (A2A) protocol for orchestration and governance, and the Agent Payments Protocol (AP2) to handle the internal economics.

Setting the Scene: The Local Architect

Imagine you are a Product Manager or Tech Lead in a stream-aligned team, tasked with building a new application. To design the implementation, you turn to your local AI architect, “Winston” (if you are using BMAD, you may already know Winston :D).

Winston runs entirely on your local machine. It is smart—well-versed in general software architecture principles and equipped with guardrails to escalate critical compliance issues, like GDPR.

But here is the catch: Winston operates in a silo. It has a massive blind spot regarding the enterprise context and absolutely zero knowledge of the internal components already existing within your organization.

The workflow begins the moment you submit your initial prompt, triggering Winston’s local execution loop.

Here is the prompt you give to Winston:

(…) for this feature, we need to send 50,000 transactional emails per day.

Note: We are skipping over prompt and context engineering here. Naturally, the human would supply much more detail, and Winston would already be loaded with the product’s baseline context.

Consulting the Enterprise Source of Truth

Winston understands the technical requirements, but it is completely blind to the organization’s existing ecosystem. To bridge this gap, it must rely on the platform: a centralized suite of capabilities designed to help stream-aligned teams build applications that fit the company’s standards.

The specific capability Winston is mandated to call is the Enterprise Architecture Service. This service acts as the organization’s brain for standards, blueprints, and reusable building blocks. Today, this service is fully automated, handled by a highly optimized, centralized AI agent.

These agents don’t use human prompts to talk to each other; they communicate via the Agent-to-Agent (A2A) protocol, a standardized way to query tasks and exchange states. Winston wraps your request in an A2A message and fires it off to the Architect Agent:

{
  "role": "user",
  "parts": [{ "type": "text", "text": "I need to set up email notifications for 50k users" }],
  "metadata": { "ceiling_credits": 1000 }
}
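For readers who prefer code to raw JSON, the same payload can be assembled programmatically. The sketch below only mirrors the fields shown above and does not use any A2A client library. The helper name is hypothetical, and reading `ceiling_credits` as a spending cap that must be positive is an assumption, not something the message format states.

```python
import json
from typing import Any, Dict


def build_message(text: str, ceiling_credits: int) -> Dict[str, Any]:
    """Assemble a message with the same fields as the example above."""
    # Assumption: ceiling_credits is a spending cap, so it must be positive.
    if ceiling_credits <= 0:
        raise ValueError("ceiling_credits must be positive")
    return {
        "role": "user",
        "parts": [{"type": "text", "text": text}],
        "metadata": {"ceiling_credits": ceiling_credits},
    }


message = build_message("I need to set up email notifications for 50k users", 1000)
print(json.dumps(message, indent=2))
```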