2026-06-04 15:23 UTC · 6 min read · Updated: 2026-06-30 13:03 UTC

Mate Security’s Asaf Wiener made every backend engineer a model router. He’s right to.

When AI inference costs threatened Mate Security's runway, CEO Asaf Wiener didn't just cut costs—he restructured the company so that every backend engineer owns model selection, evaluation, and routing for their workloads. This shift from cloud-era opacity to workload-level cost visibility has enabled quality-cost optimization, with open-source models sometimes outperforming frontier APIs on specific tasks. Wiener argues that an AI-native company's only structural advantage is shipping against the best model available that day, enabled by an 'execution mode' culture that avoids legal-policy review cycles and hires for adaptability.

Source: The New Stack AI · Author: Matthew Burns

Article intelligence

Engineers · Advanced

Key points

Wiener broke down AI inference cost into ~10 sub-lines for workload-level visibility, projecting per-feature cost before shipping.
Every backend engineer at Mate runs evals on their workloads, choosing models based on quality and cost, updated continuously.
Open-source models sometimes beat frontier APIs in Mate's internal evals on both quality and cost for specific workloads.
Mate's 'execution mode' culture bypasses legal-policy review for new models, hiring for 'resilience for changing.'

Why it matters

This matters because Wiener broke AI inference cost into ~10 sub-lines for workload-level visibility, which lets Mate project per-feature cost before shipping.

Technical impact

May affect model selection, inference cost, product capability, and evaluation benchmarks.

This panel is AI-generated and reviewed for accuracy.

AI Operators is The New Stack’s profile series on how CEOs and founders run AI-native companies: the stacks they choose, the costs they manage, the staff they manage, the workflows they rebuild, and the hard calls other operators can learn from.

A few months into running Mate Security, Asaf Wiener called his CTO and told him they had about five months before the startup ran out of financial runway. The number that triggered the call was the AI inference line on Wiener’s CEO dashboard.

“I took a look at the dashboard, and the number was incredibly high,” Wiener tells The New Stack. “I said, ‘Okay, oh my god, I need to stop. We’re bleeding money, and we’re a startup. If I’m not stopping that right now, I will not survive. I will not survive in six months from now.’”

Wiener is the co-founder and CEO of Mate Security, an AI-native SOC startup that emerged from stealth in November with $15.5 million in seed funding from Team8 and Insight Partners (The New Stack is owned by Insight Partners). He led product at Wiz before Mate, and at Microsoft before that.

At Mate, Wiener says, the line that demands CEO-level attention is the cost of running AI models. That bill has two parts: what Mate pays providers like Amazon, Google, and Anthropic, and what it costs to run other models on Mate’s own hardware.

Wiener will not put a dollar figure on the record. He calls the category the cost of goods sold for an AI-native company. In his telling, it is the first line on the P&L. His reaction to the high bill was not a cost-cutting exercise. It was a rewrite of how Mate is structured, away from cloud-era strategies, and of how every backend engineer at the company chooses, tests, and routes the AI models behind its product.

So now he breaks that single line into roughly ten sub-lines to manage it. He says he can open his dashboard, see where the day’s spikes are, and know why each one happened. The unit he measures against is the investigation, the core economic action of a security operations product. Mate estimates the token cost of every feature per investigation before it ships, allowing Wiener to decide whether a feature can be priced profitably or must be re-engineered to run more cheaply before it reaches a customer. That visibility is non-negotiable, in his view. If you cannot break apart the largest cost line, you cannot make defensible calls about which features to ship or which margins are real.
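The arithmetic behind a per-investigation projection can be sketched in a few lines of Python. This is a minimal illustration, not Mate's method: the price card, the token counts, the margin threshold, and every name below (ModelPrice, FeatureProfile, cost_per_investigation, ships_profitably) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price card, in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


@dataclass(frozen=True)
class FeatureProfile:
    """Assumed token usage of one feature during a single investigation."""
    calls_per_investigation: int
    input_tokens_per_call: int
    output_tokens_per_call: int


def cost_per_investigation(profile: FeatureProfile, price: ModelPrice) -> float:
    """Projected USD cost of running the feature once per investigation."""
    input_tokens = profile.calls_per_investigation * profile.input_tokens_per_call
    output_tokens = profile.calls_per_investigation * profile.output_tokens_per_call
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def ships_profitably(cost: float, price_per_investigation: float, min_margin: float) -> bool:
    """True if the gross margin on one investigation clears the assumed threshold."""
    if price_per_investigation <= 0:
        raise ValueError("price_per_investigation must be positive")
    margin = (price_per_investigation - cost) / price_per_investigation
    return margin >= min_margin


# Illustrative numbers only: 4 model calls, 5,000 input and 500 output tokens each.
profile = FeatureProfile(calls_per_investigation=4,
                         input_tokens_per_call=5_000,
                         output_tokens_per_call=500)
price = ModelPrice(input_per_mtok=2.0, output_per_mtok=8.0)
projected = cost_per_investigation(profile, price)
print(f"projected cost per investigation: ${projected:.3f}")
print("ship as priced:", ships_profitably(projected, price_per_investigation=0.50, min_margin=0.8))
```

A feature that fails the margin check is the case Wiener describes: it gets repriced or re-engineered before it reaches a customer.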

The pressure is not unique to Mate. As model access broadens and model performance compresses, inference is becoming where AI startup margins live or die in 2026.

If inference is the largest cost line in an AI-native product, in Wiener’s view, the people shipping features cannot be insulated from model economics. They have to know which model they’re using, what it costs, how it performs, and when a newer or smaller model should replace it. Inference became a P&L line, so model selection had to become an engineering practice: Every workload has an owner, every owner has an eval, and every eval ties quality back to cost.

At many AI-native companies, model choice still sits with a small research or platform group. At Mate, it sits with the engineer who owns the workload.

The AI Operators

Mate Security

Asaf Wiener · Co-founder & CEO

Personal

Claude Cowork: hunting target accounts, enriching lists, prepping LinkedIn outreach

Claude Code: custom outbound agents tuned to Mate’s ICP

Production

Frontier APIs paired with self-hosted open-source models on Mate’s own GPUs

Cursor for engineering, Lovable for product design

No LangChain or LangGraph in core agents; built from scratch

First line on the P&L

Broken into ~10 sub-lines for workload-level visibility

Measured per investigation, the core economic unit

No dollar figure on the record

Per-feature AI cost projected before each feature ships

Forecasts decide what gets shipped, repriced, or re-engineered

Every engineer routes models

Hiring filter: “resilience for changing.” No refusers; won’t hire candidates uncomfortable with workflow churn

Backend engineers own model selection and eval at the workload level

Evals are part of the coding life cycle; routing decided per workload, continuously

Integration code written by Mate’s own agents overnight; agents sometimes self-approve PRs

Execution mode: no legal-policy review cycle on new models or tools

Wiener’s framing: shipping against the best model the day it ships is the only structural edge a small company has

100: Integrations shipped in a single overnight agent run

Agent-generated PRs, sometimes agent-approved

Ready for customers by morning

Overnight runs ongoing since the eval discipline shipped

Made possible by a model layer that’s no longer a fixed vendor cost

5 months: The survival window a dashboard spike forced

Cost layer was invisible to the engineers shipping features

Forced a full rebuild of model selection and routing

Trigger: a single look at the inference line on Mate’s dashboard

Lesson: cost visibility has to live with the workload owner

“At the end of the day, they are not actually writing lines of code. They are orchestrating agents.”

Asaf Wiener, Co-founder & CEO

On what backend engineers at AI-native companies actually do now.

Mate Security’s AI workflow

“If you define the day-to-day job of one of my backend engineers, part of their job is to create an eval and test those models,” Wiener says. “It’s part of the coding life cycle.”

Every backend engineer at Mate writes and runs evals on the workloads they own. When Anthropic ships a new Opus, when Google updates Gemini, when a new open-source model lands on Hugging Face, the responsible engineer tests the candidate against production. If the new model wins on Mate’s quality and cost criteria, the workload moves. If not, it stays put. The decision is local, and updated continuously.
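A rough sketch of that move-or-stay decision, in Python, might look like the following. The article does not spell out Mate's actual criteria, so the rule here is an assumption: the candidate must be no worse than the incumbent on both quality and cost, and strictly better on at least one. The names EvalResult, should_switch, and reevaluate, along with the workload and model labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Outcome of one workload eval for one model (hypothetical schema)."""
    model: str
    quality: float                 # mean eval score, higher is better
    cost_per_investigation: float  # USD, lower is better


def should_switch(incumbent: EvalResult, candidate: EvalResult) -> bool:
    """Assumed rule: never regress on either axis, improve on at least one."""
    no_worse = (candidate.quality >= incumbent.quality
                and candidate.cost_per_investigation <= incumbent.cost_per_investigation)
    strictly_better = (candidate.quality > incumbent.quality
                       or candidate.cost_per_investigation < incumbent.cost_per_investigation)
    return no_worse and strictly_better


def reevaluate(routes: dict, workload: str,
               incumbent: EvalResult, candidate: EvalResult) -> dict:
    """Return a new routing table; the workload moves only if the candidate wins."""
    updated = dict(routes)
    if should_switch(incumbent, candidate):
        updated[workload] = candidate.model
    return updated


# Illustrative values only.
routes = {"alert-triage": "incumbent-model"}
current = EvalResult("incumbent-model", quality=0.90, cost_per_investigation=0.05)
challenger = EvalResult("candidate-model", quality=0.92, cost_per_investigation=0.04)
print(reevaluate(routes, "alert-triage", current, challenger))
```

Because the table is keyed by workload, each decision stays local to the engineer who owns that workload, and a newer model that fails the eval simply leaves the route unchanged.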

Wiener’s most-cited example is the Opus 4.6 to 4.7 transition. The reflex across AI-native teams was to upgrade the week 4.7 shipped, on the assumption that the newest model from a top lab is automatically the best one for any workload.

“Believe it or not, 4.7 is not the best model,” Wiener says. “We have our AI researchers testing the quality of those models every day, and if we’re finding that Opus 4.6 is better for us, we will shift as an organization.”

He means for Mate’s workloads, not as a general verdict on the model. The same workload-specific test runs against open-source models. Mate routes some workloads to self-hosted models on its own GPUs rather than the frontier APIs.

When open source wins

Wiener pushes back on the idea that cost discipline forces a team to accept worse output. The eval discipline is what makes quality and cost compatible: The team optimizes for quality first, and in Mate’s internal evals, open-source models sometimes beat frontier APIs on both quality and cost for specific workloads. Wiener is careful with that finding, and one can appreciate the discipline of his distinction. Too many founders treat “open source beats frontier” as a slogan. He treats it like a measurement.
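One way to read "quality first" as a selection rule is sketched below in Python: keep only the models whose eval score is within a small tolerance of the best score, then take the cheapest of those. The tolerance, the scores, the costs, and the names EvalScore and pick_model are all assumptions made for illustration, not figures or code from Mate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalScore:
    """Eval outcome for one model on one workload (hypothetical schema)."""
    model: str
    quality: float                 # higher is better
    cost_per_investigation: float  # USD, lower is better


def pick_model(scores: list, quality_tolerance: float) -> EvalScore:
    """Quality first: shortlist models near the top score, then minimize cost."""
    if not scores:
        raise ValueError("no eval results to choose from")
    best_quality = max(s.quality for s in scores)
    shortlist = [s for s in scores if s.quality >= best_quality - quality_tolerance]
    return min(shortlist, key=lambda s: s.cost_per_investigation)


# Illustrative numbers only; the model names are placeholders.
scores = [
    EvalScore("frontier-api", quality=0.91, cost_per_investigation=0.06),
    EvalScore("self-hosted-oss", quality=0.90, cost_per_investigation=0.02),
    EvalScore("small-oss", quality=0.80, cost_per_investigation=0.01),
]
print(pick_model(scores, quality_tolerance=0.02).model)
```

With a tolerance of zero the rule collapses to "highest quality wins"; widening it lets a cheaper model win only when the eval says the quality gap is negligible, which keeps the result a measurement rather than a slogan.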

The discipline has produced one result Wiener is comfortable putting on the record. Mate has built a class of agents that write integrations behind the scenes around the clock.

“We developed high-quality agents that write integrations behind the scenes 24/7,” Wiener says. “I will see that Mate built 100 integrations during the night, and our agents provided the PRs. Sometimes they also approved the PRs.”

Agents approving their own pull requests at a security company raises a control question in itself. Wiener frames it as a productivity result; enterprise buyers will test that framing in pilots. The automation is possible at all, he says, because the model layer is no longer a single fixed vendor cost the company is locked into. Wiener made a version of this argument himself in a column for The New Stack on the ROI of AI-driven security automation, writing that measuring AI work by output quality and cost per investigation is what separates a credible product from a marketing claim.

Wiener does not impose the eval-and-route discipline from outside the workflow. He runs his own version of it. As CEO, he owns sales, marketing, and go-to-market, and he says his day is mostly spent working alongside agents rather than people. He uses Claude Cowork to hunt and build target-account lists, enrich them into spreadsheets, and prep LinkedIn outreach, guiding the model through the steps rather than writing each touch himself. He also builds his own agents in Claude Code, tuned to Mate’s ideal customer profile, to draft outbound at volume he could not produce by hand.

The workflow is the one he wants in engineering: Define the job, give the agent context, inspect the output, and improve the workflow rather than do the task by hand. The work becomes directing the agent, inspecting the result, and deciding what is good enough to ship.

The job description for a Mate backend engineer now more closely resembles what other companies call a research or ML platform engineer. The day-to-day is less about writing application code against a known model than about deciding which model the code routes to, under which conditions, at which cost. That decision used to belong to a small group of AI researchers, if to anyone in particular. At Mate it belongs to every backend engineer.

Execution mode, no refusers

That shift is downstream of a culture choice Wiener describes as “execution mode.” Mate does not subject model or tool choice to a legal-policy review cycle.

“I can let my team leverage new tools, new models,” Wiener says. “I don’t need to approve that in legal policy, or whatever it will be.”

The argument is not against governance. It is at odds with the version most large enterprises have built around AI adoption, which treats every new model as a procurement event, with security, legal, and committee sign-off on what an engineer is allowed to try. Wiener has worked inside those structures at Microsoft. His point is narrower: shipping against the best available model the day it ships, rather than the one that cleared review two quarters ago, is the only structural advantage a small company has over a large one. He spends it deliberately. I think he is right about that.

The downstream effect on hiring is direct. Wiener tests for what he calls “resilience for changing” in interviews: the ability to drop a workflow that worked yesterday because a new model has changed what works today. Mate has no so-called AI refusers, he says, because it does not hire candidates uncomfortable with that churn. The framing is closer to a baseline expectation than a punitive culture. If you do not want to throw away a working integration when a better model ships, Mate is not the company for you.

It’s a hard culture line. It’s also where his argument cuts against a familiar failure mode, the hidden engineering cost of bolting AI onto every product roadmap. Companies that add AI on top of an unchanged engineering org, in his telling, do not get the speed they think they are buying. The speed comes from changing what engineers do, not from handing them new tools to do the same job.

Wiener’s honesty about the cost crisis holds the argument together. The AI cost blowout was the result of an org running on cloud-era methods, where the cost layer was invisible to engineers shipping features. Making it visible at the workload level forced the eval discipline, which forced the job rewrite. An AI-native company that ships features without owning model choice is shipping its own runway out the door.
