The Asymmetric Future of AI in Cybersecurity

This article examines the dual-use nature of AI in cybersecurity, highlighting the role of AI agents, the rise of local models, the asymmetry between attackers and defenders, and recommendations for defenders to focus on enrichment and prioritization rather than full automation.

Source: Hacker News AI | Author: mstrada

Cybersecurity has always had an interesting property: the same knowledge can either protect a system or compromise it. A proof-of-concept exploit can help a vendor reproduce and patch a vulnerability, or help an attacker weaponize it before users update their systems. None of this is new. What is changing is the speed and scale at which these actions can now occur, and how accessible they have become.

This post is slightly different from the previous ones. Rather than explaining a specific technical concept, the goal of the first part is to bring some order to the current and near-future relationship between AI and cybersecurity. In the second part, I will try to make some reasoned predictions that go beyond simply betting on red or black at a roulette table.

If you already know AI 101, I suggest skipping to the next section about open models.

Before discussing agents, it is important to give a quick and boring clarification of what the term actually means, because it is rapidly becoming one of the most overloaded concepts in AI. An AI agent is a system capable of reasoning over a task, interacting with tools, and executing multi-step actions toward an objective. In cybersecurity, this could range from an agent that enumerates an attack surface and chains vulnerabilities together, to one that automatically triages alerts or responds to incidents.

At the time of writing, AI agents are not capable of conducting fully autonomous end-to-end cyber operations with consistently reliable success rates. However, their capabilities are improving quickly, especially in coding, code vulnerability detection, and biology, so they are expected to become more consistent and more skilled over time.

The dual-use nature of cybersecurity also exposes a complexity to manage: intent. In clearly malicious scenarios, such as requesting ransomware deployment scripts or credential-stealing malware, intent is relatively straightforward to classify. The challenge emerges in the much larger grey area, where the exact same technical action may be either legitimate or harmful depending entirely on the context.

This creates an intrinsic problem for language models because intent is ultimately expressed through language. For example, I can ask an LLM to help me validate a bug bounty report, or I can wrap the same request in a CTF narrative to bypass guardrails and exploit that same vulnerability.

To mitigate this problem, LLMs typically rely on a combination of safety alignment techniques, policy-based filtering, reinforcement learning from human feedback (RLHF), and runtime monitoring systems designed to identify harmful intent or dangerous outputs. Safety mechanisms usually operate as moving thresholds: tightening restrictions too aggressively risks making the model unusable for legitimate security researchers, developers, and defenders, while relaxing them too much lowers the barrier for malicious actors and makes bypasses easier.

This tension is one of the reasons why newer models, such as Anthropic’s Claude Mythos and OpenAI’s GPT-Cyber, have introduced forms of controlled or trusted access. In some cases, this has been criticized as partially a marketing decision to increase the “hype”, but regardless of the motivation, even organizations that publicly advocate broad accessibility are beginning to introduce verification layers or identity-based access controls.

The Central Role of Local Models

Much of the public discussion around AI safety assumes that access controls can meaningfully limit offensive capabilities. In practice this assumption grows weaker with each passing month. The long-term challenge for centralized control is not frontier cloud models, where companies can enforce access controls and usage policies. Rather, it will probably be the rise of local models, which will play the central role in the future.

As of the time of writing, powerful models like Mythos 5 and GPT 5.6 Sol have been restricted by US authorities, allowing access only to a small set of approved companies. This underlines how these tools are increasingly becoming strategic assets and highlights the importance of having high-quality open source models to avoid such limitations.

At the same time, if capability is the concern, we might see governments in the future impose restrictions on capable open-weight models above a certain threshold. The moment an open-weight model reaches performance comparable to state-of-the-art (SOTA) models, governments could attempt to regulate it just as they have tried to regulate closed models. Such bans could be justified on security grounds while also protecting the revenue of the largest LLM companies.

Banning open source products often results in a merely symbolic ban, given the existence of copies and mirrors that cannot realistically be recalled; the attempt to restrict strong encryption in the 1990s is a well-known example. In this case, however, there is an important difference.

Running open-weight models with performance comparable to SOTA models requires substantial hardware resources. With current technology, a model capable of solving significant cybersecurity tasks cannot run on systems with less than 128 GB of RAM without a significant loss in performance or without becoming unusably slow. As a result, many users would still need to access these models through hosted endpoints, which are much easier to take down because they are more centralized. Finally, while a ban on open-weight models could be bypassed by individual citizens, it would be much harder for companies to circumvent without exposing themselves to potential fines.
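To give a feel for why memory is the limiting factor, here is a rough back-of-the-envelope sketch. It only counts the memory needed to hold the weights (parameter count times bytes per weight) and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The parameter counts and quantization levels are hypothetical values chosen for illustration; they do not describe any specific model and are not meant to confirm or refute the 128 GB figure above.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision.
    KV cache, activations, and runtime overhead are NOT included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Hypothetical model sizes and precisions, for illustration only.
for params in (8, 70, 400):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{params}B parameters at {bits}-bit: ~{gb:.0f} GB for weights")
```

Lower-precision quantization shrinks the footprint, but usually at some cost in output quality, which is the trade-off the paragraph above refers to.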

The Asymmetry Between Attackers and Defenders

A common assumption in discussions about AI is that defenders will simply use AI agents too, creating some form of balance between offensive and defensive capabilities. That assumption ignores a critical asymmetry: the cost of failure is fundamentally different for attackers and defenders.

Let’s imagine a scenario (probably not too far away) where better cybersecurity local models become runnable on a medium-to-high-range laptop. At that point, an attacker can easily deploy tens of agents against a single target (or, in the case of state-sponsored actors, hundreds of autonomous agents) and let them loop, trying different paths until they find something interesting or manage to reach the objective. For an attacker, the worst-case scenario for a failed automated action is often relatively limited: the attack gets detected, you get blocked, and the vulnerable assets you spotted get fixed.
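The effect of scale can be illustrated with a simple probability sketch. It assumes every attempt has the same small chance of success and that attempts are independent of each other. Both are simplifying assumptions of mine: real attempts against the same target are correlated, and the per-attempt probabilities below are made up for illustration.

```python
def p_any_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` tries succeeds.

    Assumption: attempts are independent and share the same
    success probability `p_single`.
    """
    return 1.0 - (1.0 - p_single) ** attempts


# Hypothetical per-attempt success rates and attempt counts.
for p in (0.001, 0.01):
    for n in (10, 100, 1000):
        print(f"p={p}, attempts={n}: P(at least one success) = {p_any_success(p, n):.3f}")
```

Under these assumptions, even a very unreliable agent becomes dangerous once failed attempts are cheap enough to repeat many times.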

For defenders, the consequences of automation errors are usually much more severe. An autonomous defensive system that incorrectly isolates production infrastructure, blocks internal communications, or disrupts critical services can lead to some really unhappy meetings. Defensive automation operates under much tighter reliability constraints because it interacts directly with operational systems that the company’s business depends on.
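One way to picture these tighter constraints is an approval gate in front of every automated response. The sketch below is a minimal illustration, not a description of any real product: the `ProposedAction` fields, the 1-to-5 blast radius scale, and the default thresholds are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_EXECUTE = "auto_execute"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class ProposedAction:
    name: str
    target: str
    blast_radius: int        # hypothetical scale: 1 = single host, 5 = business-wide
    reversible: bool         # can the action be undone quickly?
    model_confidence: float  # 0.0 to 1.0, as reported by the agent


def gate(action: ProposedAction,
         max_auto_blast_radius: int = 1,
         min_confidence: float = 0.9) -> Decision:
    """Run an action automatically only if it is small, reversible, and confident."""
    if (action.reversible
            and action.blast_radius <= max_auto_blast_radius
            and action.model_confidence >= min_confidence):
        return Decision.AUTO_EXECUTE
    return Decision.NEEDS_APPROVAL


# Hypothetical examples: a low-impact action and a high-impact one.
quarantine = ProposedAction("quarantine_file", "laptop-042", 1, True, 0.95)
isolate = ProposedAction("isolate_subnet", "prod-db-network", 5, False, 0.99)
print(gate(quarantine).value)
print(gate(isolate).value)
```

Note that the high-impact action is routed to a human even though the model is very confident: the gate is driven by the cost of being wrong, not only by the likelihood of being right.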

A military analogy is useful here. For years, advanced military systems were optimized around highly expensive platforms. Then asymmetric conflicts demonstrated that cheap drones produced at scale could overwhelm systems that were individually far more advanced and expensive. This became evident with Iranian Shahed drones, which cost on the order of tens of thousands of dollars, being intercepted by air defense systems such as the Patriot, where each interceptor missile can cost several million dollars. Similar dynamics have also emerged during the Russia-Ukraine conflict.

Cybersecurity may experience a similar transition. Even if defenders possess stronger models, more compute, and better infrastructure, attackers may still gain an advantage through scale and a higher tolerance for failure. If one organization deploys only a handful of carefully constrained defensive agents, while another actor deploys hundreds of offensive agents at a fraction of the cost, all orchestrated by a single larger model, the asymmetry in automation freedom may ultimately favor the attacker.

What Defenders Should Do Now

What should defenders do now? Probably less than many vendors currently suggest.

One of the most misleading narratives in cybersecurity today is the idea that organizations should aggressively automate everything as quickly as possible. In reality, for the reasons discussed earlier, automating while still depending on a human bottleneck does not provide as many advantages as it appears to. The short-term opportunity is not full automation everywhere, but enrichment, small automations, and prioritization.

This matters because security teams are usually overloaded and understaffed. Automating repetitive cognitive tasks such as alert triage, reverse engineering assistance, vulnerability prioritization, threat intelligence summarization, and detection engineering can already bring significant benefits. These are areas where models can reduce friction without requiring organizations to fully hand over control.

Another area where current models are already demonstrating practical value is vulnerability discovery and remediation assistance. Models are showing that they can support large-scale code auditing, variant analysis, and bug discovery workflows. SOTA models can find zero days and complex vulnerabilities, but it is important not to forget that they can still produce false positives. This makes automated validation a useful companion if the goal is to automate more of the pipeline.
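As a minimal sketch of what automated validation could look like, the code below separates model-reported findings into confirmed and unconfirmed buckets using a pluggable validator. The `Finding` type and the validator are hypothetical: in practice the validator might run a reproduction test or a sandboxed proof of concept, which is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Finding:
    finding_id: str
    description: str


def split_by_validation(
    findings: Iterable[Finding],
    validator: Callable[[Finding], bool],
) -> Tuple[List[Finding], List[Finding]]:
    """Return (confirmed, unconfirmed) according to the validator."""
    confirmed: List[Finding] = []
    unconfirmed: List[Finding] = []
    for finding in findings:
        (confirmed if validator(finding) else unconfirmed).append(finding)
    return confirmed, unconfirmed


def confirmed_ratio(confirmed_count: int, total_count: int) -> float:
    """Share of reported findings that the validator could confirm."""
    return confirmed_count / total_count if total_count else 0.0


# Hypothetical data: a stub validator that "reproduces" only known IDs.
reported = [Finding("F-1", "SQL injection in search"),
            Finding("F-2", "Use after free in parser"),
            Finding("F-3", "Possible race condition")]
reproducible_ids = {"F-1", "F-2"}
confirmed, unconfirmed = split_by_validation(
    reported, lambda f: f.finding_id in reproducible_ids)
print(len(confirmed), len(unconfirmed),
      round(confirmed_ratio(len(confirmed), len(reported)), 2))
```

An unconfirmed finding is not necessarily a false positive; it only means the validator could not reproduce it, so it still belongs in a lower-priority review queue rather than the bin.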

This is also where the human bottleneck becomes most visible. When you have a model performing code vulnerability analysis and finding hundreds of vulnerabilities in a large codebase, even if you let it fix them, which for most companies is already too much trust, you still need quality testing and a human review step to check whether the model did everything correctly. To decide what a human needs to look at first, you need a clear prioritization process. This only grows in importance as automation increases the raw volume of inputs on both sides. The more findings everyone generates, the more central correct prioritization becomes.
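A prioritization process does not need to be sophisticated to be useful. The sketch below ranks findings with a simple multiplicative score. The fields, the 1-to-5 criticality scale, and the exposure multiplier are hypothetical choices of mine, shown only to make the idea concrete; any real scheme would have to be tuned to the organization.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScoredFinding:
    finding_id: str
    severity: float         # 0.0 to 10.0, e.g. a CVSS-like base score
    asset_criticality: int  # hypothetical scale: 1 (lab box) to 5 (crown jewels)
    confidence: float       # 0.0 to 1.0, how likely the finding is real
    internet_exposed: bool


def priority(finding: ScoredFinding, exposure_multiplier: float = 1.5) -> float:
    """Higher score means a human should look at it sooner."""
    score = finding.severity * finding.asset_criticality * finding.confidence
    if finding.internet_exposed:
        score *= exposure_multiplier
    return score


def review_queue(findings: Iterable[ScoredFinding]) -> List[ScoredFinding]:
    """Sort findings so the highest-priority item comes first."""
    return sorted(findings, key=priority, reverse=True)


# Hypothetical findings for illustration.
queue = review_queue([
    ScoredFinding("high-sev-lab", 9.0, 1, 0.5, False),
    ScoredFinding("medium-sev-payments", 6.0, 5, 0.9, True),
    ScoredFinding("high-sev-intranet", 7.0, 3, 0.8, False),
])
for item in queue:
    print(item.finding_id, round(priority(item), 1))
```

In this example the medium-severity issue on a critical, exposed asset outranks the high-severity issue on a lab machine, which is exactly the kind of ordering that raw severity alone would get wrong.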

It is also worth being precise about where this capability is heading, because the trajectory is not uniform. So far, we have seen near-exponential improvements in code vulnerability discovery, but it is reasonable to expect those gains to slow down: many of the high-volume vulnerabilities will be found first, while future model improvements will likely uncover smaller, more specific vulnerabilities. A clear example was the discovery of zero days and critical vulnerabilities in Firefox: while the number of vulnerabilities found and fixed increased significantly, Mozilla did not officially disclose how many false positives were produced for each real vulnerability.

The part that will likely keep improving is different, and arguably more consequential: the ability to connect multiple findings together, chaining several individually minor issues into a coherent and reliable full kill chain.

There is also a complementary point: the cheapest vulnerability to handle is the one that never ships. The same models that are useful for auditing existing code are becoming increasingly valuable at the point of creation: reviewing changes before they merge and flagging insecure patterns, unsafe defaults, or missing input validation while the code is still being written. Given the asymmetry described earlier, where defenders pay far more for failure than attackers do, reducing the number of vulnerabilities that ever reach an exploitable state is one of the few moves that meaningfully pushes cost back onto the attacker.
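To illustrate what checking changes before they merge can mean at its simplest, here is a sketch that scans only the added lines of a unified diff for a few insecure patterns. The rule names and regular expressions are hypothetical and deliberately naive; a model-based reviewer or a real static analyzer would go far beyond pattern matching.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately simple rules for illustration.
RULES = {
    "tls-verification-disabled": re.compile(r"verify\s*=\s*False"),
    "shell-injection-risk": re.compile(r"shell\s*=\s*True"),
    "dynamic-eval": re.compile(r"\beval\s*\("),
}


def scan_diff(diff_text: str) -> List[Tuple[int, str, str]]:
    """Return (diff_line_number, rule_id, added_code) for each match.

    Only lines added by the change are inspected. The line number is
    the position inside the diff text, not inside the target file.
    """
    findings: List[Tuple[int, str, str]] = []
    for line_no, line in enumerate(diff_text.splitlines(), start=1):
        # Skip context lines, removed lines, and the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for rule_id, pattern in RULES.items():
            if pattern.search(added):
                findings.append((line_no, rule_id, added.strip()))
    return findings


# Hypothetical diff used only to exercise the scanner.
sample_diff = "\n".join([
    "--- a/client.py",
    "+++ b/client.py",
    "@@ -1,2 +1,3 @@",
    " import requests",
    "-resp = requests.get(url)",
    "+resp = requests.get(url, verify=False)",
    "+result = eval(user_input)",
])
for finding in scan_diff(sample_diff):
    print(finding)
```

Restricting the scan to added lines keeps the feedback focused on what the author is changing right now, which is when fixing it is cheapest.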

All of this assumes something the marketing rarely mentions: an AI system is only as good as the data underneath it. A model triaging your alerts cannot prioritize if you do not have a clear asset criticality classification, and it cannot reason about assets you do not know you h

[truncated for AI cost control]