AI News HubLIVE
站内改写6 min read

Can AI Do Intelligence Analysis? Apparently Not

The author attempted to build an intelligence analysis team composed of AI agents to automate Structured Analytical Techniques (SAT), but found it ultimately unworkable. Testing on a real question about Chinese cyber operations against Taiwan revealed issues such as redundant sub-questions, improper framing of collection requirements, and ineffective analytical loops. The conclusion is that current AI cannot replace human judgment in rigorous intelligence analysis.

SourceHacker News AIAuthor: beatrobot

Robin Dimyanoglu

Jun 03, 2026

Structured Analytical Techniques (SAT) are a set of methods used in intelligence analysis to minimize cognitive shortcuts and biases. When working on cases like attribution analysis, geopolitical CTI or forecasting, applying these is a must for accurate analysis.

But as is also well known, applying these techniques requires people who are trained in SAT and who work through these methods as a group. Meaning analysis done this way takes a very long time and is costly. In the private sector, there is rarely enough time to devote to this kind of structured analysis.

To solve this problem, I wondered: could I build an analyst team made up of AI agents? If it worked, I would have an automation that could apply these time consuming, costly structured techniques on its own and produce analysis rather quickly. This post is about the path I took, the things I tried and my thoughts on why it didn’t work.

The architecture

The idea was a pipeline of specialized agents, each owning a distinct role in the intelligence cycle. From intake all the way to finished reporting. Here's what I designed:

01 - Stakeholder interviewer

The entry point. This agent chats with the customer to capture the intelligence requirement in an unstructured, conversational format. It asks follow-up questions to understand not just what the customer wants to know, but why they need it and what decisions the intelligence will inform. From that conversation, it distills a primary intelligence question and breaks it down into a set of focused sub-questions that can actually drive collection.

02 - Collection planner

Has access to a curated database of trusted information sources, each carrying an assigned trust score that reflects its reliability and accuracy. This agent takes the sub-questions from the interviewer, figures out what kind of data is needed to answer each one and maps them to the best-fit sources available.

It deliberately avoids over-relying on any single source. Diversity of sourcing is a hard requirement. The trust scores it assigns here follow the information all the way through to the final analysis.

Additionally, the collection planner does not pass the intelligence question itself to the collector. It only specifies what to look for and where. The collector must not know what conclusion the analysis is working toward to avoid confirmation bias.

03 - Intelligence collector

Executes the collection plan by looking up the specified sources and retrieving anything potentially relevant. Critically, this agent does not interpret or summarize what it finds. It surfaces information verbatim, exactly as it appears in the source. Its job is to bring raw material to the table, not to answer the question. When the plan's sources come up empty, it can fall back to open-source internet search, but must explicitly flag those results as coming from an unverified source.

04 - Analysis coordinator

The orchestrator of the analytical layer. This agent looks at the intelligence question and the collected material, then decides which analytical techniques are most appropriate for the problem at hand. Each technique is its own independent agent. Think of them as individual analysts with different methodological specializations. The coordinator selects which ones to engage and manages the flow of information between them.

05 - Analyst agents

A multi-agent layer that runs the actual structured analysis. Rather than a single "analyze this" prompt, these agents work through a deliberate reasoning loop. Scenario generation, refutation, evidence weighting, assumption checking. This process mirrors how a real SAT session would operate. They also communicate with each other and can loop back to collection when they identify gaps. More on this below.

06 - Reporter

Takes the analytical output and translates it into something a real stakeholder can act on. This agent looks at the use-case established in the intake stage, builds a mental model of the audience, and structures the report accordingly. The governing principle is BLUF (Bottom Line Up Front). The key assessment and its implications should be readable in under 30 seconds, without having to wade through caveats and methodology. Alternative scenarios are included, but the report takes a clear position and explains why.

The analysis layer

The analyst layer deserves a closer look. I wanted the agents to run through a structured reasoning loop, one that encodes the actual analytical steps that make SAT valuable in the first place:

Scenario generation

Start by generating a broad list of possible scenarios using a creative SAT. Brainstorming without pre-filtering. The goal here is to avoid anchoring on the obvious explanation too early.

First refutation pass

Go through the scenario list and eliminate any that the evidence already in hand directly contradicts. This is an active pruning step.

Collection gap identification

For each scenario that survived, articulate what evidence or conditions would refute it. Then send that list back to the collection planner as new intelligence requirements. This circular loop intends to make the architecture more reliable.

Second refutation pass

When the new collection comes back, revisit the remaining scenarios. Does any of the new information rule something out? If so, eliminate it. Repeat until no further elimination is possible.

ACH pass

For surviving scenarios, flip the question: what in the evidence supports each one, and how strongly? This is Analysis of Competing Hypotheses; systematically evaluating how well the available information fits each remaining explanation.

Evidence weighting

Rank the remaining scenarios by the combined quality of the evidence supporting each. This is where the trust scores assigned during collection finally come into play.

Key assumptions check

Identify the baseline assumptions underpinning each scenario. If any of those assumptions turned out to be wrong, would the scenario collapse? If multiple scenarios share the same critical assumption, that assumption itself becomes a collection target. Loop back to the planner, collect new information and try to verify or refute it.

Scenario selection

Converge on 2–3 non-overlapping, mutually alternative scenarios that have survived both refutation rounds and hold up under the assumptions check. These become the candidates for the final assessment.

Devil’s advocate

Deliberately stress-test each of the remaining scenarios. What are its weakest points? Then apply alternative thinking: offer a plausible alternative explanation for each one that the evidence could also support. This is the check against overconfidence.

Final assessment

Choose the most likely scenario based on evidence quality and resilience to assumption failure. Rank the others below it. Document supporting evidence, key weaknesses, and plausible alternatives for all of them.

This architecture attempts to operationalize the structured analysis process. The design was also made modular to make adding new SATs easy in the future. But, how did the user experience look like?

User experience

I tested it with the following question: “if China moves on Taiwan, what cyber operations should we expect, directed at whom, and with what objectives?”

The intake experience was exactly what I'd hoped for. Rather than dumping a requirement into a form, the tool opened a conversation. Two short exchanges and it had a focused requirement.

Deriving the intelligence questions

Looking at the session logs told a different story. I checked out the primary question and the sub-questions it had generated and the problems were visible immediately.

RESPONSE: { "primary_question": "What are China's likely cyber operations, targets, and objectives in the event of a military contingency involving Taiwan?", "sub_questions": [ "What are China's primary strategic objectives for cyber operations during a military contingency involving Taiwan?", "What critical infrastructure sectors, government entities, and private organizations are China's most likely cyber targets during such a contingency?", "What specific types of cyber operations (e.g., espionage, disruption, destruction, influence) and attack vectors is China likely to employ?", "What are the anticipated operational and strategic effects of these cyber operations on targeted entities?" ], "stakeholder_use_case": "This intelligence will be used to prioritize detection gaps and inform the development of incident response playbooks.", "stakeholder_persona": "Senior Cyber Security Strategist/CISO. Highly technical, focused on proactive defense, risk mitigation, and incident response preparedness.", "urgency": "routine" } [11:38:53] ◄ STAKEHOLDER INTERVIEW — done

First, these questions are redundant and heavily overlapping. Good sub-questions should be mutually exclusive and collectively exhaustive: answer each one independently, and together they should give you everything needed to answer the primary. These don’t do that. They all require the same information to answer.

Second, all the questions are anticipatory. “what is China likely to do,” “what are the anticipated effects.” And that is the wrong question type for collection. You cannot answer anticipatory questions by collection. You might get a rough answer through analysis, by reasoning from known doctrine, history, and capabilities. But collection gathers facts about what exists, what has happened, what has been observed. Sending an anticipatory question to a collection planner is like asking a researcher to go find evidence for something that hasn’t happened yet. The framing is wrong at the source.

Collection planning

The collection planner's output was next. I'm not going to walk through every single one. Instead, you can go through them and try to spot the mistakes yourself. But let me point out the first few that immediately caught my eye.

RESPONSE: [ { "task_id": "T1.1", "retrieve": "Retrieve analyses of China's stated and observed strategic interests related to cyber operations in the context of regional security scenarios involving Taiwan.", "source_id": "src_rand", "source_name": "RAND Corporation", "output_format": "verbatim excerpts with dates and any confidence indicators present in the source" }, { "task_id": "T1.2", "retrieve": "Collect reports and official statements outlining Chinese policy and strategic perspectives on cyber operations in the Asia-Pacific region.", "source_id": "src_xinhua", "source_name": "Xinhua News Agency", "output_format": "verbatim excerpts with dates and any confidence indicators present in the source" }, { "task_id": "T1.3", "retrieve": "Gather strategic analyses from a UK perspective on nation-state cyber capabilities and their potential application in geopolitical conflicts, particularly in the Asia-Pacific.", "source_id": "src_rusi", "source_name": "RUSI (Royal United Services Institute)", "output_format": "verbatim excerpts with dates and any confidence indicators present in the source" }, { "task_id": "T2.1", "retrieve": "Collect advisories and reports detailing observed or assessed targeting patterns by state-sponsored actors, specifically identifying critical infrastructure sectors and government entities.", "source_id": "src_cisa", "source_name": "CISA Advisories", "output_format": "verbatim excerpts with dates and any confidence indicators present in the source" }, { "task_id": "T2.2", "retrieve": "Retrieve threat intelligence reports on APT activities, focusing on identified target sectors including critical infrastructure, government, and finance, without specific attribution to China.", "source_id": "src_mandiant", "source_name": "Mandiant Threat Intelligence", "output_format": "verbatim excerpts with dates and any confidence indicators present in the source" }, { "task_id": "T2.3", "retrieve": "Gather intelligence reports on APT and eCrime targeting patterns,

[truncated for AI cost control]