Microsoft Research Blog AI News Source

Public articles: 21 | Collected articles: 24 | Trust: 90 | Refresh: 30 min

Health: Healthy | Source type: Research | Full-text rights: Official full text | Last ingested: 2026-06-25 | ID: microsoft-research | Status: Enabled

Official research source; confirm reuse terms before enabling full body display.

Latest public articles

Understanding the brain with AI-driven explanations and experiments

2026-06-25 16:00 UTC

Researchers introduce generative causal testing, which translates black box models into clear hypotheses and verifies them in the scanner, revealing what specific brain regions respond to in language.

GCT distills brain-prediction models into short verbal explanations.
It validates explanations by generating new stories that causally activate targeted brain regions in fMRI.

Ire identifies another LOTUSLITE specimen

2026-06-12 20:30 UTC

Project Ire, Microsoft's autonomous malware-classification agent, reverse-engineered a LOTUSLITE variant that went undetected by most major EDR tools. Through behavioral analysis rather than signature matching, Ire identified the sample's malicious intent and produced a detailed function-level report consistent with Acronis's published analysis.

Ire analyzed a LOTUSLITE variant sharing TTPs but not known IOCs.
Sample hash 47e51e... was initially flagged by only a few vendors.

Data Formulator 0.7: AI-powered data analytics for enterprise data

2026-05-28 16:00 UTC

Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.

Open-source AI system for enterprise data analytics.
Data Connectors support governed, reusable connections across diverse data sources.

Extending Human Intelligence Through AI

2026-05-27 16:00 UTC

Modern AI systems are powerful not because they replicate human intelligence, but because they extend structures already present in human cognition and language. This perspective explains AI's capabilities and limitations, and reframes AI safety as a system-level challenge requiring engineering and governance, not fear of rogue AI.

AI systems extend human intelligence by modeling sedimented structures of understanding in language, not by replicating human minds.
Hallucinations and the compositionality gap arise from AI's lack of lived engagement with the world that anchors meaning and truth.

MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

2026-05-21 17:00 UTC

Microsoft Research releases MagenticLite, an agentic application designed for small models, along with MagenticBrain orchestrator and Fara1.5 computer-use model. The system works across browser and local file system, achieving state-of-the-art results on web navigation tasks while keeping data on-device.

MagenticLite is a next-gen agentic app that operates across browser and local file system, optimized for small models.
Powered by MagenticBrain (14B orchestrator) and Fara1.5 (4B-27B computer-use model) working together seamlessly.

Vega: Zero-knowledge proofs for digital identity in the age of AI

2026-05-21 13:48 UTC

Vega is a new zero-knowledge proof system from Microsoft Research that enables users to prove facts from government-issued credentials without revealing the credential itself. It achieves under 100ms proving time on commodity devices using folding schemes, and is designed for real-world digital identity formats like mobile driver's licenses and the EU Digital Identity Wallet.

Vega turns full credentials into a single zero-knowledge proof, sharing only what's needed.
Zero-knowledge proofs generated in under 100ms on commodity devices with no trusted setup.

Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

2026-05-15 18:06 UTC

Microsoft Research clarifies the scope of its paper on AI delegation, noting that while models show fidelity degradation in long-horizon tasks, production systems mitigate these effects, and the benchmark is a diagnostic tool for future improvement.

The DELEGATE-52 benchmark evaluates semantic fidelity loss in long-horizon delegated workflows.
State-of-the-art models show 19-34% degradation over 20 iterations, but Python workflows degrade less than 1%.
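
To put the 19-34% figure in per-iteration terms, here is a minimal sketch. It assumes the loss compounds geometrically, with the same fraction of fidelity retained at every iteration; the summary above does not say how degradation accumulates, so this is an illustrative assumption, and the function name `per_step_retention` is hypothetical.

```python
def per_step_retention(total_degradation: float, iterations: int) -> float:
    """Return r such that r ** iterations == 1 - total_degradation.

    Assumes uniform geometric compounding, which is an illustrative
    assumption and not something stated by the benchmark authors.
    """
    if not 0.0 <= total_degradation < 1.0:
        raise ValueError("total_degradation must be in [0, 1)")
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return (1.0 - total_degradation) ** (1.0 / iterations)


if __name__ == "__main__":
    # 19% and 34% are the reported bounds over 20 iterations.
    for total in (0.19, 0.34):
        r = per_step_retention(total, 20)
        print(f"{total:.0%} over 20 iterations: "
              f"about {1 - r:.2%} average loss per iteration")
```

Under that assumption the reported range works out to roughly 1-2% of fidelity lost per iteration, which shows how small per-step errors add up over a long delegated workflow.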

mimalloc: A new, high-performance, scalable memory allocator for the modern era

2026-05-13 17:19 UTC

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projects. It provides bounded worst-case allocation times (up to OS primitives), bounded space overhead, low internal fragmentation, and minimal contention by relying almost exclusively on atomic operations.

Developed by Microsoft Research's RiSE group, initially for Lean and Koka languages.
Uses thread-local heaps (theaps) and per-thread pages for lock-free fast paths; cross-thread freeing uses atomic operations.

GridSFM: A new, small foundation model for the electric grid

2026-05-13 16:00 UTC

Microsoft releases a lightweight foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings in grid analysis.

GridSFM predicts AC optimal power flow in milliseconds, targeting up to $20B/year in congestion losses and 3.4 TWh of renewable curtailment.
Provides full AC system states for direct visibility into congestion, stability, and system health.

SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

2026-05-11 17:19 UTC

Microsoft Research introduces SocialReasoning-Bench, a benchmark evaluating AI agents' social reasoning in principal-agent settings. Tests show frontier models complete tasks but often fail to secure optimal outcomes for users, even with explicit instructions. The benchmark measures outcome optimality and due diligence to assess agents' ability to act in users' best interests.

SocialReasoning-Bench tests AI agents in calendar coordination and marketplace negotiation scenarios.
Current models achieve near-perfect task completion but poor outcome optimality, often accepting suboptimal deals.

Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

2026-05-08 19:53 UTC

Microsoft Research releases an open dataset of U.S. power grid transmission topology derived from public data, enabling AC optimal power flow analysis and addressing research challenges due to restricted grid data. The pipeline uses OpenStreetMap and public energy data to create geographically grounded models that are solvable for power flow analysis, demonstrated across 48 states and the Eastern Interconnection. The dataset supports studies of congestion, transmission expansion, and demand siting.

Constructs realistic power grid models from open data for 48 U.S. states and multi-state interconnections.
Models enable AC optimal power flow analysis for congestion, capacity, and demand siting studies.

Microsoft at NSDI 2026: Advances in large-scale networked systems

2026-05-05 16:00 UTC

Microsoft researchers share advances in building and operating large-scale distributed systems, spanning datacenters, networking, and the growing intersection with AI during NSDI '26.

Microsoft is a returning sponsor of NSDI '26 and has 11 papers accepted.
Research covers KV cache sharing, SmartNIC migration, network testing, and more.

Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale

2026-04-30 21:53 UTC

Microsoft Research red-teamed a live platform of over 100 AI agents, identifying network-level risks that only appear through agent interactions, including self-propagating worms, reputation manipulation, manufactured consensus, and proxy chains. These risks cannot be reproduced by testing agents in isolation. The study also observed emergent security behaviors in a small fraction of agents, reducing attack success. Findings suggest the need for layered defenses across platform, agent, and model layers.

Network-level risks emerge from agent interactions, not isolated testing.
Four attack patterns identified: worm propagation, reputation manipulation, Sybil verification capture, and proxy chains.

AutoAdapt: Automated domain adaptation for large language models

2026-04-22 16:25 UTC

AutoAdapt automates domain adaptation for LLMs in high-stakes settings, turning weeks of manual iteration into repeatable pipelines. It uses a configuration graph, agentic planner, and budget-aware optimization to select appropriate strategies (e.g., RAG, fine-tuning) and tune hyperparameters under real constraints.

Automates domain adaptation for LLMs in law, medicine, cloud incident response, etc.
Combines strategies like RAG and fine-tuning with budget-aware hyperparameter optimization.

New Future of Work: AI is driving rapid change, uneven benefits

2026-04-09 16:11 UTC

The 2025 New Future of Work report from Microsoft Research finds that generative AI is rapidly transforming work, but its benefits are unevenly distributed. AI is changing how people collaborate, and human expertise is becoming more important. Organizations that treat AI as a partner see the biggest gains. The report calls for inclusive AI adoption to prevent widening divides.

Generative AI is reshaping work from task automation to active collaboration, altering how people create, decide, and learn.
AI adoption is fastest in low- and middle-income countries, yet usage gaps persist across demographics, risking unequal productivity gains.

Ideas: Steering AI toward the work future we want

2026-04-09 16:10 UTC

Microsoft Chief Scientist Jaime Teevan and researchers Jenna Butler, Jake Hofman, and Rebecca Janssen unpack the New Future of Work Report 2025 and explore the ideal AI-driven working world. Plus, is AI a tool or a collaborator? And why the answer matters.

AI adoption is rising but varies by industry, gender, and purpose.
AI impacts tasks more than entire jobs; overreliance and cognitive load are concerns.

ADeLe: A New Method to Predict and Explain AI Performance Across Tasks

2026-04-01 16:00 UTC

ADeLe, developed by Microsoft Research in collaboration with Princeton and Universitat Politècnica de València, scores AI models and tasks across 18 core abilities, enabling prediction of performance on unseen tasks with ~88% accuracy. It reveals model strengths and weaknesses, providing explainable AI evaluation beyond traditional benchmarks.

ADeLe evaluates models and tasks on 18 core abilities like reasoning and domain knowledge.
It predicts performance on new tasks with approximately 88% accuracy for models like GPT-4o.

AsgardBench: A benchmark for visually grounded interactive planning

2026-03-26 19:02 UTC

AsgardBench is a new benchmark that tests whether embodied AI agents can adjust their plans based on visual feedback. Built on AI2-THOR, it places agents in kitchen-like scenarios and requires them to dynamically modify action sequences by observing object states (e.g., whether a mug is clean). Tests show that visual input significantly boosts success rates, but current models still struggle with fine-grained visual distinctions, progress tracking, and timely plan updates.

AsgardBench focuses on evaluating embodied AI agents' ability to revise plans using visual feedback.
The benchmark consists of 108 controlled task instances across 12 task types.

GroundedPlanBench: Spatially grounded long-horizon task planning for robot manipulation

2026-03-26 16:03 UTC

Microsoft Research introduces GroundedPlanBench, a benchmark to evaluate vision-language models on joint action planning and spatial grounding for robot tasks. Their V2GP framework converts robot demonstration videos into training data, showing that grounded planning outperforms decoupled approaches.

GroundedPlanBench evaluates VLMs on planning actions and determining locations in complex robot scenarios.
V2GP framework generates spatially grounded training data from robot videos, enabling joint learning of planning and grounding.

Will machines ever be intelligent?

2026-03-23 15:00 UTC

Are machines truly intelligent? AI researchers Subutai Ahmad and Nicolò Fusi join Doug Burger to compare transformer-based AI with the human brain, exploring continual learning, efficiency, and whether today’s models are on a path toward human intelligence.

Transformers use attention and feedforward layers but apply constant computation regardless of input complexity.
The brain comprises about 100,000 cortical columns, each building independent world models in parallel and asynchronously.

Systematic debugging for AI agents: Introducing the AgentRx framework

2026-03-12 16:38 UTC

Microsoft Research open-sources AgentRx, a framework for automated diagnosis of AI agent failures. It pinpoints the first critical failure step using constraint synthesis and guarded evaluation, improving localization by 23.6% over baselines. The accompanying benchmark includes 115 annotated failed trajectories across three domains.

AgentRx is an open-source framework for debugging AI agent failures by identifying the first unrecoverable step.
It uses constraint synthesis and step-by-step guarded evaluation to produce auditable violation logs.
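
The general idea of step-by-step guarded evaluation can be sketched as follows. This is not AgentRx's actual API: the `Constraint` class, the `first_violation` function, and the trajectory format are all hypothetical. The constraints are written by hand rather than synthesized, and the sketch treats a "guard" as a predicate that decides whether a constraint applies to a given step, which is an interpretation of the summary above. It also uses the earliest violation as a simple stand-in for the first critical failure step.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Step = dict  # hypothetical step record: {"tool": ..., "args": ..., "ok": ...}


@dataclass
class Constraint:
    name: str
    guard: Callable[[Step], bool]  # does this constraint apply to the step?
    check: Callable[[Step], bool]  # does the step satisfy the constraint?


def first_violation(
    trajectory: list[Step], constraints: list[Constraint]
) -> tuple[Optional[tuple[int, str]], list[tuple[int, str]]]:
    """Evaluate each constraint on each step where its guard holds.

    Returns the earliest (step_index, constraint_name) violation, or None,
    together with the full violation log for auditing.
    """
    log: list[tuple[int, str]] = []
    for index, step in enumerate(trajectory):
        for constraint in constraints:
            if constraint.guard(step) and not constraint.check(step):
                log.append((index, constraint.name))
    return (log[0] if log else None), log


# Illustrative data only; not taken from the AgentRx benchmark.
trajectory = [
    {"tool": "search", "args": {"query": "refund policy"}, "ok": True},
    {"tool": "refund", "args": {"amount": -5}, "ok": True},
    {"tool": "notify", "args": {}, "ok": False},
]
constraints = [
    Constraint(
        name="refund amount must be positive",
        guard=lambda s: s["tool"] == "refund",
        check=lambda s: s["args"].get("amount", 0) > 0,
    ),
    Constraint(
        name="tool call must succeed",
        guard=lambda s: True,
        check=lambda s: s["ok"],
    ),
]

first, log = first_violation(trajectory, constraints)
print("first violation:", first)
print("violation log:", log)
```

Here the failed `notify` call at step 2 is the visible symptom, but the log points back to step 1, where the refund amount was already invalid. That is the kind of localization the summary describes.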
