AI News Hub

Today's must-reads

Models

Microsoft Build 2026: the 7 biggest announcements

At Microsoft Build 2026, the company announced new hardware, AI models, and developer tools including the Surface RTX Spark Dev Box, Scout assistant, MAI-Thinking-1 reasoning model, and quantum computing advances.

  • Surface RTX Spark Dev Box: a mini PC for local AI development with Nvidia Arm chip and 128GB memory.
  • Scout: an always-on assistant built on OpenClaw, automating calendar, expense, and email tasks.

Trump signs executive order to review AI models before they’re released

President Donald Trump signed an executive order Tuesday creating a "voluntary framework" for AI companies to share their frontier models with the federal government before they're released "to promote secure innovation and strengthen the cybersecurity of critical infrastructure."

  • Trump signs executive order establishing voluntary pre-release review of AI models by federal agencies.
  • Companies can choose to share models up to 30 days before public release, with confidentiality protections.

Microsoft’s first advanced reasoning AI is here

Microsoft announced MAI-Thinking-1, a new flagship reasoning model trained on clean data from scratch, along with several other models for image generation, transcription, voice, and coding. The move signals Microsoft's growing independence from OpenAI.

  • Microsoft unveils MAI-Thinking-1, its first advanced reasoning AI model.
  • The model was trained from scratch on clean data, not distilled from third parties.
Agents

NVIDIA Partners With Microsoft on Unified Stack for Agentic AI Deployment, From Windows Devices to Cloud to Local

At Microsoft Build, NVIDIA and Microsoft announced an expanded partnership to deliver agentic AI across Windows devices, Azure cloud, and local deployments, featuring new hardware, software, and services.

  • NVIDIA and Microsoft unveil RTX Spark and DGX Station for Windows for native agentic AI on PCs.
  • NVIDIA Nemotron 3 Ultra and other open models are now available on Microsoft Foundry.

Build 2026: Microsoft's MDASH exits preview with 100+ specialized threat-hunting AI agents

Microsoft's Build 2026 security news centers on an agentic AI vulnerability system designed to find real exploitable flaws, connect them to Defender and GitHub, and help developers fix them faster.

  • MDASH uses an ensemble of over 100 specialized AI agents to triage vulnerabilities, prioritizing real risks over noise.
  • It achieved a 96.55% score on the CyberGym benchmark, up from 88.45% last month.

What happens when AI starts selling to AI?

AI is already writing sales emails, updating CRM systems, generating proposals, and responding to RFPs. The next phase could be even more disruptive: AI agents negotiating with other AI agents before a human ever joins. This episode explores how AI transforms enterprise sales, procurement, and the enduring importance of human judgment and relationships.

  • AI automates administrative tasks like CRM updates and RFP responses, freeing reps for relationship building.
  • Agentic AI may handle cold outreach and due diligence, but human trust remains critical in enterprise sales.

Microsoft's first reasoning model is one of 7 AIs just released at Build - what we know so far

Microsoft unveiled seven new AI models at its Build conference, including its first reasoning model MAI-Thinking-1, a new code model, and updated image, voice, and transcription models. The models emphasize enterprise-grade data, cost efficiency, and watermarking. Microsoft also announced a partnership with Mayo Clinic for healthcare AI.

  • Microsoft released seven new AI models at Build, including its first reasoning model MAI-Thinking-1.
  • MAI-Thinking-1 is a 35-billion-parameter reasoning model trained on enterprise-grade data.

4 Nvidia RTX Spark laptops I'm most excited to try - including Microsoft's new Ultra

Nvidia announced the RTX Spark CPU at Computex 2026, targeting laptops from major brands like Microsoft, Dell, Asus, and MSI. The Arm-based chip boasts up to 1 petaflop of AI performance and 128GB of unified memory, with the first models arriving this fall at prices above $2,000.

  • Nvidia unveils RTX Spark CPU for laptops, competing with Intel, AMD, and Qualcomm.
  • Arm-based chip offers up to 1 petaflop AI performance and 128GB unified memory.
Chips

“A successful attack could be catastrophic”: Anthropic gives more groups access to Claude Mythos

Anthropic expands Project Glasswing to approximately 150 new partners, providing early access to Claude Mythos Preview for vulnerability scanning. The AI model has found thousands of high-severity flaws, but critics raise concerns about transparency and independent validation.

  • Anthropic widens Project Glasswing to ~150 new organizations across utilities, healthcare, and hardware sectors.
  • Claude Mythos model has discovered over 10,000 high- or critical-severity vulnerabilities in major codebases.
Startups

OpenAI expands Codex with role-specific plugins to build a general-purpose app for non-developers

OpenAI is expanding Codex with role-specific plugins for data analysis, sales, and investment banking. Five million people use the tool each week, and one in five isn't a developer, the company says. That non-developer group is growing three times faster than the developer base, a sign that OpenAI is positioning Codex as an all-purpose work app.

  • OpenAI introduces role-specific plugins for Codex targeting data analysis, sales, and investment banking.
  • Codex has 5 million weekly active users, 20% of whom are non-developers.
Other updates (20)
Agents

Anthropic IPO filing marks AI maturing into enterprise utility

Anthropic's IPO signals generative AI's shift from research-focused venture to stable enterprise utility, with implications for pricing, licensing, and market consolidation.

  • Anthropic's IPO aligns engineering goals with enterprise procurement, introducing structured release schedules and pricing.
  • Enterprise users can plan around formalized pricing tiers and API limits, but may face tighter licensing and model deprecation.

Microsoft Scout is a new AI personal assistant built on OpenClaw

Microsoft launches Scout, an always-on AI assistant integrated with Microsoft 365 that automates tasks such as scheduling and expense reporting. It monitors traffic and your calendar, learns from Teams and email, and is built on OpenClaw. A desktop preview is available now for US Frontier customers.

  • Microsoft Scout is a new AI personal assistant built on OpenClaw, integrated with Microsoft 365.
  • It can monitor traffic and your calendar, and learn from Teams and email to suggest actions.

TinyFish Launches BigSet: An Open-Source Multi-Agent System That Builds Structured Live Datasets from Plain-English Descriptions

TinyFish has released BigSet, an open-source multi-agent system that turns plain-English descriptions into structured, exportable datasets. The system infers a schema, dispatches research agents to the live web, deduplicates results, and provides CSV/XLSX downloads with scheduled refresh. Users can describe data in one sentence and get a table in minutes.

  • BigSet accepts natural language descriptions and autonomously builds structured datasets via web research.
  • Multi-agent architecture: schema inference (Claude Sonnet), orchestrator (Qwen), and parallel sub-agents with tool budgets.

How GitHub plans to win developers back

GitHub faces unprecedented growth from AI code generation, leading to outages. The company is scaling infrastructure, moving to Azure, and rebuilding core systems to restore reliability.

  • GitHub experienced hundreds of outages in the past year due to unexpected growth from AI tools.
  • The company is scaling to handle 30x current traffic, moving to Azure, and rebuilding core systems.

Microsoft really, really, really wants developers to love Windows again

At its Build developer conference, Microsoft announced a slew of new features aimed at developers, including a developer-optimized Windows 11 experience with dark mode on by default, pre-configured tools, native Unix utilities in PowerShell, WSL containers, an Intelligent Terminal with an agent pane, and policy-driven execution containers for running AI agents. The company is also expanding Windows AI APIs to CPUs and GPUs and introducing two on-device AI models. These moves are designed to lure developers away from Mac and Linux by reducing distractions and providing a familiar environment.

  • Microsoft launches a developer-optimized Windows 11 experience with dark mode, fewer distractions, and pre-configured tools like VS Code and GitHub Copilot.
  • New features include native Unix utilities in PowerShell, WSL containers, an Intelligent Terminal with integrated AI agents, and policy-driven execution containers (MXC) for agents.

With Intelligent Terminal, Microsoft is reinventing the Windows terminal

Microsoft unveils Intelligent Terminal, an experimental feature that brings AI agents directly into the Windows 11 shell. It supports GitHub Copilot, Claude Code, and other ACP-compatible agents, detects errors, and suggests fixes with a single click, streamlining developer workflows.

  • Microsoft introduces Intelligent Terminal, integrating AI agents into Windows 11 terminal.
  • Supports GitHub Copilot, Claude Code, Codex, and other ACP-compatible agents.

Introducing Rubrics: Build Agents that Evaluate and Correct Their Work

Deep Agents' RubricMiddleware adds a self-evaluation loop to your agent runs. Set a rubric, configure a grader, and get reliable outputs on tasks where correctness matters.

  • Agents often produce outputs that need multiple attempts to get right.
  • RubricMiddleware lets agents self-evaluate and correct based on a rubric.
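The self-evaluation loop described above can be pictured with a small generic sketch. This is not the Deep Agents RubricMiddleware API: `run_with_rubric`, `toy_generate`, and `toy_grade` are hypothetical names, and the "model" and "grader" are plain functions so the example runs on its own. It only illustrates the idea of grading an output against a rubric and retrying with the grader's feedback.

```python
from typing import Callable, List, Tuple


def run_with_rubric(
    generate: Callable[[str, List[str]], str],
    grade: Callable[[str], Tuple[bool, List[str]]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Regenerate until the grader passes the output or attempts run out."""
    feedback: List[str] = []
    output = ""
    for attempt in range(1, max_attempts + 1):
        # The generator sees the failures reported on the previous attempt.
        output = generate(task, feedback)
        passed, feedback = grade(output)
        if passed:
            return output, attempt
    # Give back the last draft even though it never passed.
    return output, max_attempts


# Hypothetical stand-ins for a model call and a rubric-based grader.
def toy_generate(task: str, feedback: List[str]) -> str:
    if "missing docstring" in feedback:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b): return a + b"


def toy_grade(output: str) -> Tuple[bool, List[str]]:
    failures = []
    if '"""' not in output:  # single rubric item: code must have a docstring
        failures.append("missing docstring")
    return (not failures, failures)


result, attempts = run_with_rubric(toy_generate, toy_grade, "write add()")
```

In a real setup the grader would typically be another model call or a test suite, and the attempt cap keeps a failing task from looping indefinitely.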

Microsoft’s Project Solara is an OS for AI agent gadgets

Microsoft announced Project Solara, a new OS for AI agent gadgets, at Build 2026. It runs on Android, not Windows. Two concept devices (a desk display and a badge) were shown. Microsoft will not ship them but is offering them as reference designs. Companies like AccuWeather, Best Buy, CVS Healthcare, and Target plan pilots.

  • Microsoft unveils Project Solara, an Android-based OS for AI agent devices at Build 2026.
  • Two concept devices: a desk display and a wearable badge with camera and fingerprint scanner.

AI Vulnerability Intelligence Agent Converts CVEs to Actionable Security Reports

The CVE AI Agent is an autonomous vulnerability intelligence engine that continuously ingests, enriches, and triages CVE data, delivering findings to platforms like n8n, Jira, Slack, Splunk, or local file exports. It features a token-efficient architecture using deterministic minimization logic to filter noise, with prompts averaging 1,000 tokens. The agent follows a strict Two-Pass architecture: Pass 1 extracts all measurable data deterministically, and Pass 2 uses an LLM to fill qualitative sections. It supports multiple LLM providers, including Gemini, OpenAI, Claude, Groq, and Ollama, and offers a web dashboard.

  • CVE AI Agent is an autonomous vulnerability intelligence pipeline designed for SOC-grade, auditable security.
  • Uses a Two-Pass architecture: deterministic engine for data extraction, LLM only for qualitative enrichment, reducing hallucinations.
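A minimal sketch of the two-pass idea, not the CVE AI Agent's actual code: Pass 1 derives every measurable field with plain deterministic logic, and Pass 2 hands only the qualitative summary to a model. The function names, the record layout, the placeholder ID, and the stubbed `fake_llm` are all assumptions made for illustration. The severity bands follow the CVSS v3.x qualitative rating scale.

```python
from typing import Callable, Dict


def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative rating."""
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    if score > 0.0:
        return "LOW"
    return "NONE"


def pass_one(record: Dict) -> Dict:
    """Pass 1: deterministic extraction of measurable fields, no model involved."""
    score = float(record.get("cvss", 0.0))
    return {"id": record["id"], "cvss": score, "severity": severity_from_cvss(score)}


def pass_two(facts: Dict, description: str, llm: Callable[[str], str]) -> Dict:
    """Pass 2: the model writes only the qualitative summary; facts stay untouched."""
    prompt = (
        f"Summarize the impact of {facts['id']} "
        f"(severity {facts['severity']}): {description}"
    )
    report = dict(facts)
    report["summary"] = llm(prompt)
    return report


# Hypothetical record and a stub in place of a real LLM provider.
record = {"id": "CVE-0000-0000", "cvss": "9.8", "description": "Example flaw."}
fake_llm = lambda prompt: "Stub summary for: " + prompt[:40]

report = pass_two(pass_one(record), record["description"], fake_llm)
```

Because the score and severity never pass through the model, a hallucinated summary cannot alter the fields that triage rules depend on.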

Work IQ is Microsoft's big bet on agent-first enterprise IT, and I have questions

Microsoft's Work IQ could make enterprise AI agents dramatically smarter, but the shift to agent-first IT brings serious questions about cost, governance, data exposure, and operational risk.

  • Microsoft Work IQ redesigns enterprise software for agent-first operations, enabling dynamic data discovery.
  • Agents use getSchema to understand data structure at runtime without predefined models.

How to Evaluate Models for Production Coding Agents

This guide explores the gap between LLM coding benchmarks and real-world production performance. It categorizes popular benchmarks (HumanEval, SWE-bench, Aider Polyglot, etc.) and explains what each actually measures. The article presents a five-step evaluation framework: define quality criteria, select matching benchmarks, run internal evaluations, use weighted scoring, and establish ongoing evaluation. It warns against common pitfalls like over-relying on a single benchmark, ignoring execution-based evaluation, and neglecting infrastructure overhead. The key takeaway: internal evaluation sets built from your actual codebase are the most reliable predictor of production success.

  • Benchmark scores often misalign with production performance; interpret critically
  • Different benchmarks test different skills; no single benchmark is sufficient
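The weighted-scoring step of the framework can be illustrated with a short sketch. The criteria names, weights, and scores below are made-up placeholders rather than figures from the guide. The point is only that each criterion is normalized to a 0-1 scale and combined according to how much it matters for your workload.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion scores (0-1 scale) into one weighted average."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# Hypothetical weights: the internal evaluation set counts the most.
weights = {"internal_eval": 0.5, "public_benchmark": 0.3, "latency": 0.2}

# Hypothetical, already-normalized scores for two candidate models.
model_a = {"internal_eval": 0.8, "public_benchmark": 0.6, "latency": 0.5}
model_b = {"internal_eval": 0.6, "public_benchmark": 0.9, "latency": 0.9}

score_a = weighted_score(model_a, weights)
score_b = weighted_score(model_b, weights)
best = "model_a" if score_a >= score_b else "model_b"
```

Changing the weights can change the winner, which is why the weights should be fixed from your quality criteria before looking at the scores.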

Microsoft created the mini Surface dev box that Qualcomm couldn’t

Microsoft unveils the Surface RTX Spark Dev Box, a mini PC for developers powered by Nvidia's Arm-based RTX Spark chip with 128GB of unified memory, capable of running models of up to 120 billion parameters locally. Pre-configured with dev tools like VS Code and GitHub Copilot, it replaces Qualcomm's canceled Snapdragon Dev Kit and will be available later this year.

  • The Surface RTX Spark Dev Box features an aluminum chassis that doubles as a heatsink, similar to an Xbox Series X top, with a 100W thermal envelope.
  • It includes 128GB of unified memory, enabling local execution of models up to 120 billion parameters.
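A rough back-of-the-envelope check shows how a 120-billion-parameter model relates to 128GB of memory. The article does not say which numeric precision the claim assumes, so the bit widths below are assumptions. The estimate covers model weights only and ignores the KV cache, activations, and the operating system.

```python
def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal GB: parameters x bytes per parameter."""
    return params_billion * bits_per_param / 8


MEMORY_GB = 128      # unified memory quoted for the Dev Box
PARAMS_BILLION = 120  # model size quoted for the Dev Box

for bits in (16, 8, 4):
    size = weights_gb(PARAMS_BILLION, bits)
    verdict = "fits" if size <= MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {size:.0f} GB, {verdict} in {MEMORY_GB} GB")
```

Under these assumptions, 16-bit weights (240 GB) exceed the memory, 8-bit weights (120 GB) leave almost no headroom, and 4-bit weights (60 GB) fit comfortably, so the claim most plausibly refers to a quantized model.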

OpenAI’s Codex adds new tools — Sites, Annotations, more plugins — for knowledge workers

OpenAI announced that 20% of Codex’s 5 million weekly active users are now knowledge workers, prompting new features including Sites for creating interactive websites, Annotations for targeted document editing, and specialized plugins for data analytics, sales, and more.

  • 20% of Codex users are now knowledge workers, not coders.
  • Sites allows creating and sharing interactive websites via URL.
Policy

Google’s Phone app will tell you if a scammer is impersonating one of your contacts

Google is launching a new feature for its Phone app that uses end-to-end encrypted RCS to detect AI impersonation scams. It flags calls that appear to be from your contacts but are actually from scammers. The FBI reports Americans lost over $893 million to AI scams in 2025. The feature is default-on for Android 12+ Pixel phones and requires both parties to use Google Phone. Other updates include kids' safety, AirDrop support, AI try-on, and more.

  • Google Phone app now detects scam calls in which scammers use AI voice cloning to impersonate your contacts.
  • FBI reported $893 million lost in 2025 due to AI-powered scams.

Object detection with Amazon Nova 2 Lite

This post walks through implementing object detection with Amazon Nova 2 Lite using Amazon Bedrock, AWS Lambda, and API Gateway. Learn to craft prompts, process JSON output, and visualize results. Covers real-world applications in manufacturing, agriculture, and logistics.

  • Amazon Nova 2 Lite detects objects via natural language prompts without training.
  • Deploy a serverless app with Amazon Bedrock, Lambda, and API Gateway.

Show HN: Tailor Your Resume to each role with AI

Refer Me launches an AI resume tailoring tool that automatically optimizes your resume for any job description, increasing your chances of passing ATS screening and standing out to employers.

  • AI-powered resume customization based on job description
  • Optimizes for ATS compatibility with keyword matching
Tools

Of Hammers and Nails: What AI Can and Cannot Do for a Data Analyst

This article explores the real-world utility and limitations of AI in data analysis. AI significantly speeds up code writing and data asset development, but its ability to answer ad hoc data questions and analyze metric changes suffers from inconsistency (around 86% accuracy) and requires extensive data preparation. AI cannot replace the judgment, context, and institutional knowledge that human analysts provide. The author advocates a balanced approach: leverage AI where it helps, but remain clear-eyed about its shortcomings.

  • AI substantially improves efficiency in writing code and building data assets.
  • AI's accuracy for ad hoc data queries is about 86%, not yet reliable enough to replace dashboards.
Models

Microsoft debuts Surface RTX Spark Dev Box to run LLMs without cloud costs

Microsoft unveiled the Surface RTX Spark Dev Box at Build 2026, a compact desktop with Nvidia's Blackwell-architecture RTX Spark processor and 128GB unified memory, delivering 1 petaflop of AI compute. It allows developers to run models over 120 billion parameters locally, challenging the per-token cloud pricing model.

  • Runs AI models exceeding 120 billion parameters locally
  • Packs Nvidia Blackwell RTX Spark processor and 128GB unified memory

Mythos and GPT-5.5 Will Find a Lot of Vulnerabilities. Is That Enough?

Frontier AI models like Mythos and GPT-5.5 can uncover real vulnerabilities, but enterprise-ready offensive security requires much more than finding bugs, including coverage, validation, safety, governance, and operational integration.

  • AI models can find vulnerabilities, but enterprise defense demands full coverage and validation.
  • Multi-step reasoning, persistent coverage, and safety are key challenges for AI security systems.

Trump signs executive order seeking early access to new AI releases

Trump signed an executive order creating a voluntary framework under which the federal government can vet powerful AI models up to 30 days before their public release. The order aims to tighten control over cybersecurity and national security threats and marks a shift from his deregulatory stance.

  • Trump signs executive order for voluntary pre-release review of AI models by the government.
  • Tech companies may submit models up to 30 days before public release; participation is voluntary.