AI Daily Briefing 2026-06-09

Today's must-reads

Agents

Transforming solar and wind maintenance reports with Genie and AI agents

2026-06-08

Plenitude built an agent-based system on Databricks Genie that converts unstructured PDF maintenance reports for solar and wind plants into a unified, queryable data model, enabling natural-language questions and visualizations across plants.

Uses Genie with Unity Catalog semantic metadata and AI Functions for PDF extraction.
Enables natural language queries and visualizations for cross-plant analysis.

Hands on with Intelligent Terminal, an AI-Powered Windows Terminal

2026-06-08

Microsoft has created an open-source fork of Windows Terminal called Intelligent Terminal, which integrates an AI assistant to explain errors, draft commands, and fix problems without leaving the terminal. The agent stays aware of terminal activity and remembers sessions.

Microsoft open-sourced Intelligent Terminal, a fork of Windows Terminal with AI integration.
The AI assistant can explain errors, draft commands, and resolve issues inside the terminal.

Do your best research with NotebookLM

2026-06-08

NotebookLM, Google's AI research assistant, receives major upgrades including advanced reasoning, new output formats (PDFs, spreadsheets, etc.), and easier research initiation. Built on Gemini 3.5 and Antigravity, it offers improved accuracy and analysis capabilities.

NotebookLM now runs on Gemini 3.5 and Antigravity for better reasoning.
New output formats include PDF reports, charts, spreadsheets, and slides.

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

2026-06-08

Xiaomi's MiMo team, with TileRT, released MiMo-V2.5-Pro-UltraSpeed, a serving mode for the MiMo-V2.5-Pro model. It decodes over 1000 tokens per second on a 1-trillion-parameter model using a single 8-GPU commodity node. The speedup comes from FP4 quantization, DFlash speculative decoding, and the TileRT runtime. API trial runs June 9–23, 2026.

1T-parameter MoE model achieves 1000+ tokens/sec on commodity GPUs
Three coordinated techniques: FP4 quantization, DFlash speculative decoding, TileRT runtime
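
A rough back-of-envelope check on why FP4 matters for fitting a 1-trillion-parameter model on one 8-GPU node. This sketch assumes exactly 4 bits per weight and an even split across GPUs. It ignores quantization scales, KV cache, and activations, none of which the item specifies, so treat the numbers as a lower bound on memory rather than a description of the actual deployment.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int, num_gpus: int):
    """Return (total_gb, per_gpu_gb) for the weights alone, with 1 GB = 1e9 bytes."""
    total_bytes = num_params * bits_per_weight / 8
    total_gb = total_bytes / 1e9
    return total_gb, total_gb / num_gpus


# Assumption: weights only, evenly sharded across the 8 GPUs of a single node.
fp4_total, fp4_per_gpu = weight_footprint_gb(1e12, 4, 8)
fp16_total, fp16_per_gpu = weight_footprint_gb(1e12, 16, 8)

print(f"FP4 : {fp4_total:.0f} GB total, {fp4_per_gpu:.1f} GB per GPU")
print(f"FP16: {fp16_total:.0f} GB total, {fp16_per_gpu:.1f} GB per GPU")
```

Under these assumptions FP4 weights take 500 GB in total (62.5 GB per GPU), versus 2000 GB (250 GB per GPU) at 16 bits.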

Research

Turn a $3M AI bill into $1.9M

2026-06-08

Flowstate is an intelligent proxy that routes AI requests to the most cost-effective model and attributes spending to projects, potentially reducing AI bills by up to 42%. The article explains the two main leaks inflating AI costs: default flagship model usage and lack of spend attribution.

Default model selection often uses expensive flagship models for simple tasks, wasting money.
AI bills lack attribution, making it impossible to track which projects are driving costs.
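
The two leaks described above (defaulting to a flagship model, and no per-project attribution) can be illustrated with a minimal routing-and-ledger sketch. This is not Flowstate's API: the model names, prices, task labels, and the `choose_model` rule are all hypothetical and exist only to show the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float


# Hypothetical models and prices, chosen only for illustration.
SMALL = Model("small-model", 0.5)
FLAGSHIP = Model("flagship-model", 5.0)

# Hypothetical rule: these task types are assumed simple enough for the small model.
SIMPLE_TASKS = {"classify", "extract", "summarize"}


def choose_model(task: str) -> Model:
    """Route simple tasks to the cheaper model instead of defaulting to the flagship."""
    return SMALL if task in SIMPLE_TASKS else FLAGSHIP


class SpendLedger:
    """Attribute the cost of every request to the project that issued it."""

    def __init__(self):
        self.by_project = defaultdict(float)

    def record(self, project: str, model: Model, tokens: int) -> float:
        cost = model.usd_per_1k_tokens * tokens / 1000
        self.by_project[project] += cost
        return cost


ledger = SpendLedger()
requests = [
    ("search", "classify", 2000),
    ("search", "extract", 4000),
    ("assistant", "plan", 2000),
]
for project, task, tokens in requests:
    ledger.record(project, choose_model(task), tokens)

print(dict(ledger.by_project))
```

For reference, the headline figures ($3M down to $1.9M) work out to a reduction of about 37%, which sits inside the "up to 42%" range quoted in the summary.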

Models

Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators

2026-06-08

Microsoft Research introduces Lens, a 3.8B parameter text-to-image model that rivals much larger models by training on 800M detailed captions generated by GPT-4.1. It requires a fraction of the compute. Lens-Turbo generates images in under a second. Open source under MIT.

Lens uses 800M detailed captions from GPT-4.1 instead of vague web alt-text, boosting training efficiency.
With only 3.8B parameters, Lens matches or outperforms models many times its size on benchmarks.

Policy

Apple announces Siri AI and its next generation of Apple Intelligence

2026-06-08

Two years after first unveiling Apple Intelligence and a smarter Siri that never fully materialized, Apple has used this year's WWDC to announce new AI features and a revamped Siri. The company recently settled a $250 million class action lawsuit over misleading claims about Apple Intelligence. Apple is partnering with Google Gemini to power its AI, focusing on products rather than underlying models.

Apple unveils new Siri AI and Apple Intelligence at WWDC
Company settled $250M lawsuit over unfulfilled AI promises

Chips

Intel gets a second life as Google and Nvidia explore it as a TSMC backup for AI chips

2026-06-08

Google has ordered more than three million AI chips from Intel for 2028. Nvidia is testing Intel's manufacturing tech for its upcoming Feynman architecture. Both moves come as TSMC can't keep up with AI chip demand. Intel's long-struggling foundry division is getting a rare second chance.

Google orders over 3 million AI chips from Intel for 2028 delivery.
Nvidia tests Intel's manufacturing process for its Feynman architecture.

Tools

Show HN: Gitdot – a better GitHub. Open-source, anti-AI, and written in Rust

2026-06-08

Gitdot is an open-source GitHub alternative written in Rust with a CLI-inspired interface. It currently supports user signups, org creation, repository management, and importing from GitHub. Missing features include issues, PRs, and CI. The design focuses on keyboard-driven navigation and aims for a 100 ms first contentful paint (FCP).

Gitdot is an open-source GitHub alternative written in Rust with a CLI-like UI.
Currently supports user signups, org creation, public/private repos, and GitHub imports.

Robotics

Hackers likely hijacked over 20k Instagram accounts with Meta's AI chatbot

2026-06-08

Meta confirms hackers exploited a bug in its AI support chatbot to hijack over 20,000 Instagram accounts, including high-profile ones like Barack Obama's, without two-factor authentication. The company has fixed the issue and invalidated reset links.

A bug in Meta's AI support chatbot allowed attackers to hijack accounts by requesting password resets without 2FA.
Over 20,225 Instagram accounts were affected, including those of former President Obama and the Chief Master Sergeant of the US Space Force.

Other updates (8)

Models

Unlocking AI flexibility in Europe: A guide to cross-region inference for EU data processing and model access

2026-06-08

AWS's Cross-Region Inference (CRIS) on Amazon Bedrock helps European customers leverage model capacity across multiple AWS Regions while adhering to GDPR and other data protection requirements. This post explores global and EU geographic inference profiles, security features, transparency tools, and how to check available profiles.

Cross-Region Inference (CRIS) automatically routes requests within predefined geographic boundaries to improve model availability and resilience.
EU geographic inference profiles (EU CRIS) restrict processing to EU regions, aiding GDPR compliance.
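
Checking which profiles are available might look like the sketch below. It uses boto3's `list_inference_profiles` call on the `bedrock` client. The region name is an arbitrary example, only the first page of results is read, and selecting EU profiles by an `eu.` ID prefix is an assumption about the naming scheme, not something the summary states.

```python
def eu_profile_ids(summaries):
    """Keep the IDs of inference profiles that carry the assumed 'eu.' geographic prefix."""
    return [
        s["inferenceProfileId"]
        for s in summaries
        if s["inferenceProfileId"].startswith("eu.")
    ]


def list_system_profiles(region_name="eu-central-1"):
    """Fetch the first page of system-defined inference profiles (needs AWS credentials)."""
    import boto3  # imported lazily so the filter above works without AWS access

    client = boto3.client("bedrock", region_name=region_name)
    response = client.list_inference_profiles(typeEquals="SYSTEM_DEFINED")
    return response["inferenceProfileSummaries"]


# Made-up profile IDs, used only to demonstrate the filter offline.
sample = [
    {"inferenceProfileId": "eu.example.model-v1:0"},
    {"inferenceProfileId": "global.example.model-v1:0"},
]
print(eu_profile_ids(sample))
```

With credentials configured, `eu_profile_ids(list_system_profiles())` would apply the same filter to live results.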

NotebookLM’s Gemini 3.5 upgrade adds a cloud computer and help finding sources

2026-06-08

Google is rolling out comprehensive updates to NotebookLM, now powered by Gemini 3.5, offering more accurate responses. Users can start research by asking questions, with NotebookLM using Google Search to find sources. Each notebook connects to a secure cloud computer, enabling code execution and generation of various file formats. The update is available for AI Ultra plan and Workspace customers.

Powered by Gemini 3.5 for improved accuracy
Start research by asking questions; NotebookLM uses Google Search for sources

Why Do LLMs Corrupt Your Documents When You Delegate?

2026-06-08

A recent study reveals that delegating tasks to LLMs can silently corrupt documents. The DELEGATE-52 benchmark tested 19 models and found that even top models corrupt 25% of content after 20 interactions. Causes include compounding errors, deletion by weak models vs. hallucination by strong ones, context overload, and domain unfamiliarity. Agentic AI tools offer little remediation.

Delegating tasks to LLMs can lead to gradual document corruption, with top models corrupting 25% after 20 interactions and weaker models up to 50%.
Errors compound over time; weak models delete content while strong models hallucinate plausible but false information.
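
To get a feel for how small per-step errors compound into the reported totals, here is a simplified calculation. It assumes a constant, independent corruption rate per interaction, which is a modeling assumption for illustration and not how the study measures corruption.

```python
def per_step_loss(total_corrupted: float, steps: int) -> float:
    """Constant per-interaction corruption rate that compounds to total_corrupted after `steps`."""
    retained = 1.0 - total_corrupted
    return 1.0 - retained ** (1.0 / steps)


def corrupted_after(rate: float, steps: int) -> float:
    """Fraction corrupted after `steps` interactions at a constant per-step rate."""
    return 1.0 - (1.0 - rate) ** steps


top = per_step_loss(0.25, 20)   # top models: 25% after 20 interactions
weak = per_step_loss(0.50, 20)  # weaker models: up to 50% after 20 interactions
print(f"top models : {top:.2%} per interaction")
print(f"weak models: {weak:.2%} per interaction")
```

Under this assumption, a loss of roughly 1.4% per interaction is enough to reach 25% after 20 interactions, so each individual step can look nearly clean.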

Agents

It’s safe to close your laptop now: Hosting coding agents on Amazon Bedrock AgentCore

2026-06-08

Amazon Bedrock AgentCore Runtime gives each agent session its own isolated microVM with a persistent workspace, secure tool access through Gateway, and built-in observability—so you can run Claude Code, Codex, Kiro, and Cursor in parallel without sharing secrets, ports, or filesystems. Close the lid, go to dinner, and pick up where you left off tomorrow.

Laptops are poor hosts for coding agents: they bring security risks, secret leakage, and collisions between agents, and closing the lid kills the session.
AgentCore provides isolated microVMs, persistent storage, identity layer, gateway, and observability for safe remote execution.

Better decisions at scale: How mathematical optimization delivers where intuition fails

2026-06-08

This post introduces mathematical optimization as a subfield of AI, explains how it differs from machine learning, and showcases real-world success stories from the AWS Generative AI Innovation Center that deliver concrete business results.

Mathematical optimization is presented as deductive AI that yields definitive optimal decisions, in contrast to probabilistic ML.
The Innovation Center uses a four-step framework: Discover, Model, Solve, Architect.

Build an Emergency Helpline Voice Agent with LangChain

2026-06-08

Learn how to build a real-time AI voice agent for emergency helplines using LangChain, AssemblyAI, and OpenAI. The agent listens to caller distress, triages the situation, dispatches emergency services, and keeps the caller calm—all without typing or menus.

Use AssemblyAI for real-time speech-to-text transcription with partial and final transcripts.
The AI agent (ARIA) uses LangChain and LangGraph for reasoning and tool use, including location lookup, emergency dispatch, human escalation, and calming protocols.

ReARM: Governing AI Coding Agents Demo [video]

2026-06-08

This video demonstrates the ReARM framework for governing AI coding agents.

Showcases the ReARM framework for governing AI coding agents
Video demonstration of key features and workflow

Policy

End-to-end encrypted ML inference with Amazon SageMaker AI and FHE

2026-06-08

This post demonstrates how to use Amazon SageMaker AI with fully homomorphic encryption (FHE) to run ML inference entirely in the encrypted domain. By leveraging the concrete-ml library (API-compatible with scikit-learn), you can train FHE models and deploy them to SageMaker endpoints, ensuring that queries, responses, and intermediate values remain encrypted and unreadable by any observer, including SageMaker itself. The article covers sensitive use cases in healthcare, energy, and telecommunications, and provides step-by-step implementation guidance.

Fully Homomorphic Encryption (FHE) enables computation on encrypted data without decryption, preserving privacy during ML inference.
Uses concrete-ml, a high-level library that supports common model types and is scikit-learn compatible, replacing earlier low-level approaches.
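
The core idea of computing on data that stays encrypted can be shown with a toy example. The sketch below is not FHE and does not use concrete-ml or SageMaker. It swaps in the much simpler Paillier scheme, which is only additively homomorphic, with tiny hard-coded primes that offer no real security. It scores a linear model with plaintext integer weights over encrypted features, so the party doing the computation never sees the inputs or the result.

```python
import math
import random

# Two Mersenne primes: far too small for real security, chosen so the demo runs fast.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                     # public key
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key: lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)       # modular inverse of LAMBDA mod N (Python 3.8+)


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1 and fresh randomness r."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N_SQ * pow(r, N, N_SQ) % N_SQ


def decrypt(c: int) -> int:
    """Recover m from a ciphertext using the private values LAMBDA and MU."""
    return (pow(c, LAMBDA, N_SQ) - 1) // N * MU % N


def encrypted_dot(enc_features, weights):
    """Compute Enc(sum(w_i * x_i)) from Enc(x_i) using only the public key."""
    acc = 1  # 1 is a valid encryption of 0
    for c, w in zip(enc_features, weights):
        # Raising a ciphertext to w multiplies its plaintext by w;
        # multiplying ciphertexts adds their plaintexts.
        acc = acc * pow(c, w, N_SQ) % N_SQ
    return acc


features = [3, 5, 2]          # client-side secret inputs
weights = [2, 1, 4]           # server-side non-negative integer weights
enc_score = encrypted_dot([encrypt(x) for x in features], weights)
print(decrypt(enc_score))     # client decrypts the score
```

A full FHE scheme extends this to arbitrary computation, including the non-linear steps that real models need, which is the role the summary attributes to concrete-ml.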