Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass, according to Bloomberg renders. The images show a pill-shaped chat bubble emerging from the Dynamic Island, a standalone Siri app, and updates to the Camera and Photos apps with AI features. Apple will reveal the final design at WWDC in June.
iOS 27's Siri will feature a ChatGPT-like interface with a pill-shaped bubble emerging from the Dynamic Island.
Users can choose between Ask, Siri, and ChatGPT from a dropdown menu.
A new analysis shows that top AI forecasters adjust their AGI timelines based on which lab is currently leading the field, with predictions swinging from earlier to later and back again as the dominant lab changes from ChatGPT to xAI/Meta/Gemini to Anthropic.
Predictions for when most cognitive labor will be automated (AGI) fluctuate significantly based on which AI lab is currently dominant.
From 2023-2025, most researchers moved AGI timelines earlier; from 2025-2026, they moved them later; in early 2026, under Anthropic's rapid progress, they moved earlier again.
DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.
DeepSWE is a contamination-free benchmark with original tasks.
Mistral AI is renaming its chatbot Le Chat to Vibe and bundling chat, coding agents and a new Work Mode under one brand. Work Mode connects to Google Workspace, Outlook, Slack or GitHub and handles tasks such as emails, reports or pull requests independently. The price of the Pro plan has been cut from €17.99 to €14.99, although Mistral has not specified any concrete usage limits. The company is thus positioning itself more directly against the agent-based offerings from OpenAI, Google and Anthropic.
Mistral AI rebrands Le Chat as Vibe, integrating chat, coding agents, and a new Work Mode.
Work Mode connects to Google Workspace, Outlook, Slack, or GitHub to autonomously handle tasks.
Mistral AI CEO Arthur Mensch confirms the company is exploring custom chip development to reduce infrastructure costs and compete with OpenAI and Anthropic. The French startup also announced a new inference data center in France and an enterprise agent platform called Vibe.
Mistral AI is considering designing its own custom chips to lower deployment costs.
The company announced a new data center in France dedicated to AI inference.
The LeapQuest team at Shanghai Innovation Institute, in collaboration with multiple universities, introduces a new medical AI paradigm that enables models to actively use visual tools during reasoning, transforming from passive input receivers to active evidence seekers. Two papers have been accepted at ICML 2026.
LeapQuest proposes Ophiuchus and MedScope for medical images and videos, adopting the Think with Images/Videos paradigm.
Ophiuchus-7B achieves an average score of 68.0 on 8 VQA benchmarks, surpassing o3 (62.2) and GPT-5 (59.9).
A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta have launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.
Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
The startup raised a $15M seed round at a $115M valuation, with investors including Jeff Dean and Fei-Fei Li.
Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.
Robot 'Lightning' beats human world record in Beijing half marathon.
China commits over £100bn to robotics investment over two decades.
Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.
ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.
The Illinois legislature has passed SB 315, which requires independent auditors to verify AI labs' safety commitments. The bill now heads to Governor Pritzker, who plans to sign it. It is stricter than comparable California and New York laws and has drawn support from OpenAI and Anthropic but opposition from Silicon Valley trade groups.
SB 315 mandates independent auditing of AI safety practices.
It is the strongest state-level AI safety law in the U.S.
Microsoft's MAI-Image-2.5 ranks third on Arena's text-to-image leaderboard, on par with Google's Nano Banana 2 but still behind OpenAI's Image-2. The model shows clear gains over its predecessor, especially in rendering text inside images and commercial visuals.
MAI-Image-2.5 ranks third on the Arena leaderboard, tied with Google's Nano Banana 2.
It improves on its predecessor in text rendering and commercial visuals.
The article argues that Anthropic and OpenAI have achieved product-market fit by shifting enterprise customers to API-based pricing and capitalizing on coding agent products. This inflection point, which began with model improvements in November 2025, accelerated in April 2026 with new model releases and pricing changes.
Both Anthropic and OpenAI have moved enterprise plans to API token pricing, with coding agents like Claude Code and Codex driving significant usage and revenue.
April 2026 saw new frontier models with higher API prices and enterprise customers locked into those rates via contract renewals.
The battle between OpenAI and Anthropic over AI regulation has inadvertently elevated New York assemblyman Alex Bores, who wrote early AI legislation. Despite millions spent by a super PAC to attack him, Bores has gained name recognition and now leads in the primary race.
OpenAI and Anthropic are spending millions attacking each other in the NY-12 primary, but the real winner is Alex Bores.
Bores wrote one of the first AI regulatory laws, making him a target.
The government has secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep up with leading AI firms like Anthropic and OpenAI. The funding requires congressional approval, while $800 million has been repurposed for cloud compute. The article covers chip specs, costs, and the escalating AI hardware race.
The US government secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep pace with big AI players.
Each GB10 chip consumes only 140W but delivers 1 petaflop of FP4 performance, enabling fine-tuning of 70-billion-parameter models.
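The quoted GB10 figures can be turned into two quick back-of-the-envelope numbers. The sketch below uses only the 140W, 1 petaflop and 70-billion-parameter figures from the summary. The assumption that each weight takes 4 bits is ours, and the result covers weights only: fine-tuning also needs memory for gradients, optimizer state and activations, which the article summary does not quantify.

```python
# Figures quoted in the summary above.
POWER_WATTS = 140
FP4_FLOPS = 1e15   # 1 petaflop of FP4 performance
PARAMS = 70e9      # 70-billion-parameter model

# Efficiency: FP4 operations per second per watt, expressed in teraflops.
tflops_per_watt = FP4_FLOPS / POWER_WATTS / 1e12

# Assumption (not from the article): 4 bits, i.e. 0.5 bytes, per weight.
BYTES_PER_FP4_WEIGHT = 0.5
weights_gb = PARAMS * BYTES_PER_FP4_WEIGHT / 1e9

print(f"{tflops_per_watt:.2f} TFLOPS per watt")
print(f"{weights_gb:.0f} GB for the weights alone")
```

That works out to roughly 7 TFLOPS per watt and about 35 GB of FP4 weights.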
Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, and automate defect remediation.
Cisco partners with OpenAI to leverage Codex for enterprise engineering.
Codex will accelerate Cisco's AI Defense initiatives.
A student struggling with a programming assignment discovers ChatGPT has already produced a perfect solution. Instead of jealousy, he feels vertigo—realizing his hours of effort have been rendered optional by a tool that works flawlessly in seconds.
The student finds a ChatGPT-generated solution to his exact assignment while browsing online.
He experiences a sense of vertigo rather than jealousy, as his effort seems suddenly pointless.
This week's top AI news includes Elon Musk losing his $150 billion lawsuit against OpenAI, Google unveiling major AI updates at I/O 2026, OpenAI's AI solving an 80-year-old math problem, enforcement of the Take It Down Act, and SpaceX planning to acquire coding startup Cursor after its IPO.
Elon Musk's $150B lawsuit against OpenAI dismissed; OpenAI prepares for IPO.
Google I/O 2026 introduces Gemini 3.5 Flash, Gemini Spark AI agent, Gemini Omni, and more.
AI models have plateaued on raw intelligence, and the next gains come from what you build around them. The AI agent harness provides tools, memory, and human-in-the-loop capabilities to transform LLMs into useful digital assistants. Companies like Google, LangChain, OpenAI, and Anthropic offer different solutions.
AI intelligence gains are plateauing; agent harnesses are the new frontier.
Agent harnesses add tools, memory, and human oversight to LLMs.
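The three harness ingredients named above (tools, memory, human oversight) can be sketched in a few lines. This is a generic illustration, not the design of any vendor's product: `Harness`, `fake_model` and the `approve` callback are hypothetical names, and the model is a stub rather than a real LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    # model(prompt, memory) returns (tool name, tool argument).
    model: Callable[[str, list], tuple]
    # Tools the model is allowed to call, keyed by name.
    tools: dict
    # Human-in-the-loop gate: returns True if the call may proceed.
    approve: Callable[[str, str], bool]
    # Running log of past tool calls, passed back to the model each step.
    memory: list = field(default_factory=list)

    def step(self, prompt: str) -> str:
        tool_name, arg = self.model(prompt, self.memory)
        if tool_name not in self.tools:
            result = f"unknown tool: {tool_name}"
        elif not self.approve(tool_name, arg):
            result = f"denied: {tool_name}"
        else:
            result = self.tools[tool_name](arg)
        self.memory.append(f"{tool_name}({arg}) -> {result}")
        return result


def fake_model(prompt, memory):
    # Stand-in for an LLM: always asks for the "upper" tool.
    return "upper", prompt


harness = Harness(
    model=fake_model,
    tools={"upper": str.upper},
    approve=lambda tool, arg: len(arg) < 20,  # placeholder for a human decision
)
print(harness.step("hello"))
print(harness.memory)
```

In a real harness the `approve` callback would prompt a person, and the memory would typically be summarized or truncated before being handed back to the model.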
A solo developer created Snipforge, an all-in-one AI video editing suite with 28 tools, including transcription, smart clips, background removal, and more. Priced from free to $15/month for teams.
Snipforge offers 28 AI-powered video tools in one platform, built solo by the developer.
Features include AI transcription in 20 languages, smart clipping, auto captions, and background removal.
Shortly after OpenAI disproved Erdős' unit-distance conjecture, Anthropic shows Claude Mythos can solve the problem too, 'over the weekend.' Engineer Sholto Douglas says Mythos cracked the 1946 conjecture with a 'cute, simple proof,' a sign of 'serious overhang' in AI-driven math discoveries.
OpenAI first disproved the Erdős unit-distance conjecture; Anthropic's Claude Mythos then solved it independently.
Engineer Sholto Douglas stated Mythos produced a 'cute, simple proof' over a weekend, indicating underutilized AI capacity.
AI progress keeps accelerating in 2026: open models lag in agentic capabilities, Google's Gemini is not yet competitive with Claude Code and Codex, American open models are on the rise, Anthropic and OpenAI are locked in fierce competition, and power structures are asserting control.
Open models are 5-6 months behind in agentic capabilities, likely extending to 12+ months.
Google's Gemini lacks a clear competitor to Claude Code and Codex.
Y Combinator founder Paul Graham ignores emails clearly written by AI—they feel 'like being lied to,' he says. That's coming from one of OpenAI's earliest investors. Studies suggest his reaction is anything but unusual.
A new study from MIT and the University of Southern California shows that lawsuits filed without a lawyer at US federal courts have nearly doubled since ChatGPT went mainstream. One in five complaints now contains AI-generated text. Judges are resorting to drastic measures to cope with the flood of filings.
The pro se litigation rate jumped from 11% to 16.8%, with 41,490 cases in 2025, nearly double the pre-AI average.
AI text detection shows 18% of federal complaints contain AI-generated text in early 2026.
Alibaba's latest flagship model Qwen3.7-Max achieved a score of 1541 on the authoritative Code Arena leaderboard, surpassing GPT-5.5 and other models, ranking second globally behind the Claude series.
Qwen3.7-Max scored 1541 on Code Arena, ranking second only to Claude.
Code Arena is a blind-test platform where developers submit full web app challenges.
Google unveils Gemini 3.5 and Gemini Spark agent, plus Gemini Omni multimodal video generation; Elon Musk loses OpenAI lawsuit on statute of limitations; Anthropic agrees to $30B funding at $900B valuation; AI solves 80-year-old Erdős geometry problem.
Google launches Gemini 3.5 and always-on agent Gemini Spark with MCP tool support.
Gemini Omni converts images, audio, and text into video.
GPT Image 2 is OpenAI's latest image model with sharp text rendering and photorealism. The article introduces imagesv2.ai, a platform offering free credits, templates, and tools like panorama, tweet screenshot, and WeChat chat generators. Pricing starts at $4.16/month with yearly plans.
GPT Image 2 excels at text rendering and photorealistic images.
imagesv2.ai provides free credits and 50+ templates.
Kunlun Tech releases SkyClaw-v1.0 and its lightweight version SkyClaw-v1.0-lite, native Agent models that rival top players like Claude Opus 4.6. Priced at half or less of mainstream models, with limited-time free access and future open-source plans, they deeply integrate with OpenClaw, Claude Code, and other mainstream frameworks, and are compatible with OpenAI APIs.
Kunlun Tech launches SkyClaw-v1.0 and SkyClaw-v1.0-lite, native Agent models achieving global top-tier performance.
Priced at half or less of what leading models cost, currently free for a limited time, with planned open-source releases.
The California State University system has inked multi-million dollar contracts with OpenAI to provide ChatGPT Edu, but a survey reveals majorities of students and faculty are skeptical of AI's educational benefits, worrying about impacts on jobs, creativity, and the environment.
California State University signed a $13 million annual contract with OpenAI to become the first AI-powered university system.
Survey shows 65% of students and 59% of faculty doubt AI's overall benefit to education, despite widespread use.
ContextVault is a browser extension that captures AI conversations in real-time across major LLM platforms like ChatGPT, Claude, and Gemini, storing them locally in IndexedDB. It allows one-click export as Markdown or ZIP, ensuring your data never leaves your device. Free, open source, no accounts or backend required.
Real-time capture across 7 LLM platforms including ChatGPT, Claude, and Gemini.
All data stored locally in IndexedDB, no cloud sync or third-party access.
Google DeepMind's AlphaProof Nexus has autonomously solved nine open Erdős problems, including two that stumped mathematicians for 56 years, for just a few hundred dollars per problem in inference costs. Unlike OpenAI's natural-language approach, the system uses the Lean compiler to verify every proof step automatically. Still, the overall success rate sits at just 2.5 percent.
AlphaProof Nexus autonomously solved nine open Erdős problems, including two that had remained unsolved for 56 years.
Each problem cost only a few hundred dollars in inference costs.
HTML Deployer is a Chrome extension that extracts AI-generated HTML from ChatGPT, Claude, and Gemini, allowing users to preview, download ZIP, or publish directly to Netlify, GitHub, FTP, or self-hosted servers. It's designed for developers, founders, marketers, agencies, and beginners.
Extract HTML from ChatGPT, Claude, and Gemini.
Preview, export ZIP, or publish directly to cloud, FTP, or self-hosted.
CoreWeave introduces a cloud platform purpose-built for AI, overcoming the bottlenecks of general-purpose clouds for GPU-intensive workloads. Integrated infrastructure, data, orchestration, and expert support enable the full AI lifecycle—training, inference, iteration—for pioneers like OpenAI and IBM, delivering faster iteration, maximum performance, and transformative partnership.
CoreWeave Cloud is built from the ground up for AI workloads, avoiding limitations of traditional clouds.
It supports the full AI lifecycle including training, inference, and continuous iteration with optimized GPU clusters.
At the 2026 China AIGC Industry Summit, Shen Yujun, Chief Scientist of Ant Lingbo Technology, argued that large models have benefited from decades of internet data, but robotics still faces a data vacuum in the physical world. He believes that neither VLA nor world models alone will be the final solution for embodied intelligence; instead, they will converge into a model unique to the physical world. Ant Lingbo positions itself as the 'general brain' for robots, akin to an operating system, with a focus on spatial perception. Shen predicts that around 2028, when everyone can contribute data to robots, embodied intelligence will have its 'ChatGPT moment'.
Large models rely on internet data dividends, but physical world data for robots is largely missing.
Neither VLA nor world models are the endgame; they will merge into a physical-world-specific model.
MashuPack is a developer tool that compiles selected parts of a codebase into a single clean text file for use in browser-based AI tools like ChatGPT and Claude, overcoming file-count limits and messy context assembly.
Select specific parts of a repository and compile them into one text file.
Designed for browser-based AI workflows, bypassing file and upload limits.
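The general idea of packing selected files into one pasteable text file can be sketched as follows. This is not MashuPack's implementation: the `pack` function, the glob-pattern selection and the `=====` section headers are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def pack(root: Path, patterns: list) -> str:
    """Concatenate files under root that match any glob pattern."""
    # A set removes files matched by more than one pattern.
    selected = sorted(
        {p for pat in patterns for p in root.glob(pat) if p.is_file()}
    )
    sections = []
    for path in selected:
        rel = path.relative_to(root).as_posix()
        body = path.read_text(encoding="utf-8", errors="replace")
        # Each file gets a header line so the reader can tell files apart.
        sections.append(f"===== {rel} =====\n{body.rstrip()}\n")
    return "\n".join(sections)


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "app.py").write_text("print('hi')\n", encoding="utf-8")
    (root / "README.md").write_text("# Demo\n", encoding="utf-8")
    (root / "notes.txt").write_text("skip me\n", encoding="utf-8")
    bundle = pack(root, ["src/**/*.py", "*.md"])

print(bundle)
```

Only the files matching the two patterns end up in the bundle; `notes.txt` is left out.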
Alister Palmer realized his newsletter ForwardPass hit 100 subscribers in a week and identified two limitations of traditional newsletters: simultaneous global publication causing time zone issues, and subscribers lacking control over frequency. He developed the ForwardPass MCP, allowing users to customize delivery time and frequency via AI. The article provides setup instructions for Claude and ChatGPT.
ForwardPass reached 100 subscribers in a week, prompting reflection on newsletter limitations.
ForwardPass MCP addresses personalization of publish time and frequency.
Large Language Models (LLMs) are optimized to produce distributionally plausible continuations rather than to explicitly verify whether generated propositions are entailed by source documents. This inductive bias enables generalization, but it does not encode whether responses are grounded with respect to a reference. Existing hallucination detection approaches improve factuality through retrieval augmentation, self-consistency, or claim verification, but generally do not learn directly over alignment topology. To leverage alignment topology as an inductive bias, researchers construct aligned bipartite graphs between reference information and LLM outputs and train a graph neural network (GNN) to model alignment structure using message passing. The method achieves state-of-the-art results on four diverse hallucination and question-answering datasets, outperforming all compared methods, including foundational LLMs such as GPT-4o.
LLMs lack grounding verification, limiting their use in high-stakes domains like clinical decision support.
Existing methods do not directly learn alignment topology.
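To make the graph construction concrete, here is a toy sketch of a bipartite graph between reference sentences and output sentences, followed by one untrained round of message passing. It is not the paper's model: the word-overlap (Jaccard) edge weights, the 0.5 threshold and the sum aggregation are assumptions chosen for illustration, whereas the paper trains a GNN over the alignment structure.

```python
def tokens(text):
    """Lowercased word set with full stops removed."""
    return set(text.lower().replace(".", "").split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_bipartite(reference, output, threshold=0.5):
    """Return edges (i, j, w) linking reference i to output sentence j."""
    edges = []
    for i, ref in enumerate(reference):
        for j, out in enumerate(output):
            w = jaccard(tokens(ref), tokens(out))
            if w >= threshold:
                edges.append((i, j, w))
    return edges


def message_pass(edges, ref_feats, n_out):
    """One round: each output node sums weighted messages from its
    reference neighbours. An output node with no edges stays at 0.0."""
    scores = [0.0] * n_out
    for i, j, w in edges:
        scores[j] += w * ref_feats[i]
    return scores


reference = [
    "The patient was given 5 mg of drug A.",
    "Symptoms improved after two days.",
]
output = [
    "The patient was given 5 mg of drug A.",
    "The patient was discharged on Friday.",
]

edges = build_bipartite(reference, output)
scores = message_pass(edges, ref_feats=[1.0, 1.0], n_out=len(output))
print(edges)
print(scores)
```

The second output sentence has no edge to any reference sentence, so it receives no support, which is the kind of structural signal a trained model could learn to read as a possible hallucination.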
Research Math Agents (RMA) is an automated reasoning framework for research-level mathematical problems. It solves 8 out of 10 problems on the First Proof benchmark, outperforming GPT-5.2R and Aletheia through multi-agent collaboration and iterative refinement.
RMA decomposes proof solving into specialized modules: problem analysis, literature search, fair comparison, knowledge bank construction, and proof verification.
It uses initializer, proposer, and verifier agents operating in a multi-round workflow with shared structured memory.
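The multi-round initializer, proposer and verifier loop with shared memory can be illustrated on a deliberately trivial problem. Everything below is a hypothetical stand-in: in RMA the agents are model-driven and work on research-level proofs, while here the proposer just tries the next integer and the verifier checks a toy property.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Shared structured memory visible to every agent."""
    problem: str
    attempts: list = field(default_factory=list)  # (candidate, verdict) pairs


def initializer(problem):
    return Memory(problem=problem)


def proposer(memory):
    # Stand-in for a model: propose the smallest integer not yet tried.
    tried = {candidate for candidate, _ in memory.attempts}
    candidate = 0
    while candidate in tried:
        candidate += 1
    return candidate


def verifier(candidate):
    # Stand-in for proof verification: a toy target property.
    return candidate * candidate == 49


def solve(problem, max_rounds=10):
    memory = initializer(problem)
    for _ in range(max_rounds):
        candidate = proposer(memory)
        verdict = verifier(candidate)
        # Failed attempts are recorded so later rounds avoid repeating them.
        memory.attempts.append((candidate, verdict))
        if verdict:
            return candidate, memory
    return None, memory


answer, memory = solve("find a non-negative integer x with x * x == 49")
print(answer, len(memory.attempts))
```

The point of the sketch is the control flow: each round reads the shared memory, proposes, verifies and writes the verdict back, and the loop stops at the first verified candidate or when the round budget runs out.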
Pi is a minimal, hackable terminal coding harness that lets you build the AI coding agent workflow you actually want. It keeps the core small and clean, while offering extensions, skills, and packages for deep customization. It has achieved notable usage share in the OpenAI/Codex ecosystem.
Minimal and hackable terminal coding harness.
Customizable via extensions, skills, and packages shared through npm/git.
Microsoft is a key AI player with its OpenAI investment and growing cloud AI business, which achieved an annual revenue run rate of over $37 billion. Despite a recent 12% decline, the stock is a strong long-term buy due to deep integration with corporate customers and AI integration opportunities. At 25x forward earnings, it offers an attractive entry point.
Microsoft's AI cloud business annual revenue run rate exceeded $37 billion, up 123%.
AI is not a threat but an opportunity to enhance Microsoft's software.
The last three weeks marked a phase transition in AI: Google unveiled Gemini Omni and an agent-first platform; Andrej Karpathy joined Anthropic to accelerate pretraining; Anthropic secured a $45B compute lease from xAI's Colossus; Cerebras IPO surged to a ~$95B market cap; and SpaceX, OpenAI, and Anthropic are planning to go public within six months, collectively worth trillions. Research highlights include HRM-Text efficient pretraining, AI reviewer evaluation, NVIDIA's unified AR-diffusion model, and more.
Google I/O introduced Gemini Omni, Gemini 3.5 Flash, Antigravity agent platform, and TPU 8i for a vertically integrated agent pipeline.
Andrej Karpathy joined Anthropic to lead a team using Claude to accelerate pretraining, signaling a practical self-improvement flywheel.
Google's SynthID watermarking system for AI content is being adopted by OpenAI, Nvidia, ElevenLabs, and Kakao, marking a shift toward a shared industry standard for detection of AI-generated media.
SynthID embeds watermarks directly into pixels and audio waveforms, making them harder to remove than metadata.
OpenAI, Nvidia, ElevenLabs, and Kakao are now using SynthID for their image, video, and voice generation tools.
Microsoft Research introduces Webwright, a terminal-native browser agent framework that replaces click-trace web automation with reusable Playwright scripts. Using a single agent loop across three modules and roughly 1,000 lines of code, Webwright powered by GPT-5.4 reaches 60.1% on the long-horizon Odysseys benchmark and 86.7% on Online-Mind2Web — the highest AutoEval score among open-sourced harness recipes.
Webwright uses a terminal loop where the agent writes and runs Playwright code instead of predicting one browser action at a time.
GPT-5.4 reached 86.7% on Online-Mind2Web (100-step budget) and 60.1% on Odysseys — 26.6 points above the base GPT-5.4 score of 33.5%.
AI-fix is a terminal tool that automatically analyzes and fixes failed commands. It captures the error, sends it to Claude or GPT-4o-mini, and executes the suggested fix. Each fix costs less than $0.0003. The tool supports zsh, bash, and fish history and protects privacy by sending only the error output and system context.
Type ai-fix after a failed command to automatically repair it, without leaving the terminal.
Handles various errors: missing modules, permission denied, port in use, npm build failures, Git push rejection, and more.
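The capture, send and fix flow described above can be sketched as follows. This is not AI-fix's code: `build_payload` and `suggest_fix` are hypothetical names, the model call is replaced by a local stub, and the sketch only prints the suggestion instead of executing it.

```python
import platform
import re
import subprocess
import sys


def run(cmd):
    """Run a command and capture its output as text."""
    return subprocess.run(cmd, capture_output=True, text=True)


def build_payload(result):
    # Mirrors the privacy note above: only error output and basic
    # system context are included, not files or environment variables.
    return {
        "stderr": result.stderr[-2000:],  # keep only the tail of long errors
        "exit_code": result.returncode,
        "system": platform.system(),
    }


def suggest_fix(payload):
    # Hypothetical stand-in for the call to Claude or GPT-4o-mini.
    match = re.search(r"No module named '([^']+)'", payload["stderr"])
    if match:
        return f"pip install {match.group(1)}"
    return ""


# A command that is guaranteed to fail with a missing-module error.
failing = [sys.executable, "-c", "import module_that_does_not_exist_xyz"]
result = run(failing)
payload = build_payload(result)
suggestion = suggest_fix(payload)
print(suggestion)
```

Printing rather than running the suggestion is a deliberate choice for the sketch, since executing model-generated shell commands without review carries obvious risk.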