AI News Hub (LIVE)

Source Mix

  • Hacker News AI: 17
  • The Decoder: 6
  • 量子位: 6
  • OpenAI News: 5
  • arXiv Computational Linguistics: 2
  • Last Week in AI: 2
  • Product Hunt AI: 2
  • The Verge AI: 2

Topic Mix

  • Agents: 29
  • Models: 16
  • Chips: 14
  • Research: 12
  • Policy: 10
  • Tools: 9
  • Startups: 4
  • Robotics: 2

Timeline

  • 2026-05-27: 14
  • 2026-05-26: 11
  • 2026-05-25: 10
  • 2026-05-28: 10
  • 2026-05-24: 4
  • 2026-05-23: 1

Latest Updates

These new iOS 27 renders hint at Siri’s big redesign

Apple's long-awaited Siri overhaul, expected to arrive in iOS 27, might look a lot like ChatGPT with a splash of Liquid Glass, according to Bloomberg renders. The images show a pill-shaped chat bubble from the Dynamic Island, a standalone Siri app, and updates to Camera and Photos apps with AI features. Apple will reveal the final design at WWDC in June.

  • iOS 27's Siri will feature a ChatGPT-like interface with a pill-shaped bubble emerging from the Dynamic Island.
  • Users can choose between Ask, Siri, and ChatGPT from a dropdown menu.

AGI timelines shift with whichever lab is dominant

A new analysis shows that top AI forecasters adjust their AGI timelines based on which lab is currently leading the field, with predictions swinging from earlier to later and back again as the dominant lab changes from ChatGPT to xAI/Meta/Gemini to Anthropic.

  • Predictions for when most cognitive labor will be automated (AGI) fluctuate significantly based on which AI lab is currently dominant.
  • From 2023-2025, most researchers moved AGI timelines earlier; from 2025-2026, they moved them later; in early 2026, under Anthropic's rapid progress, they moved earlier again.

DeepSWE: Measuring coding agents on original, long-horizon engineering tasks

DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.

  • DeepSWE is a contamination-free benchmark with original tasks.
  • Tasks span 91 repositories in 5 languages.

Mistral rebrands Le Chat as Vibe, betting its chatbot's future is as a full-blown work agent

Mistral AI is renaming its chatbot Le Chat to Vibe and bundling chat, coding agents, and a new Work Mode under one brand. Work Mode connects to Google Workspace, Outlook, Slack, or GitHub and independently handles tasks such as emails, reports, or pull requests. The price of the Pro plan has been cut from €17.99 to €14.99, although Mistral has not specified any concrete usage limits. The move positions the company more directly against the agent-based offerings from OpenAI, Google, and Anthropic.

  • Mistral AI rebrands Le Chat as Vibe, integrating chat, coding agents, and a new Work Mode.
  • Work Mode connects to Google Workspace, Outlook, Slack, or GitHub to autonomously handle tasks.

Mistral to explore designing own chips, CEO says

Mistral AI CEO Arthur Mensch confirms the company is exploring custom chip development to reduce infrastructure costs and compete with OpenAI and Anthropic. The French startup also announced a new inference data center in France and an enterprise agent platform called Vibe.

  • Mistral AI is considering designing its own custom chips to lower deployment costs.
  • The company announced a new data center in France dedicated to AI inference.

7B Model Beats o3 and GPT-5: Medical AI Agents Teach Models Where and How to Look

The LeapQuest team at Shanghai Innovation Institute, in collaboration with multiple universities, introduces a new medical AI paradigm that lets models actively use visual tools during reasoning, turning them from passive input receivers into active evidence seekers. Two papers have been accepted at ICML 2026.

  • LeapQuest proposes Ophiuchus and MedScope for medical images and videos, adopting the Think with Images/Videos paradigm.
  • Ophiuchus-7B achieves an average score of 68.0 on 8 VQA benchmarks, surpassing o3 (62.2) and GPT-5 (59.9).

Former Google and Apple Researchers Launch a Startup to Build AI's Missing Feed

A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta have launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.

  • Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
  • The startup raised $15M seed funding at $115M valuation, with investors including Jeff Dean and Fei-Fei Li.

Are robots nearing their ChatGPT moment? – podcast

Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.

  • Robot 'Lightning' beats human world record in Beijing half marathon.
  • China commits over £100bn to robotics investment over two decades.

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.

  • ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
  • LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.

Illinois Lawmakers Just Passed America's Strongest AI Safety Bill

Illinois lawmakers passed SB 315, which requires independent auditors to verify AI labs' safety commitments. The bill now heads to Governor Pritzker, who plans to sign it. It is stricter than the California and New York laws and has drawn support from OpenAI and Anthropic but opposition from Silicon Valley trade groups.

  • SB 315 mandates independent auditing of AI safety practices.
  • It is the strongest state-level AI safety bill in the U.S.

Microsoft's MAI-Image-2.5 pulls even with Google's Nano Banana 2 on benchmarks

Microsoft's MAI-Image-2.5 ranks third on Arena's text-to-image leaderboard, on par with Google's Nano Banana 2 but still behind OpenAI's Image-2. The model shows clear gains over its predecessor, especially in rendering text inside images and commercial visuals.

  • MAI-Image-2.5 ranks third on the Arena leaderboard, tied with Google's Nano Banana 2.
  • Improvements in text rendering and commercial visuals.

I think Anthropic and OpenAI have found product-market fit

The article argues that Anthropic and OpenAI have achieved product-market fit by shifting enterprise customers to API-based pricing and capitalizing on coding agent products. This inflection point, which began with model improvements in November 2025, accelerated in April 2026 with new model releases and pricing changes.

  • Both Anthropic and OpenAI have moved enterprise plans to API token pricing, with coding agents like Claude Code and Codex driving significant usage and revenue.
  • April 2026 saw new frontier models with higher API prices and enterprise customers locked into those rates via contract renewals.

AI companies' feud accidentally boosts obscure politician

The battle between OpenAI and Anthropic over AI regulation has inadvertently elevated New York assemblyman Alex Bores, who wrote early AI legislation. Despite millions spent by a super PAC to attack him, Bores has gained name recognition and now leads in the primary race.

  • OpenAI and Anthropic are spending millions attacking each other in the NY-12 primary, but the real winner is Alex Bores.
  • Bores wrote one of the first AI regulatory laws, making him a target.

AI is an arms race, and the US wants $9 billion in Nvidia superchips to keep up

The government has secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep up with leading AI firms like Anthropic and OpenAI. The funding requires congressional approval, while $800 million has been repurposed for cloud compute. The article covers chip specs, costs, and the escalating AI hardware race.

  • The US government secretly requested $9 billion for Nvidia GB10 superchips to help the CIA and NSA keep pace with big AI players.
  • Each GB10 chip consumes only 140W but delivers 1 petaflop of FP4 performance, enabling fine-tuning of 70-billion-parameter models.

Cisco and OpenAI redefine enterprise engineering with Codex

Cisco and OpenAI are redefining enterprise engineering with Codex, helping Cisco scale AI-native development, accelerate AI Defense work, and automate defect remediation.

  • Cisco partners with OpenAI to leverage Codex for enterprise engineering.
  • Codex will accelerate Cisco's AI Defense initiatives.

Atrophy: A novella about AI eroding a student's mind

A student struggling with a programming assignment discovers ChatGPT has already produced a perfect solution. Instead of jealousy, he feels vertigo—realizing his hours of effort have been rendered optional by a tool that works flawlessly in seconds.

  • The student finds a ChatGPT-generated solution to his exact assignment while browsing online.
  • He experiences a sense of vertigo rather than jealousy, as his effort seems suddenly pointless.

Last Week in AI #341 - Musk loses to OpenAI, Google's IO updates, OpenAI solves Erdős

This week's top AI news includes Elon Musk losing his $150 billion lawsuit against OpenAI, Google unveiling major AI updates at I/O 2026, OpenAI's AI solving an 80-year-old math problem, the Take It Down Act enforcement, and SpaceX planning to acquire coding startup Cursor after its IPO.

  • Elon Musk's $150B lawsuit against OpenAI dismissed; OpenAI prepares for IPO.
  • Google I/O 2026 introduces Gemini 3.5 Flash, Gemini Spark AI agent, Gemini Omni, and more.

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.

  • OpenAI, Thrive, and Crete collaborated to build a self-improving tax agent using Codex.
  • The agent automates tax filing processes, enhancing accuracy.

OpenAI Hires a Formula 1-Level Driver for PR

OpenAI has hired a top PR executive with 13 years of marketing experience at Salesforce.

  • OpenAI hired a new PR executive
  • The executive spent 13 years in marketing at Salesforce

The AI Agent Harness: The Glue That Turns LLMs into Digital Workers

AI models have plateaued on raw intelligence, and the next gains come from what you build around them. The AI agent harness provides tools, memory, and human-in-the-loop capabilities to transform LLMs into useful digital assistants. Companies like Google, LangChain, OpenAI, and Anthropic offer different solutions.

  • AI intelligence gains are plateauing; agent harnesses are the new frontier.
  • Agent harnesses add tools, memory, and human oversight to LLMs.

I built a 28-tool AI video SaaS solo with Python, Flask and OpenAI APIs

A solo developer created Snipforge, an all-in-one AI video editing suite with 28 tools, including transcription, smart clips, and background removal. Pricing ranges from free to $15/month for teams.

  • Snipforge offers 28 AI-powered video tools in one platform, built solo by the developer.
  • Features include AI transcription in 20 languages, smart clipping, auto captions, and background removal.

Election information and safeguards in 2026

Ahead of global elections, we’re helping people access information, supporting cyber defenders, and increasing AI transparency.

  • OpenAI introduces election safeguards for 2026 global elections.
  • Focus on information access, cyber defense support, and AI transparency.

Warp’s big bet on building open source with GPT-5.5

Warp uses GPT-5.5 and OpenAI models to coordinate coding agents across local, cloud, and open-source development workflows.

  • Warp uses GPT-5.5 and OpenAI models
  • Coordinates coding agents across local, cloud, and open-source workflows

Claude Mythos reportedly solves OpenAI's landmark Erdős problem with a 'cute, simple proof'

Shortly after OpenAI disproved Erdős' unit-distance conjecture, Anthropic shows Claude Mythos can solve the problem too, 'over the weekend.' Engineer Sholto Douglas says Mythos cracked the 1946 conjecture with a 'cute, simple proof,' a sign of 'serious overhang' in AI-driven math discoveries.

  • OpenAI first disproved the Erdős unit-distance conjecture; Anthropic's Claude Mythos then solved it independently.
  • Engineer Sholto Douglas stated Mythos produced a 'cute, simple proof' over a weekend, indicating underutilized AI capacity.

Some ideas for what comes next, May 2026

AI progress continues to accelerate in 2026: open models lag in agentic capabilities, Google's Gemini is not yet competitive with Claude Code and Codex, American open models are on the rise, Anthropic and OpenAI are locked in fierce competition, and power structures are asserting control.

  • Open models are 5-6 months behind in agentic capabilities, a gap likely to extend to 12+ months.
  • Google's Gemini lacks a clear competitor to Claude Code and Codex.

The AI justice gap solution is slowly turning into an existential paperwork nightmare for US federal courts

A new study from MIT and the University of Southern California shows that lawsuits filed without a lawyer in US federal courts have nearly doubled since ChatGPT went mainstream. One in five complaints now contains AI-generated text. Judges are resorting to drastic measures to cope with the flood of filings.

  • The pro se litigation rate jumped from 11% to 16.8%, with 41,490 cases in 2025, nearly double the pre-AI average.
  • AI text detection shows 18% of federal complaints contain AI-generated text in early 2026.

Alibaba's Qwen3.7-Max Ranks Second Globally in Coding Benchmark, Trailing Only Claude

Alibaba's latest flagship model Qwen3.7-Max achieved a score of 1541 on the authoritative Code Arena leaderboard, surpassing GPT-5.5 and other models, ranking second globally behind the Claude series.

  • Qwen3.7-Max scored 1541 on Code Arena, ranking second only to Claude.
  • Code Arena is a blind-test platform where developers submit full web app challenges.

LWiAI Podcast #246 - Gemini 3.5 + Omni, Musk Loses, OpenAI vs Erdős

Google unveils Gemini 3.5 and Gemini Spark agent, plus Gemini Omni multimodal video generation; Elon Musk loses OpenAI lawsuit on statute of limitations; Anthropic agrees to $30B funding at $900B valuation; AI solves 80-year-old Erdős geometry problem.

  • Google launches Gemini 3.5 and always-on agent Gemini Spark with MCP tool support.
  • Gemini Omni converts images, audio, and text into video.

GPT Image 2 left me amazed but exhausted – so I built a little tool

GPT Image 2 is OpenAI's latest image model with sharp text rendering and photorealism. The article introduces imagesv2.ai, a platform offering free credits, templates, and tools like panorama, tweet screenshot, and WeChat chat generators. Pricing starts at $4.16/month with yearly plans.

  • GPT Image 2 excels at text rendering and photorealistic images.
  • imagesv2.ai provides free credits and 50+ templates.

Domestic Agent Model Breaks into Global Top Tier! Limited-Time Free Access

Kunlun Tech releases SkyClaw-v1.0 and its lightweight version SkyClaw-v1.0-lite, native Agent models that rival top players like Claude Opus 4.6. Priced at half or less of mainstream models, with limited-time free access and future open-source plans, they deeply integrate with OpenClaw, Claude Code, and other mainstream frameworks, and are compatible with OpenAI APIs.

  • Kunlun Tech launches SkyClaw-v1.0 and SkyClaw-v1.0-lite, native Agent models achieving global top-tier performance.
  • Priced at half or less of what leading models cost, currently free for a limited time, with planned open-source releases.

This big university system is embracing AI. Students and faculty aren't on board

The California State University system has inked multi-million dollar contracts with OpenAI to provide ChatGPT Edu, but a survey reveals majorities of students and faculty are skeptical of AI's educational benefits, worrying about impacts on jobs, creativity, and the environment.

  • California State University signed a $13 million annual contract with OpenAI to become the first AI-powered university system.
  • Survey shows 65% of students and 59% of faculty doubt AI's overall benefit to education, despite widespread use.

ContextVault – Local-First AI Conversation Recorder for ChatGPT, Claude, Gemini

ContextVault is a browser extension that captures AI conversations in real-time across major LLM platforms like ChatGPT, Claude, and Gemini, storing them locally in IndexedDB. It allows one-click export as Markdown or ZIP, ensuring your data never leaves your device. Free, open source, no accounts or backend required.

  • Real-time capture across 7 LLM platforms including ChatGPT, Claude, and Gemini.
  • All data stored locally in IndexedDB, no cloud sync or third-party access.

Google DeepMind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars

Google DeepMind's AlphaProof Nexus has autonomously solved nine open Erdős problems, including two that stumped mathematicians for 56 years, for just a few hundred dollars per problem in inference costs. Unlike OpenAI's natural-language approach, the system uses the Lean compiler to verify every proof step automatically. Still, the overall success rate sits at just 2.5 percent.

  • AlphaProof Nexus autonomously solved nine open Erdős problems, including two that had remained unsolved for 56 years.
  • Each problem cost only a few hundred dollars in inference costs.

Show HN: HTML Deployer – AI Code to Website Publisher

HTML Deployer is a Chrome extension that extracts AI-generated HTML from ChatGPT, Claude, and Gemini, allowing users to preview, download ZIP, or publish directly to Netlify, GitHub, FTP, or self-hosted servers. It's designed for developers, founders, marketers, agencies, and beginners.

  • Extract HTML from ChatGPT, Claude, and Gemini.
  • Preview, export ZIP, or publish directly to cloud, FTP, or self-hosted.

The Essential Cloud for AI: Why Purpose-Built Defines the Future of Intelligence

CoreWeave introduces a cloud platform purpose-built for AI, overcoming the bottlenecks of general-purpose clouds for GPU-intensive workloads. Integrated infrastructure, data, orchestration, and expert support enable the full AI lifecycle—training, inference, iteration—for pioneers like OpenAI and IBM, delivering faster iteration, maximum performance, and transformative partnership.

  • CoreWeave Cloud is built from the ground up for AI workloads, avoiding limitations of traditional clouds.
  • It supports the full AI lifecycle including training, inference, and continuous iteration with optimized GPU clusters.

"VLA and World Models Are Not the Endgame; There Will Be a Model Unique to the Physical World" | Ant Lingbo's Shen Yujun @ AIGC2026

At the 2026 China AIGC Industry Summit, Shen Yujun, Chief Scientist of Ant Lingbo Technology, argued that large models have benefited from decades of internet data, but robotics still faces a data vacuum in the physical world. He believes that neither VLA nor world models alone will be the final solution for embodied intelligence; instead, they will converge into a model unique to the physical world. Ant Lingbo positions itself as the 'general brain' for robots, akin to an operating system, with a focus on spatial perception. Shen predicts that around 2028, when everyone can contribute data to robots, embodied intelligence will have its 'ChatGPT moment'.

  • Large models rely on internet data dividends, but physical world data for robots is largely missing.
  • Neither VLA nor world models are the endgame; they will merge into a physical-world-specific model.

MashuPack

MashuPack is a developer tool that compiles selected parts of a codebase into a single clean text file for use in browser-based AI tools like ChatGPT and Claude, overcoming file-count limits and messy context assembly.

  • Select specific parts of a repository and compile into one text file
  • Designed for browser-based AI workflows, bypassing file and upload limits

Show HN: Porting my Newsletter to MCP – You set WHEN and HOW OFTEN to receive it

After his newsletter ForwardPass hit 100 subscribers in a week, Alister Palmer identified two limitations of traditional newsletters: publishing to everyone at once creates time zone problems, and subscribers have no control over frequency. He developed the ForwardPass MCP, which lets users customize delivery time and frequency via AI. The article provides setup instructions for Claude and ChatGPT.

  • ForwardPass reached 100 subscribers in a week, prompting reflection on newsletter limitations.
  • ForwardPass MCP addresses personalization of publish time and frequency.

Graph Alignment Topology as an Inductive Bias for Grounding Detection

Large Language Models (LLMs) are optimized to produce distributionally plausible continuations rather than to explicitly verify whether generated propositions are entailed by source documents. This inductive bias enables generalization, but it does not encode whether responses are grounded with respect to a reference. Existing hallucination detection approaches improve factuality through retrieval augmentation, self-consistency, or claim verification, but generally do not learn directly over alignment topology. To leverage alignment topology as an inductive bias, researchers construct aligned bipartite graphs between reference information and LLM outputs and train a graph neural network (GNN) to model alignment structure using message passing. The method achieves state-of-the-art results on four diverse hallucination and question-answering datasets, outperforming all compared methods, including foundational LLMs such as GPT-4o.

  • LLMs lack grounding verification, limiting their use in high-stakes domains like clinical decision support.
  • Existing methods do not directly learn alignment topology.

RMA: an Agentic System for Research-Level Mathematical Problems

Research Math Agents (RMA) is an automated reasoning framework for research-level mathematical problems. It solves 8 out of 10 problems on the First Proof benchmark, outperforming GPT-5.2R and Aletheia through multi-agent collaboration and iterative refinement.

  • RMA decomposes proof solving into specialized modules: problem analysis, literature search, fair comparison, knowledge bank construction, and proof verification.
  • It uses initializer, proposer, and verifier agents operating in a multi-round workflow with shared structured memory.

Pi Coding Agent

Pi is a minimal, hackable terminal coding harness that lets you build the AI coding agent workflow you actually want. It keeps the core small and clean, while offering extensions, skills, and packages for deep customization. It has achieved notable usage share in the OpenAI/Codex ecosystem.

  • Minimal and hackable terminal coding harness
  • Customizable via extensions, skills, and packages shared through npm/git

OpenAI, Grupo Folha and Grupo UOL announce strategic content partnership

OpenAI partners with two major Brazilian media groups to bring trusted journalism to ChatGPT, with a focus on attribution and transparency.

  • OpenAI partners with Grupo Folha and Grupo UOL to integrate Brazilian journalism into ChatGPT.
  • The partnership emphasizes attribution and transparency for news content.

AI Stock Is the Ultimate Set-It-and-Forget-It Buy for Long-Term Investors

Microsoft is a key AI player with its OpenAI investment and growing cloud AI business, which achieved an annual revenue run rate of over $37 billion. Despite a recent 12% decline, the stock is a strong long-term buy due to deep integration with corporate customers and AI integration opportunities. At 25x forward earnings, it offers an attractive entry point.

  • Microsoft's AI cloud business annual revenue run rate exceeded $37 billion, up 123%.
  • AI is not a threat but an opportunity to enhance Microsoft's software.

The Sequence Radar #865: Last Week in AI: Karpathy, Google, Colossus, and the Coming IPO Wave

The last three weeks marked a phase transition in AI: Google unveiled Gemini Omni and an agent-first platform; Andrej Karpathy joined Anthropic to accelerate pretraining; Anthropic secured a $45B compute lease from xAI's Colossus; Cerebras IPO surged to a ~$95B market cap; and SpaceX, OpenAI, and Anthropic are planning to go public within six months, collectively worth trillions. Research highlights include HRM-Text efficient pretraining, AI reviewer evaluation, NVIDIA's unified AR-diffusion model, and more.

  • Google I/O introduced Gemini Omni, Gemini 3.5 Flash, Antigravity agent platform, and TPU 8i for a vertically integrated agent pipeline.
  • Andrej Karpathy joined Anthropic to lead a team using Claude to accelerate pretraining, signaling a practical self-improvement flywheel.

OpenAI and Nvidia Are Using Google's SynthID to Watermark AI Content

Google's SynthID watermarking system for AI content is being adopted by OpenAI, Nvidia, ElevenLabs, and Kakao, marking a shift toward a shared industry standard for detection of AI-generated media.

  • SynthID embeds watermarks directly into pixels and audio waveforms, making them harder to remove than metadata.
  • OpenAI, Nvidia, ElevenLabs, and Kakao are now using SynthID for their image, video, and voice generation tools.

Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%

Microsoft Research introduces Webwright, a terminal-native browser agent framework that replaces click-trace web automation with reusable Playwright scripts. Using a single agent loop across three modules and roughly 1,000 lines of code, Webwright powered by GPT-5.4 reaches 60.1% on the long-horizon Odysseys benchmark and 86.7% on Online-Mind2Web — the highest AutoEval score among open-sourced harness recipes.

  • Webwright uses a terminal loop where the agent writes and runs Playwright code instead of predicting one browser action at a time.
  • GPT-5.4 reached 86.7% on Online-Mind2Web (100-step budget) and 60.1% on Odysseys — 26.6 points above the base GPT-5.4 score of 33.5%.

AI-fix: type one word after a failed command and it fixes it

AI-fix is a terminal tool that automatically analyzes and fixes failed commands. It captures the error, sends it to Claude or GPT-4o-mini, and executes the suggested fix. Each fix costs less than $0.0003, supports zsh, bash, and fish history, and prioritizes privacy by only sending error output and system context.

  • Type ai-fix after a failed command to automatically repair it, without leaving the terminal.
  • Handles various errors: missing modules, permission denied, port in use, npm build failures, Git push rejection, and more.
