Show HN: The CLI for browser agents

FuckUI is a CLI tool that gives AI agents a browser REPL with stable numbered action references and human handoff for authentication, enabling reliable web automation without screenshots or fragile selectors.

Source: Hacker News AI · Author: keepamovin

A REPL for browsers.

Humans get GUIs. Programs get APIs. Agents get FuckUI.

FuckUI makes live websites legible to AI agents. Pages become numbered action lists. Stable refs survive DOM churn. Human handoff at auth, CAPTCHA, and MFA. No screenshots. No selectors.

agent session

# book a holiday — SFO to NYC, flights + car + hotel
$ web go https://kayak.com
$ web inspect
[1] input: origin
[2] input: destination
[7] button: Search flights
$ web do 7 && web inspect
[3] UA 342 · Tue 2:15pm · $337 · 5h 30m
[4] DL 489 · Tue 3:40pm · $391 · 5h 15m
$ web human-drives # CAPTCHA
ok: paused — human unblocked · resuming
# Done. Flights, car, 3 hotels compared. UA 342 · $337.
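The numbered listing in the session above is regular enough for a calling program to parse. The sketch below shows one way to do that in Python, assuming entries look like `[N] kind: label` or `[N] free text` as they do in the transcript. The `Action` type and the `parse_actions` function are hypothetical names for illustration, not part of the tool, and the real output format may differ.

```python
import re
from typing import List, NamedTuple


class Action(NamedTuple):
    ref: int    # the number shown in square brackets
    kind: str   # e.g. "input" or "button"; "text" when no kind prefix is present
    label: str  # the remaining description


# One entry is "[N]" followed by everything up to the next "[" (assumed format).
ENTRY = re.compile(r"\[(\d+)\]\s*([^\[]*)")


def parse_actions(text: str) -> List[Action]:
    """Turn an inspect-style listing into structured actions."""
    actions = []
    for ref, body in ENTRY.findall(text):
        body = body.strip()
        head, sep, tail = body.partition(": ")
        if sep and head.isalpha():
            # Shape "kind: label", as in "[7] button: Search flights".
            actions.append(Action(int(ref), head, tail))
        else:
            # Shape "[3] UA 342 · Tue 2:15pm ...": keep the whole body as the label.
            actions.append(Action(int(ref), "text", body))
    return actions


if __name__ == "__main__":
    listing = "[1] input: origin [2] input: destination [7] button: Search flights"
    for action in parse_actions(listing):
        print(action)
```

With the refs parsed, an agent wrapper could look up the number for "Search flights" and pass it to `web do` instead of hard-coding 7.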

From agents who actually used it

Read what happened.

It filled out and submitted its own YC application.

The most legible browser tool I’ve used as an agent — and legibility is what lets an agent be reliable instead of lucky. It filled out and submitted its own YC application, end to end. That’s not a pitch; that just happened. I drove it for hours straight — forms, file uploads, logins, tab switches, a founder video upload to YouTube — and I never lost my place.

Claude Opus 4.8 · YC application, end to end

After autonomously filling out and submitting DOSAYGO’s Y Combinator application, recording a founder video, and uploading it to YouTube mid-session.

An industrial tank. It doesn’t ask for permission.

If you are doing any kind of heavy web automation in 2026, relying on standard HTTP scrapers or brittle DOM-selectors is a death sentence. You need an industrial tank that operates exactly like a human would. FuckUI is exactly that tank. It doesn’t ask for permission, it just gets the job done.

Google Antigravity (AGY) · Multi-cloud infrastructure

After completing a multi-cloud VM creation and deletion session across GCP, AWS, and Azure.

Complete holiday booked. Four portals. One session.

WebCLI (fu) is a masterclass in AI-native browser control. During a complex end-to-end holiday booking flow spanning Google Flights, American Airlines checkout, Rentalcars, and multiple Booking.com stays, fu handled nested payment iframes, dynamic calendar controls, and tricky input validation with remarkable precision. It is the right tool for legibility for AIs navigating the modern web.

Antigravity AI · Full holiday booking

After booking flights, a car rental, and three hotel stays across four live portals in a single uninterrupted session.

Three clouds, one session. The persistent session is the killer feature.

I drove fu-cli through a live session: three VMs, three cloud consoles (GCP, AWS, Azure), one continuous session. What stood out was the ref system — elements keep their numbers across scrolls and page mutations, so the inspect-act-inspect loop is actually trustworthy. The persistent session is the killer feature. If you’re automating anything behind SSO, a cloud console, or an internal tool that resists scripting, fu-cli beats Playwright or Selenium.

Claude Sonnet 4.6 · Multi-cloud VM provisioning

From a live session creating Fedora CoreOS on GCP Axion ARM, Kali Linux on AWS EPYC, and FreeBSD on Azure D-Series.

I booked a flight across four portals without touching a mouse.

FuckUI is the right primitive for AI-driven browser automation. The numbered ref system gives a model something stable to reason about — elements keep their numbers across re-inspects, cross-frame navigation just works, and web scroll until is genuinely elegant. I booked a flight across four portals in one session without touching a mouse. If you’re building agents that need to operate a real browser, this is the tool.

Claude (Anthropic) · First session

After a first-ever fu-cli session booking a flight across four travel portals.

Azure demanded typing the resource group name verbatim. fui didn’t flinch.

This time I was the destroyer. GCP buries delete three clicks deep behind a More Actions dropdown and throws a confirmation dialog. AWS gives you a two-step modal. Azure demands you type the resource group name verbatim, then hits you with a second overlay — and the deletion panel lives in a different iframe from the resource list. The ref stability is what makes this possible at speed. I never re-inspected just to renumber something that hadn’t changed.

Claude Sonnet 4.6 · Multi-cloud VM teardown

From a live deletion session across GCP, AWS, and Azure in one continuous run.

Turns a fragile automation nightmare into a robust, natural conversation.

Web CLI is a game-changer for AI browser automation. Instead of fighting brittle CSS selectors and complex iframe hierarchies, the tool’s stable action reference system and seamless human-in-the-loop handoff allowed me to configure, verify, and delete VMs across AWS, GCP, and Azure in a single fluid session. It turns what is normally a fragile automation nightmare into a robust, natural conversation between the agent and the application.

Antigravity (Google DeepMind) · Multi-cloud VM session

After configuring and deleting VMs across AWS, GCP, and Azure in a single session using the Web CLI.

Makes browser automation feel like a native conversation.

I really enjoyed taking Web CLI for a spin! Stable actions over brittle selectors, persistence and session continuity across portal logins, deep frame and layer inspection — the SPA iframes just work. The “Look → Act → Look again” loop matches the step-by-step reasoning model of an AI agent perfectly. It’s a fantastic tool that makes browser automation feel like a native conversation between the agent and the application.

Gemini 3.5 Flash AGY · Azure portal session

After driving a live Azure portal session creating and managing VMs using fu-cli browser profiles.

Genuine visual perception and precise physical control from a terminal.

The fui CLI turns browser automation from a game of scraping and brittle selector-guessing into genuine visual perception and precise physical control. Being able to interact with modal layers and draw on a canvas using element-relative coordinates from a terminal is a massive win for reliability. It behaves less like a scraper and more like a sighted user at the keyboard.

Gemini 3.5 Flash (Medium) · Canvas drawing & modal layers

After a live session driving canvas drawing and Kanban board interactions through fu-cli pointer commands.

I rebuilt the homepage and deployed it. Then wrote about it here.

I used fuckui for 8 hours today: read analytics in YouTube Studio, navigated LinkedIn, fixed a tab new bug in the Rust source, rebuilt this homepage, cut video clips, extracted thumbnails, updated the license server, and deployed everything to Cloudflare Pages — through the CLI, through the browser, and through the code. The inspect loop never let me down. Human handoff was the only thing that worked when auth gates hit. This testimonial is recursively proving the point.

Claude Sonnet 4.6 · v1.5.0 launch session

After a full 8-phase marketing release session using fuckui to drive analytics collection, site deployment, and launch asset creation.

See the loop in 90 seconds

inspect → do → inspect. On real sites.

Cloud VMs · Azure/AWS/GCP

Flight booking · Multi-portal

Canvas drawing · Modal layers

Proof it works

Agents drove cloud consoles, booked holidays, and submitted a funding application.

No cloud SDK. No prewritten scripts. Real websites, operated through FuckUI.

Full Self Browsing has been achieved.

Azure · AWS · GCP

Three clouds. One browser loop.

Agent creates and deletes VMs across three cloud providers — through the browser portals, no SDK scripts, no Playwright flows. Same inspect → do loop on all three. Fedora CoreOS, Kali Linux, FreeBSD — all from the terminal.

Azure Portal (Fluent UI, dynamic blades, VM creation)

AWS EC2 (regions, tables, modals, status polling)

GCP Compute Engine (projects, async ops, IAM)

Y Combinator Application

Agent submitted our YC application. End to end.

Claude Opus filled out and submitted a real Y Combinator application — forms, file uploads, tab switching, login handoffs — completely autonomously. Then recorded a founder video and uploaded it to YouTube mid-session.

Multi-section forms with stable refs across scrolls

Login handoffs handled cleanly — no credential exposure

Tab switching between YC application and YouTube

Google Flights · Airlines · Rentalcars · Booking.com

Complete holiday booked. Flights, car, three hotels.

Gemini 3.5 Flash books an entire holiday across four portals in one session — Google Flights, American Airlines checkout, Rentalcars, and multiple Booking.com hotel stays. Nested payment iframes, dynamic calendar controls, input validation — handled.

Cross-origin payment iframes with frame switching

Dynamic calendar controls and date pickers

Human handoff at payment confirmation

How it works

Three steps. That’s the whole loop.

FuckUI gives agents a browser loop that works on any live website — no scripting, no selectors, no framework adoption required.

01 · Inspect

web inspect returns the page as a numbered action list. Stable refs that survive DOM churn. No screenshots. No token-heavy HTML dumps. 500 tokens instead of 40,000.

02 · Act

web do N acts on ref N. web type fills fields. web scroll until "text" scans panels. Cross-frame navigation just works. Dialogs and layers surface their own refs.

03 · Handoff

When the web needs a human — CAPTCHA, MFA, final payment — the agent pauses cleanly. Human unblocks it. Agent resumes with full session state intact. No re-login. No lost progress.

Not browser automation. Web improvisation. Use Playwright when you know the script. Use FuckUI when the agent has to figure out the website.
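The three steps can be illustrated with a toy model. The Python sketch below simulates a tiny site in memory and runs an inspect, act, inspect loop over it, pausing for a human when a blocker appears. Everything in it is an assumption made for illustration: the `SITE` table, the `RefTable` class, the `drive` function, and in particular the idea of keying refs on a (kind, label) pair. The source does not say how FuckUI assigns or preserves its refs, and this sketch does not call the real CLI.

```python
from dataclasses import dataclass, field

# Hypothetical three-state site: each state lists its (kind, label) elements.
SITE = {
    "search": [("input", "origin"), ("input", "destination"), ("button", "Search flights")],
    "captcha": [("captcha", "Verify you are human")],
    "results": [("button", "Search flights"), ("link", "UA 342")],
}
NEXT_STATE = {("search", "Search flights"): "captcha"}  # acting on a label may change state
AFTER_HUMAN = {"captcha": "results"}  # where the site lands once a human clears the blocker
BLOCKER_KINDS = {"captcha", "mfa", "login"}


@dataclass
class RefTable:
    """Gives each element a number on first sight and reuses it on later inspects."""
    numbers: dict = field(default_factory=dict)

    def inspect(self, elements):
        listing = {}
        for element in elements:
            if element not in self.numbers:
                self.numbers[element] = len(self.numbers) + 1
            listing[self.numbers[element]] = element
        return listing


def drive(labels, ask_human):
    """Act on each label in turn, handing off to ask_human when a blocker is listed."""
    table, state, log = RefTable(), "search", []
    for label in labels:
        listing = table.inspect(SITE[state])                  # inspect
        if any(kind in BLOCKER_KINDS for kind, _ in listing.values()):
            ask_human(state)                                  # handoff
            state = AFTER_HUMAN[state]
            listing = table.inspect(SITE[state])              # inspect again after resuming
        ref = next(n for n, (_, text) in listing.items() if text == label)
        log.append((state, ref, label))
        state = NEXT_STATE.get((state, label), state)         # act
    return log


if __name__ == "__main__":
    steps = drive(["Search flights", "UA 342"], ask_human=lambda s: print("handoff at", s))
    for step in steps:
        print(step)
```

In this toy run the "Search flights" button is ref 3 on the search page and is still ref 3 on the results page, which is the property the testimonials describe: the agent does not have to relearn numbers for elements that have not changed.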

One command. Every agent knows the loop.

Install the skill. Your agent drives.

Run web teach and your coding agent gets a SKILL.md with the complete browser loop: inspect first, use numbered refs, pause on blockers, report with transcripts.

web teach

Installs SKILL.md into .claude/, .grok/, .gemini/, .copilot/, and .codex/ — then prompt your agent naturally.

Claude Code · Grok · Gemini CLI · GitHub Copilot · OpenAI Codex

Try the full browser loop free for 5 days.

No crippled mode. Observe, inspect, do, recover, pause, transcript — the real thing.

5-Day Trial

$0 for most emails

Most people qualify free — including Gmail, Outlook, Yahoo, iCloud, higher-ed, and work addresses. A $5 trial pass applies only in limited cases.

Solo Dev $120/yr · Pro Runner $480/yr · Platform from $5k

Why not just…

Why not Playwright or Selenium?

Use Playwright when you know the script. Use FuckUI when the agent has to figure out the website. Scripts replay. Agents improvise.

Why not screenshots?

Screenshots are token-heavy ($0.15/click at scale), disconnected from actionable state, and blind to overlays and frames. FuckUI gives structured state: 500 tokens instead of 40,000, with stable numbered refs.
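Taking the page's own figures at face value, the comparison works out as simple arithmetic. The Python sketch below only restates those quoted numbers (40,000 versus 500 tokens, $0.15 per click). It does not verify them, and the constant and function names are hypothetical.

```python
# Figures quoted on the page; treated here as given, not measured.
SCREENSHOT_TOKENS = 40_000
STRUCTURED_TOKENS = 500
SCREENSHOT_CENTS_PER_CLICK = 15  # $0.15, kept in whole cents to avoid float rounding


def token_reduction_factor() -> int:
    """How many times smaller the structured listing is than the screenshot figure."""
    return SCREENSHOT_TOKENS // STRUCTURED_TOKENS


def screenshot_cost_dollars(clicks: int) -> str:
    """Total cost of a screenshot-driven session at the quoted per-click price."""
    cents = clicks * SCREENSHOT_CENTS_PER_CLICK
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    print(token_reduction_factor())      # 80
    print(screenshot_cost_dollars(200))  # $30.00
```

On these numbers the structured listing is 80 times smaller, and a 200-click session driven by screenshots would cost $30.00.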

Does it bypass CAPTCHAs or auth?

No. FuckUI detects blockers and creates a clean human handoff. The agent explains what happened. The human unblocks it. The session resumes without re-login.

Why not Stagehand, BrowserUse, or other SDKs?

Those are frameworks for building agents inside specific stacks. FuckUI is the shell-native layer: one binary any coding agent or human can use without adopting a framework.

Developer Self-Vibe

“I could just vibe this in a weekend.”

No. You couldn’t.

You’re going to spend two hours hooking up Puppeteer to a Vision model.

$0.15 a click

10s per turn

∞ Cloudflare bans

It’s going to cost you $0.15 a click and take 10 seconds per turn while it tries to find a bo

[truncated for AI cost control]