2026-06-30 08:15 UTC. Updated: 2026-06-30 08:29 UTC

Show HN: SlimSnap – mark a screenshot element, get JSON for your coding agent

SlimSnap is a macOS app that converts screenshots into JSON for terminal-based AI coding agents like Claude Code, Aider, and Codex CLI. It captures, annotates, and extracts elements via local OCR, yielding 55–85% fewer tokens than raw images. All processing stays on your Mac: no uploads, no account needed.

Source: Hacker News AI. Author: bickov

SlimSnap. Paste a screenshot into your terminal


Turn any screenshot into JSON your CLI agent can read. They finally have eyes.

Why I feed my coding agent JSON instead of screenshots

1 screenshot = 200 words you didn't type

Claude Code, Aider, and Codex CLI read your files, run your tests, write your code. But the moment you want to talk about the UI, you're writing a paragraph to describe what a screenshot would show at a glance. And terminals don't accept images.

SlimSnap closes the gap. Capture, annotate, copy as JSON. Paste it wherever text goes, which is everywhere.

Capture

Hit ⌘⇧S, drag to select any area, release. Native macOS, nothing to install.

Annotate

Arrows, callouts, highlights. Point at the broken thing.

Copy JSON

One click. Paste into Claude Code, Aider, or anywhere text goes.

Vision for tools that can't see images

A screenshot says a thousand pixels. SlimSnap says a few hundred tokens, and they're the right ones.

55% fewer tokens

A screenshot in Claude Code (Sonnet) bills 1,568 vision tokens per paste, capped by the API. SlimSnap JSON of the same screen is about 700 tokens. About 55% fewer per turn on Sonnet, up to 85% on Opus 4.7 and 4.8. More room for code in your context.

Pastes into any agent

Claude Code, Aider, Codex CLI, Cursor, Continue.dev. Text goes everywhere images can't: terminals, SSH sessions, CI logs, git commits. If it takes text, it takes SlimSnap.

Deterministic layout

Every element has a bounding box in normalized 0 to 1 coordinates. Your agent stops guessing where things are.
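To make the normalized coordinates concrete, here is a minimal sketch that maps a bounding box back to pixels. It assumes the four numbers are [x, y, width, height] measured from the top-left corner, which matches the signup.json sample on this page but is not confirmed by it; the helper name `bbox_to_pixels` is made up for illustration.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def bbox_to_pixels(bbox: Sequence[float], width_px: int, height_px: int) -> PixelBox:
    """Scale a normalized bbox to pixel units.

    Assumption: bbox is [x, y, width, height], each value in the 0 to 1 range,
    with the origin at the top-left corner of the capture.
    """
    x, y, w, h = bbox
    return PixelBox(
        left=round(x * width_px),
        top=round(y * height_px),
        width=round(w * width_px),
        height=round(h * height_px),
    )


# The "Sign up" button from the signup.json sample, on its 1440 x 900 capture.
sign_up = bbox_to_pixels([0.34, 0.60, 0.32, 0.07], 1440, 900)
print(sign_up)
```

Because the coordinates are resolution-independent, the same box can be scaled to a Retina capture or a resized window by changing only the two pixel dimensions.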

OCR baked in

Built-in OCR reads every label, button, and error message in the shot. Your agent sees the words you see.

Stays on your Mac

Capture and OCR run locally. No upload, no account, no server in the loop. Your screens never leave the machine.

Open MIT format

The JSON schema is published on GitHub under MIT. Read it, validate against it, or write your own exporter.
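As a rough idea of what validating a capture could look like, the sketch below runs a few structural checks using only the Python standard library. It is not the published schema: the required keys are guessed from the signup.json sample on this page, and `check_capture` is a hypothetical helper. For real validation, use the schema from GitHub with a JSON Schema validator.

```python
def check_capture(doc: dict) -> list:
    """Return a list of problems found in a SlimSnap-style document.

    Assumption: the keys checked here are the ones visible in the
    signup.json sample, not the authoritative schema.
    """
    problems = []
    for key in ("schema_version", "image", "elements"):
        if key not in doc:
            problems.append(f"missing top-level key: {key}")

    known_ids = set()
    for index, element in enumerate(doc.get("elements", [])):
        for key in ("id", "type", "value", "bbox"):
            if key not in element:
                problems.append(f"elements[{index}] is missing {key}")
        bbox = element.get("bbox", [])
        in_range = all(
            isinstance(v, (int, float)) and 0 <= v <= 1 for v in bbox
        )
        if len(bbox) != 4 or not in_range:
            problems.append(f"elements[{index}] bbox must be four numbers in 0..1")
        known_ids.add(element.get("id"))

    # Every annotation should point at an element that exists.
    for index, annotation in enumerate(doc.get("annotations", [])):
        if annotation.get("to") not in known_ids:
            problems.append(f"annotations[{index}] points at an unknown element")
    return problems
```

An empty list means the document passed these checks, nothing more; a file can pass here and still fail the real schema.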

signup.json: 318 tokens, ~80% less than the Sonnet image

{
  "schema_version": "1.0",
  "captured_at": "2026-05-19T18:17:46Z",
  "screen": { "title": "Create your account", "app": "Safari" },
  "image": { "width_px": 1440, "height_px": 900, "file": "signup.png" },
  "elements": [
    { "id": "e1", "type": "label", "value": "Create your account", "bbox": [0.34, 0.18, 0.32, 0.06] },
    { "id": "e2", "type": "input", "value": "Email", "bbox": [0.34, 0.34, 0.32, 0.07] },
    { "id": "e3", "type": "input", "value": "Password", "bbox": [0.34, 0.46, 0.32, 0.07] },
    { "id": "e4", "type": "button", "value": "Sign up", "bbox": [0.34, 0.60, 0.32, 0.07], "color": "#3B82F6" }
  ],
  "annotations": [
    { "id": "a1", "type": "arrow", "to": "e4", "intent": "highlight" }
  ],
  "estimated_tokens": 318
}
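The annotation in the sample points at an element by id ("to": "e4"). A short sketch of how a consumer might resolve that reference into a one-line description follows. It uses a trimmed copy of the sample, and `describe_annotations` is a hypothetical helper, not part of SlimSnap or its skill.

```python
import json

# Trimmed copy of the signup.json sample: two elements and the one annotation.
SAMPLE = """
{
  "elements": [
    {"id": "e2", "type": "input", "value": "Email", "bbox": [0.34, 0.34, 0.32, 0.07]},
    {"id": "e4", "type": "button", "value": "Sign up", "bbox": [0.34, 0.60, 0.32, 0.07]}
  ],
  "annotations": [
    {"id": "a1", "type": "arrow", "to": "e4", "intent": "highlight"}
  ]
}
"""


def describe_annotations(doc: dict) -> list:
    """Resolve each annotation's "to" id into a short text description."""
    by_id = {element["id"]: element for element in doc.get("elements", [])}
    lines = []
    for annotation in doc.get("annotations", []):
        target = by_id.get(annotation.get("to"))
        if target is None:
            continue  # skip annotations that point at nothing
        intent = annotation.get("intent", "unspecified")
        lines.append(
            f'{annotation["type"]} ({intent}) -> {target["type"]} "{target["value"]}"'
        )
    return lines


summary = describe_annotations(json.loads(SAMPLE))
print(summary)  # ['arrow (highlight) -> button "Sign up"']
```

Here the id lookup is what tells the agent which element the arrow refers to; with a raw image it would have to infer that from the pixels.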

Questions

Why not just paste a screenshot into ChatGPT?

You can, and you should, for one-off questions. But terminal agents (Claude Code, Aider, Codex CLI) don't accept images. SlimSnap solves that. It is also cheaper across a long iterative session (about 55% fewer tokens per turn on Sonnet, up to 85% on Opus 4.7 and 4.8) and far more reliable when the agent needs to reason about specific elements rather than vibes.

Does it send my screenshots to a server?

No. OCR runs locally on your Mac. Captures never leave your machine.

What's the actual token saving?

Per Anthropic's vision docs, a single screenshot is downscaled and billed at the API's per-image cap: about 1,568 tokens on Sonnet and Haiku, up to 4,784 tokens on Opus 4.7 and 4.8. A typical SlimSnap export is 600 to 800 tokens. About 55% fewer per turn on Sonnet, up to 85% on Opus. The reduction compounds across long iterative sessions.
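The percentages follow directly from the figures quoted above. A small worked calculation, using only those numbers; the 20-turn session and one capture per turn are illustrative assumptions, not measurements.

```python
SONNET_IMAGE_TOKENS = 1568  # per-image figure quoted above for Sonnet and Haiku
OPUS_IMAGE_TOKENS = 4784    # upper figure quoted above for Opus 4.7 and 4.8


def percent_fewer(json_tokens: int, image_tokens: int) -> float:
    """Percentage of tokens saved by sending JSON instead of the image."""
    return 100 * (1 - json_tokens / image_tokens)


def tokens_saved(json_tokens: int, image_tokens: int, turns: int) -> int:
    """Total saving over a session, assuming one capture is sent per turn."""
    return (image_tokens - json_tokens) * turns


for json_tokens in (600, 700, 800):
    sonnet = round(percent_fewer(json_tokens, SONNET_IMAGE_TOKENS))
    opus = round(percent_fewer(json_tokens, OPUS_IMAGE_TOKENS))
    print(f"{json_tokens} JSON tokens: {sonnet}% fewer on Sonnet, {opus}% fewer on Opus")

# Illustrative 20-turn session with a 700-token export on Sonnet.
print(tokens_saved(700, SONNET_IMAGE_TOKENS, turns=20))
```

Across the 600 to 800 token range this works out to roughly 49 to 62% fewer tokens on Sonnet and 83 to 87% on Opus, with the 700-token midpoint giving the quoted 55% and 85%.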

Is it open-source?

The JSON schema is open (MIT, on GitHub), and so is the Claude Code skill. The Mac app is closed.

How does the Claude Code skill find my captures?

SlimSnap writes a tiny config file at ~/.slimsnap/config.json on startup and on every settings change. It contains your default save folder and filename pattern, nothing else. The skill reads that config, lists the folder, and loads the latest JSON file into the agent's context. No hardcoded paths. If you change where SlimSnap saves, the skill follows. Full walkthrough: The Claude Code skill that turns a screenshot into a fix.
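The lookup described above (read the config, list the folder, load the newest JSON) could be sketched as follows. This is not the skill's actual code: the config key name `save_folder` is an assumption, since the real key names are not given here, and the sketch ignores the filename pattern and simply picks the most recently modified .json file.

```python
import json
from pathlib import Path
from typing import Optional


def latest_capture(config_path: Optional[Path] = None) -> Optional[dict]:
    """Load the newest capture from the folder named in the SlimSnap config.

    Assumption: the config stores the folder under a key called "save_folder".
    Returns None when the folder holds no JSON files.
    """
    if config_path is None:
        config_path = Path.home() / ".slimsnap" / "config.json"
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))

    folder = Path(config["save_folder"]).expanduser()  # hypothetical key name
    candidates = sorted(folder.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        return None
    return json.loads(candidates[-1].read_text(encoding="utf-8"))
```

Reading the folder from the config on every call is what lets the lookup follow a changed save location without any path being baked into the skill.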

Do I need the Mac app to use the skill?

No. The skill works on any valid SlimSnap JSON file. The Mac app is the easiest way to produce one, but the schema is open under MIT, so you can hand-write JSON, generate it from another OCR pipeline, or build your own exporter on Windows or Linux. The skill doesn't care where the JSON came from.
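For anyone building their own exporter, a minimal sketch of the conversion step is below. The input shape (a list of dicts with `text`, `box_px`, and an optional `type`) is invented for illustration, and the output includes only fields seen in the signup.json sample; check the published schema for what is actually required.

```python
import json


def export_capture(ocr_items, width_px: int, height_px: int, file_name: str) -> dict:
    """Build a SlimSnap-style document from OCR results given in pixels.

    Assumptions: each item looks like
    {"text": str, "box_px": [left, top, width, height], "type": str (optional)},
    and bbox in the output is [x, y, width, height] normalized to 0..1.
    """
    elements = []
    for number, item in enumerate(ocr_items, start=1):
        left, top, width, height = item["box_px"]
        elements.append({
            "id": f"e{number}",
            "type": item.get("type", "label"),
            "value": item["text"],
            "bbox": [
                round(left / width_px, 2),
                round(top / height_px, 2),
                round(width / width_px, 2),
                round(height / height_px, 2),
            ],
        })
    return {
        "schema_version": "1.0",
        "image": {"width_px": width_px, "height_px": height_px, "file": file_name},
        "elements": elements,
        "annotations": [],
    }


doc = export_capture(
    [{"text": "Sign up", "box_px": [490, 540, 461, 63], "type": "button"}],
    width_px=1440,
    height_px=900,
    file_name="signup.png",
)
print(json.dumps(doc, indent=2))
```

Rounding to two decimals mirrors the precision used in the sample and keeps the token count down; an exporter that needs pixel-exact boxes could keep more digits.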

Does the JSON lose information compared to the raw image?

For element-level work it wins. The JSON carries the text, position, color, and bounding box of every element, which is exactly what an agent needs to change a specific thing. What it drops is pixel aesthetics: gradient direction, whitespace feel, brand vibe. So for "fix this specific element" loops, JSON is more reliable. For "what should this look like" exploration, paste the raw image. Nothing stops you sending both.

Isn't cropping the screenshot tightly enough?

Cropping doesn't change the cost. The per-image token cap is flat no matter how small the crop, and a crop is still pixels the agent has to interpret. SlimSnap hands over the actual text, elements, and coordinates, so the agent reasons about "the second input in the third card" instead of guessing from a picture.

Does it work on dark mode UI?

Yes. OCR and element detection behave the same in dark mode as in light. Very low-contrast themes are the one edge case worth a sanity check, but dark interfaces like terminals, Slack, and Linear are handled the same way.

Not on Mac?

SlimSnap is Mac-only today. Want it on Windows or Linux? Email [email protected] and tell us which. The more requests we get, the sooner we build it.

Make your terminal agent see. Free, no registration required
