2026-06-10 · Site rewrite · 5 min read · Updated: 2026-06-13

SlimSnap

SlimSnap is a free Mac app that converts annotated screenshots into structured JSON, reducing AI token costs and improving element detection accuracy. It includes an open-source schema and a Claude Code skill.

Source: Product Hunt AI · Author: Alexander Bickov

SlimSnap: Your AI doesn't know which button you mean | Product Hunt


Launched this week

Your AI doesn't know which button you mean

261 followers


Screenshots and screen recording apps

The AI reads your screenshot as a pixel blob and guesses which button you meant. SlimSnap converts the screenshot plus your annotation into structured JSON: every element has coordinates and an ID, and your arrow points at a specific one. Around 700 tokens, versus 1,568 for the raw screenshot on Sonnet. Free Mac app. The schema and the Claude Code skill are open source under MIT. Runs entirely on-device.


Free

Launch tags: Design Tools • Productivity • Artificial Intelligence


Forum Threads

p/slimsnap • 23h ago

Which AI tool needs a screenshot auto-loader next?

SlimSnap ships a Claude Code skill that auto-loads your latest screenshot so the agent reads the structured JSON without you pasting anything. That's the "magic" version of the workflow.

For everything else (Cursor, Lovable, bolt.new, Replit AI, claude.ai), the screenshot spec works, but you paste the JSON into chat manually. It works, just not as smoothly as the Claude Code flow.

Where do you actually paste screenshots into AI tools today? Vote in the comments, ideally with one line on what your screenshot loop looks like there. That tells me which agent gets the next auto-loader.

Don't take this as a date promise. Just trying to figure out which one to start with after the current backlog.
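The skill's own code isn't shown in this thread, but the core of "auto-load your latest screenshot" can be sketched as picking the newest export in a folder. A minimal Python sketch, assuming (hypothetically) that captures are saved as `.json` files in a single directory; the folder layout and the `latest_capture` helper are illustrations, not SlimSnap's actual storage or API:

```python
from pathlib import Path
from typing import Optional


def latest_capture(capture_dir: Path, pattern: str = "*.json") -> Optional[Path]:
    """Return the most recently modified file in capture_dir matching pattern.

    Returns None when nothing matches, so the caller can ask the user to take
    a capture first instead of silently loading something stale.
    """
    candidates = [p for p in capture_dir.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    # The newest file by modification time wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

An agent-side loader would call something like `latest_capture(Path.home() / "SlimSnap")` (a made-up path) and read the result into context, which is the step the manual-paste tools leave to you.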

p/slimsnap • 23h ago

Six screenshot/AI requests from launch week. Which one should I ship next?

These came from people who tried the SlimSnap screenshot-to-JSON workflow during launch and asked for something specific. Listing in the order they came in.

Screenshot capture for scrollable content and open dropdowns. Right now the capture clips to the visible window, so if a dropdown or long list is open, parts get cut. Balpreet S flagged this on LinkedIn.

Native Mac app screenshot support (scope verification). Several people asked whether SlimSnap captures Linear / Notion / Figma desktop screenshots, or just browser windows. The answer changes who can use it.

Confidence and overlap indicators on annotations. When the arrow is ambiguous (e.g. drawn between two close buttons in the screenshot), the JSON should signal that. Corey Clark asked on LinkedIn.

Nested element hierarchy in the schema. The current screenshot schema is flat, with nesting implied by bbox containment. Jyoti S Mohanty asked whether to make the hierarchy explicit. Schema v2 candidate.

Hybrid mode (JSON + raw screenshot). For users who want the safety net of pixels alongside the structured spec. Martin Zokov asked on X. Optional, would double the per-screenshot token cost.

Windows screenshot support. Requested by multiple people. The OCR layer is Mac-native, so this is a real porting project, not a one-line change.
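On the nested-hierarchy request: with a flat schema, a consumer can already recover nesting from bbox containment. A minimal Python sketch, assuming boxes are (x, y, width, height) in screenshot pixels; the IDs and the `infer_parents` helper are hypothetical, and "smallest enclosing box is the parent" is one reasonable rule, not necessarily what schema v2 would adopt:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    id: str
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h

    def contains(self, other: "Box") -> bool:
        """True if other lies fully inside this box (edges may touch)."""
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def infer_parents(boxes: List[Box]) -> Dict[str, Optional[str]]:
    """Map each element ID to its smallest strictly larger enclosing box (None at top level)."""
    parents: Dict[str, Optional[str]] = {}
    for child in boxes:
        # Requiring a larger area keeps identical boxes from parenting each other.
        enclosing = [b for b in boxes
                     if b.id != child.id and b.contains(child) and b.area > child.area]
        parents[child.id] = min(enclosing, key=lambda b: b.area).id if enclosing else None
    return parents


screen = [
    Box("e_page_1", 0, 0, 1000, 1000),
    Box("e_card_2", 20, 20, 400, 300),
    Box("e_card_3", 20, 340, 400, 300),
    Box("e_input_4", 40, 60, 200, 30),
    Box("e_input_5", 40, 380, 200, 30),
]
parents = infer_parents(screen)
```

Here each input ends up parented to the card around it and both cards to the page. The pairwise check is O(n²), which is probably acceptable for the element counts of a single screenshot; an explicit hierarchy in the schema would save every consumer from re-deriving this.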

Which one should I build next? Vote in the comments, ideally with one line on why it matters for your screenshot workflow.

I have my own gut ranking but want to see what actual users prioritize.
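For the confidence/overlap request, one way to produce the signal is to measure how far the arrow tip is from each element's bbox and flag the annotation when a runner-up is nearly as close. A Python sketch under stated assumptions: the arrow is reduced to its tip point, the 8 px margin is arbitrary, and the `ambiguous` / `alternatives` output fields are hypothetical, not part of the published schema:

```python
import math
from typing import Sequence, Tuple

# (id, x, y, width, height) in screenshot pixels
BBox = Tuple[str, float, float, float, float]


def distance_to_box(tip: Tuple[float, float], box: BBox) -> float:
    """Euclidean distance from a point to a rectangle; 0 when the point is inside it."""
    px, py = tip
    _, x, y, w, h = box
    dx = max(x - px, 0.0, px - (x + w))
    dy = max(y - py, 0.0, py - (y + h))
    return math.hypot(dx, dy)


def resolve_arrow(tip: Tuple[float, float], boxes: Sequence[BBox], margin: float = 8.0) -> dict:
    """Pick the element closest to the arrow tip and flag near-ties as ambiguous."""
    if not boxes:
        raise ValueError("no candidate elements to point at")
    # Sort by distance; equal distances fall back to ID order for determinism.
    scored = sorted((distance_to_box(tip, b), b[0]) for b in boxes)
    best_dist, best_id = scored[0]
    alternatives = [eid for dist, eid in scored[1:] if dist - best_dist <= margin]
    return {"target_ref": best_id, "ambiguous": bool(alternatives), "alternatives": alternatives}


buttons = [("e_button_5", 0, 0, 100, 30), ("e_button_8", 104, 0, 100, 30)]
print(resolve_arrow((50, 15), buttons))   # tip inside the first button
print(resolve_arrow((102, 15), buttons))  # tip in the 4 px gap between the two
```

The first call resolves cleanly to `e_button_5`. The second still picks `e_button_5` but marks the result ambiguous and lists `e_button_8`, which is the "arrow drawn between two close buttons" case from the request list.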


Maker 📌 (pinned)

The day I shipped this started with me yelling at Claude Code for the fifth time. I'd pasted a screenshot of a misaligned form. I'd typed "fix this." Claude moved the wrong input. I retyped. Claude moved a different wrong input. I gave up and fixed it manually.

The reason it kept guessing: it was reading raw pixels. It had no way to know which rectangle was the input I meant, so it picked one that looked plausible.

SlimSnap converts the screenshot into a spec the AI can parse element by element. Each element has coordinates, OCR text, color values, and (if you drew an arrow on it) a target reference saying "this one."
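To make that concrete, here is a hypothetical spec and the lookup an agent effectively performs on it. Only the element ID style (like `e_button_3`) and `annotation.target_ref` come from this thread; every other field name, the bbox format and the `resolve_target` helper are assumptions for illustration. The real schema is at github.com/bickov/slimsnap-schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    id: str                      # unique per element, e.g. "e_button_3"
    type: str                    # "button", "input", ...
    bbox: tuple                  # assumed (x, y, width, height) in screenshot pixels
    text: str = ""               # OCR text; empty for icon-only elements
    color: Optional[str] = None  # e.g. a hex value sampled from the pixels


def resolve_target(raw_spec: str) -> Optional[Element]:
    """Return the element the annotation arrow points at, or None if nothing is marked."""
    data = json.loads(raw_spec)
    elements = {
        e["id"]: Element(e["id"], e["type"], tuple(e["bbox"]), e.get("text", ""), e.get("color"))
        for e in data["elements"]
    }
    ref = (data.get("annotation") or {}).get("target_ref")
    if ref is None:
        return None
    if ref not in elements:
        # A dangling reference means a malformed spec; fail loudly rather than guess.
        raise KeyError(f"annotation.target_ref points at unknown element {ref!r}")
    return elements[ref]


SPEC = """{
  "elements": [
    {"id": "e_input_2", "type": "input", "bbox": [40, 240, 240, 32], "text": "Email"},
    {"id": "e_button_3", "type": "button", "bbox": [40, 300, 120, 32],
     "text": "Save", "color": "#2563eb"}
  ],
  "annotation": {"kind": "arrow", "target_ref": "e_button_3"}
}"""

target = resolve_target(SPEC)
```

`resolve_target(SPEC)` returns the `e_button_3` element (OCR text `"Save"`): a single, explicit anchor where a pixel read would have to guess between the input and the button.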

It also happens to be ~700 tokens, versus the 1,568 tokens a raw screenshot costs on Sonnet (up to 4,784 on Opus 4.7+). That part is just a bonus.
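The savings follow directly from those figures. A quick Python check, assuming the JSON stays at roughly 700 tokens on both models (the post gives only one JSON figure, and tokenizers differ between models):

```python
def savings(json_tokens: int, image_tokens: int) -> float:
    """Fraction of input tokens saved by sending the JSON spec instead of the image."""
    return 1 - json_tokens / image_tokens


# Figures quoted in the post; the Opus number is the stated upper bound.
for model, image_tokens in [("Sonnet", 1568), ("Opus 4.7+ (upper bound)", 4784)]:
    print(f"{model}: {savings(700, image_tokens):.0%} fewer tokens")
```

That prints 55% for Sonnet and 85% for the Opus upper bound, per screenshot.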

Open: the JSON schema (MIT, github.com/bickov/slimsnap-schema) and a Claude Code skill that auto-loads your latest capture (MIT, github.com/bickov/slimsnap-skill). The Mac app is closed but free.

Other tools (Cursor, Lovable, bolt.new, Replit, ChatGPT Vision): the spec works, but you paste the JSON into chat yourself. Cleaner than raw images. Not as smooth as the Claude Code auto-loader. Someone with time on their hands could write the equivalent skill for any of them.

A real question: which AI tool do you reach for most when you need to point at something specific on screen? The answer tells me where to build the next auto-loader.


3d ago

This is a real pain with Claude Code and Cursor. The agent usually understands the general UI, but still touches the wrong element. Does SlimSnap keep enough context when there are multiple similar buttons or inputs on the same screen?


2d ago

Maker

@farrukh_butt1 Yes, exactly the case the schema was built for. Each element gets a unique ID regardless of how visually similar it is to others. OCR text + bbox coordinates + (if present) parent context disambiguate the duplicates. So if there are five "Submit" buttons on the screen, each one shows up under its own ID (e.g. e_button_5, e_button_8, e_button_11, and so on, whatever IDs they get), and your arrow annotation points at exactly one of them.
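One way to get that property is to number elements from a single counter shared across all element types, which is consistent with (though not confirmed by) the e_button_5 / e_button_8 / e_button_11 pattern above. A hypothetical Python sketch; the detection dicts and `assign_ids` are illustrations, not SlimSnap internals:

```python
from itertools import count
from typing import Dict, List


def assign_ids(detections: List[Dict]) -> List[Dict]:
    """Give each detected element a unique ID of the form e_<type>_<n>.

    n comes from one counter shared across all element types, so two visually
    identical "Submit" buttons still end up with different IDs. The input
    dicts are copied, not mutated.
    """
    counter = count(1)
    return [{**d, "id": f"e_{d['type']}_{next(counter)}"} for d in detections]


detected = [
    {"type": "input", "text": "Email", "bbox": [40, 240, 240, 32]},
    {"type": "button", "text": "Submit", "bbox": [40, 300, 120, 32]},
    {"type": "button", "text": "Submit", "bbox": [40, 700, 120, 32]},
]
labeled = assign_ids(detected)
```

The two Submit buttons come out as `e_button_2` and `e_button_3`: OCR text alone can't separate them, but the IDs and bboxes can.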

The edge case where it still struggles: identical floating elements with no surrounding container or distinguishing text (rare but possible in canvas-based apps). For 95% of UI work, the ID + bbox + annotation combo holds up.

What kind of UI are you hitting this with most? Cursor with React forms? Claude Code with admin dashboards? Useful for prioritizing where to harden the schema.


2d ago

Would love to see a Windows version!


1d ago

Maker

@umberto_abbatantuono Hearing this a lot today. Windows port isn't in the short-term roadmap (OCR layer is Mac-native, needs a different pipeline), but if there's enough signal it moves up the list. If anyone else here is on Windows and would actually use this, reply to this comment or email [email protected]. That's how I'll prioritize.


1d ago

Maker

One follow-up question for anyone scrolling: when you paste a screenshot into your AI tool (ChatGPT, Claude, Cursor, Lovable, whatever), what's the #1 thing the AI gets wrong about it? Trying to figure out which gap to close next.


2d ago

@bickov I tend to find that sometimes it wants to change too much and then I have to backtrack. Modifying other elements or changing the layout of the thing I’m talking about are what I find the most annoying.


2d ago

Maker

@montverde That's the exact failure mode the target_ref field tries to address. When you annotate the misaligned button and the agent sees annotation.target_ref = e_button_3, it has a stronger anchor for what to touch and what to leave alone. Doesn't eliminate scope creep entirely (the agent still decides whether layout shifts are necessary), but it shifts the default from "rewrite the whole component" toward "fix the specific element referenced."

The backtracking compounds in longer sessions. Which AI tool is this happening most for you? Different agents handle scope differently and that helps me figure out which auto-loader to build next.


2d ago

@bickov I think that’s super helpful, definitely a time saver.

For me, OpenAI was worse for unwanted changes. I use Claude the majority of the time and it still happens but not to the same degree.


2d ago

Maker

@montverde Yeah, that matches what I've seen. Claude tends to respect the "change only this" intent better than GPT does, even before SlimSnap. With the Claude Code skill the loop gets tighter still: it auto-loads the latest capture so you don't even paste the JSON, just type "fix what I marked" and the agent reads the spec.

Curious if you're on Claude Code specifically or claude.ai / API. If it's Claude Code, the skill is at github.com/bickov/slimsnap-skill, MIT, install instructions in the README.


2d ago

@bickov That sounds like it works a lot better then.

I mostly use Claude.ai and Cursor; I prefer them over Claude Code.


1d ago

Maker

@montverde The auto-loader skill is Claude Code only right now. For Cursor or claude.ai it's a manual paste step: SlimSnap exports the JSON, and you drop it into chat with your prompt. Element refs still work; the agent just doesn't auto-grab the latest capture for you.

Cursor-native skill is on the wishlist if demand shows up. What makes Cursor + Claude.ai your default over Claude Code? That answer shapes which auto-loader I build next.
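Until then, the manual paste step can be scripted if you do it often. A Python sketch that wraps an exported spec into chat-ready text; the export file, the `build_prompt` helper and the "only change this element" wording are assumptions, and it reads `annotation.target_ref` as described elsewhere in this thread:

```python
import json
from pathlib import Path


def build_prompt(spec_path: Path, instruction: str) -> str:
    """Turn an exported JSON spec into text you can paste into Cursor or claude.ai."""
    spec = json.loads(spec_path.read_text(encoding="utf-8"))
    target = (spec.get("annotation") or {}).get("target_ref")
    parts = [instruction]
    if target:
        # Spell out the scope so the agent is nudged to leave everything else alone.
        parts.append(f"Only change the element with id {target}; "
                     "keep other elements and the layout as they are.")
    parts += ["", "Screenshot spec (JSON):", json.dumps(spec, indent=2)]
    return "\n".join(parts)
```

For example, `print(build_prompt(Path("capture.json"), "Fix what I marked."))` (with a made-up export filename) produces text to paste as-is; the explicit scope line targets the "changes too much" failure mode discussed above.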


1d ago

@bickov Got it, thanks!


1d ago

The underlying problem is real. Claude guessing the wrong element from a raw screenshot is a genuine frustration. But the demo might be selling it short: changing a button color is exactly the case where anyone would just open DevTools. The pitch lands harder on complex layouts with 40 overlapping components, where "the second input in the third card" means nothing to a pixel reader. Would love to see a demo on a gnarly real-world UI rather than a clean form :)


1d ago

Maker

@keirodev Yeah fair. The form demo is way too clean. Anyone'd just open DevTools for that. Real wedge is exactly your example: 40 overlapping components where "second input in the third card" is the only useful way to point at it. Picked the form because it fits in one screenshot. Wrong asset for selling the real case.

Redoing the demo on something messier is on the list. If you've got a real dashboard you'd want me to throw it at, send a screenshot and I'll post what the JSON comes out as.


1d ago