jzapletal's comments | Hacker News

We're building an open-source tool that makes your screen activity searchable via AI.

Processing hundreds of screenshots/hour forced us to optimize for token costs.

The surprise: send video, not images

- Single screenshot (1698×894): 1,812 tokens

- Same frame in video: 258 tokens (Gemini 2.5) or ~70 tokens (Gemini 3)

- Full 8-hour workday: ~$1-3

Video gives you timestamps for free and compresses well since consecutive frames are nearly identical. We keep costs down by having the LLM write short summaries while running OCR locally for text extraction.
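A quick back-of-the-envelope check on those numbers (the frame rate and per-token price here are my assumptions, not from the tool; ~$0.30 per 1M input tokens is roughly Gemini 2.5 Flash territory):

```python
# Rough cost model comparing per-image vs in-video token usage,
# using the token counts quoted above. Frames/hour and the price
# per 1M input tokens are assumptions for illustration.
PRICE_PER_MTOK = 0.30  # dollars per 1M input tokens (assumed)

def daily_cost(tokens_per_frame: float, frames_per_hour: int = 300,
               hours: int = 8) -> float:
    """Dollar cost of one workday of frames at the given token rate."""
    total_tokens = tokens_per_frame * frames_per_hour * hours
    return total_tokens / 1_000_000 * PRICE_PER_MTOK

image_cost = daily_cost(1812)  # each screenshot sent as a standalone image
video_cost = daily_cost(258)   # same frame inside a video stream

print(f"images: ${image_cost:.2f}/day, video: ${video_cost:.2f}/day")
```

Input tokens alone come out around $1.30/day for images vs $0.19/day for video under these assumptions; the quoted $1-3/day presumably also covers output tokens for the summaries.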


I did this while trying to figure out what to use in our own tool. The task was to analyze around 12,000 screenshots and find recurring manual workflows worth automating.

Results:

- Claude Sonnet 4.6: 8/10, $0.53/run — wins on quality

- Kimi K2.5: 7/10, $0.09/run — 6x cheaper, now my production pick

- GPT-5.2: 6/10, $0.41/run — missed the most obvious patterns, odd

- DeepSeek V3.2: 0/10 — gave me a garbled XML...

Models that flagged a one-time DKIM setup as "recurring automation candidate" got penalized.

Happy to share more if folks find this interesting.


We support local models: just configure the custom endpoint and model name and you're good to go.


That's great news! Thanks, I'll check it out


We built a desktop app that takes screenshots as you work, analyzes them with AI, saves the output locally and lets you query this "context" via MCP.

The next thing I thought of was: why not plug it into Openclaw? Sure enough, when I did, it started referencing meetings and action points from last week and suggesting follow-ups. It's like 10x more proactive.

I'm gonna test it a bit more but would appreciate feedback/pointers from others with similar use cases.


all of those questions can be answered with https://github.com/deusXmachina-dev/memorylane


but that has nothing to do with planning. I just checked. :|


We built an open-source tool that screenshots your desktop and feeds summaries to Claude/Cursor via MCP.

What surprised us:

- Cost: $0.0002/screenshot (we budgeted 100x more); guess cloud vision APIs got cheap fast

- CPU: 5% (we expected 50%) and the laptop stays cool

- Quality: night and day vs local models; we tried running vision locally first and it was mediocre

It works by triggering a screenshot on activity, sending it to a cloud vision model for summarization, then deleting the screenshot and storing only the text in local SQLite. You query it via MCP – "what was I working on before lunch?" and Claude actually knows.


Screen sharing to any remote API is a nonstarter for me. I don’t care if the API claims ZDR; Snowden’s revelations are still echoing. So, I appreciate that the app supports a custom endpoint for local models.

Which local models did you try? GLM-OCR seems like it would excel at this: https://huggingface.co/zai-org/GLM-OCR


I've got it installed with Qwen3-VL-4B running in LM Studio on my MBP M1 Pro. (Yes, the fans are running.) GLM-OCR didn't work because it returns all text on the screen, despite the instructions asking only for a summary.

Screenshots are summarized in ~28 seconds. Here's the last one:

> "The user switched to the Hacker News tab, displaying item 47049307 with a “Gave Claude photographic memory for $0.0002/screenshot” headline. The chat now shows “Sonnet 4.6” and a message asking “What have I been doing in the past 10 minutes?” profile, replacing prior Signal content. The satellite map background remains unchanged."

The "satellite map background remains unchanged" message appears in every summary (my desktop background is a random Google Maps satellite image that rotates every hour).

I would like to experiment with custom model instructions – for example, to ignore desktop background images.

Earlier in my testing it was sending screenshots for both of my displays at the same time, which was much slower, but now it's only sending screenshots of my main screen. Does MemoryLane only send screenshots for displays that have active windows?

Here's the first test of the MCP server in Claude – https://ss.strco.de/SCR-20260217-onbp.png – it works!


Update: I switched to Qwen3 VL 2B (`qwen3-vl-2b-instruct-mlx@bf16`), which is 2.5× faster than 4B (11 s vs the ~28 s I saw with 4B per screenshot), and my meager M1 Pro is able to keep up without the fans spinning 100% of the time.


This is great stuff. Have you tried it with local models? Summarization etc. is easy, but I haven't played with image-to-text models locally. Any ideas? I can run 32B models fine, and for summarization-type tasks they're extremely good, I'd even say more than necessary.


Hey, just released a new version with support for local models - you just configure the custom endpoint and model name and it should just work. Let us know what you think:)
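For anyone reading along: if the local server is LM Studio or Ollama (both expose an OpenAI-compatible API), the two values are typically something like the following. The ports are those servers' defaults; the model name is just an example, use whatever your server lists:

```
endpoint:  http://localhost:1234/v1    (LM Studio default; Ollama serves on 11434)
model:     qwen3-vl-4b-instruct        (example; use the name your server reports)
```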


I've been trying to automate my real estate business but I honestly find automation so hard.

I know HOW to automate things because there's 100+ tools and they're constantly improving. But I don't know WHAT to automate. Templates like the ones in the link are too general. I need something tailored to my workflows. Don't want an agency and don't have time to investigate what exactly everyone in my company does.

How do you all solve this problem??


Start with your current biggest pain point and add the minimal amount of automation that will improve it. Step by step, things will get better over time, and you don't run the risk of wasting a bunch of time and money on an automation that doesn't actually improve anything in the end.


I feel like it's going to take a lot of time/effort doing this manually

