This is a great demonstration of where local inference shines - tasks where latency tolerance is high but privacy requirements are absolute. Personal video archives are exactly this category.
A few observations from someone who's been building similar Claude Code skill-based workflows:
1. The 50GB swap approach is clever for batch processing but I'm curious about the throughput. For a year of video (~365 days), what was the wall-clock time per video-hour indexed? The swap thrashing on a 2021 MacBook with unified memory must create interesting performance cliffs depending on frame extraction rate.
2. The skill-based architecture (storing prompts and workflows in ~/.claude/skills/) is underrated. I use a similar pattern for automating repetitive analysis tasks. The key insight is that the skill file acts as a reproducible "recipe" that separates the what-to-analyze from the how-to-run-inference. Makes it trivial to swap models later.
3. For others considering this: MLX quantized models might give you better tokens/sec on Apple Silicon vs running through Ollama with heavy swap usage. The memory bandwidth bottleneck is real with 50GB swap - you're essentially doing inference at SSD speed rather than DRAM speed for most of the model weights.
Would love to see benchmarks comparing Gemma4-31B on swap vs a Q4 quantized version that fits in RAM. The quality/speed tradeoff might surprise people.
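To put rough numbers on point 3, here is a back-of-envelope sketch. It assumes a dense model, where generating each token reads roughly every weight once, so decode speed is capped by how fast those bytes can be moved. The bandwidth and RAM figures are my own assumptions, not measurements: about 200 GB/s of unified memory bandwidth for an M1 Pro-class 2021 MacBook, an optimistic 5 GB/s from SSD, and about 24 GB of a 32 GB machine left over for weights. The bits-per-parameter values ignore quantization scales and the KV cache.

```python
def weight_bytes(params: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in bytes (ignores quantization scales and KV cache)."""
    return params * bits_per_param / 8


def decode_ceiling_tok_s(weights: float, resident: float, dram_bw: float, ssd_bw: float) -> float:
    """Upper bound on decode tokens/sec for a dense model.

    Assumes every generated token reads all weights once: the part that fits
    in RAM at DRAM bandwidth, the rest from swap at SSD bandwidth.
    """
    resident = min(resident, weights)  # can't keep more in RAM than the model's size
    swapped = weights - resident       # whatever doesn't fit is paged in from swap
    seconds_per_token = resident / dram_bw + swapped / ssd_bw
    return 1 / seconds_per_token


GB = 1e9
PARAMS = 31e9       # taken from the model name; assumes a dense model
DRAM_BW = 200 * GB  # assumption: M1 Pro-class unified memory bandwidth
SSD_BW = 5 * GB     # assumption: optimistic sequential read; real swap paging is slower
RESIDENT = 24 * GB  # assumption: RAM left for weights on a 32 GB machine

for label, bits in [("bf16", 16), ("Q4", 4)]:
    w = weight_bytes(PARAMS, bits)
    tps = decode_ceiling_tok_s(w, RESIDENT, DRAM_BW, SSD_BW)
    print(f"{label}: {w / GB:.1f} GB of weights, <= {tps:.2f} tok/s")
```

With these assumed numbers, the script prints a ceiling of about 0.13 tok/s for bf16 (62.0 GB, mostly paged from swap) versus about 12.90 tok/s for Q4 (15.5 GB, fully resident). The exact figures depend entirely on the assumptions, but the roughly 100x gap is the "SSD speed vs DRAM speed" point in concrete form.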
We ban accounts that do this and I don't want to ban you, so please write everything that you post to HN by hand.
Of course, it's impossible to know for sure what was LLM-processed or not, but we're getting complaints about some of your posts and, upon inspection, the complaints seem justified.