Hacker News | tenpa0000's comments

I run Llama 3.2 3B locally for latency-sensitive classification (sub-50ms, so no room for bigger models). At that scale Q2_K vs Q4_K_M isn't just smaller — Q2 starts flipping yes/no answers that Q4 gets right. Not often, but enough to notice in production.

So the KL divergence numbers here are more useful to me than the MMLU tables honestly. I've had MMLU hold steady while the output distribution drifted enough to break things downstream.
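To make concrete what I mean by drift: here's a minimal sketch of a per-token KL check between a reference model and its quantized version. Function names are illustrative, and I'm assuming you can pull raw next-token logits out of both models.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits, eps=1e-12):
    """KL(P || Q) in nats between two next-token distributions.

    P is the reference (e.g. fp16) model, Q the quantized one. A large KL on
    a token means the quant changed what the model wanted to say there, even
    when the argmax token (and so benchmark accuracy) is unchanged.
    """
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
```

Averaged over held-out prompts, this catches exactly the failure mode MMLU hides: the answer token's probability mass eroding until a yes/no flips.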

Does the calibration dataset make much difference at 3B though? There's so little redundancy that I'd expect it to hit a floor pretty fast regardless of how good the calibration data is.


What do you use for sub-50ms inference?


Could be bank-statement line-item classification.


For a simple classification task you generally want to prioritize regularization over more sophisticated behavior, so fewer parameters at higher quantization precision makes sense. For more generic chat-like purposes, Q2 of a larger model may often be preferable to Q4 of a smaller one.


This is great timing — I've been putting off making animated diagrams for a blog post because the Manim setup was too much friction for what I needed.

Tried the live demo and the 3D orbit scene is surprisingly smooth. Curious about a couple things:

- How are you handling the animation interpolation? Manim's rate functions (smooth, there_and_back, etc.) have some quirks that are easy to get subtly wrong. Did you reimplement those from scratch or find a way to match the Python easing curves exactly?

- For the py2ts converter — how far does it get on real-world scripts? I have a few older Manim CE scripts with custom VMobjects and I'm wondering if it handles subclassing or if it's more of a "simple scenes only" thing.
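For reference, the kind of curve I mean: a plausible reimplementation of smooth/there_and_back-style easing as a quintic smoothstep. This is a common stand-in, not necessarily Manim's exact formula (I believe some versions use a sigmoid-based smooth), which is exactly the subtle-mismatch risk.

```python
def smooth(t: float) -> float:
    """Quintic smoothstep: 0 at t=0, 1 at t=1, zero velocity at both ends.
    A common approximation of Manim's `smooth`; the real one may differ."""
    t = min(max(t, 0.0), 1.0)
    return t * t * t * (10 - 15 * t + 6 * t * t)

def there_and_back(t: float) -> float:
    """Ease up to 1 by the midpoint, then ease back down to 0."""
    return smooth(2 * t) if t < 0.5 else smooth(2 - 2 * t)
```

Two curves like these can agree at the endpoints and midpoint yet still diverge in velocity, so side-by-side renders drift visibly.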

One suggestion: it'd be really useful to have an export-to-GIF or export-to-MP4 option directly in the browser (maybe via MediaRecorder API). A lot of the Manim use case is generating assets for slides/posts, not just live playback.


The flip side is enumerable IDs. Back when I was scraping a site for a side project, sequential photo IDs were basically a free sitemap. YouTube's random-ish IDs aren't just branding — they at least make bulk harvesting annoying.
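The difference in a nutshell, as a toy sketch. YouTube IDs are 11 characters over a base64url-style alphabet; the exact encoding details here are illustrative, not theirs.

```python
import secrets
import string

# 64-character base64url-style alphabet
ALPHABET = string.ascii_letters + string.digits + "-_"

def sequential_id(counter: int) -> str:
    """Enumerable: seeing /photo/1042 tells you /photo/1041 exists too."""
    return str(counter)

def random_id(length: int = 11) -> str:
    """Non-enumerable: ~64^11 (about 7e19) possibilities, so guessing valid
    IDs by brute force is impractical, which is what keeps unlisted links
    effectively private."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With sequential IDs a scraper just counts; with random IDs it needs the links handed to it.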


You cannot afford to "bulk harvest" YouTube.


Bulk harvesting doesn't necessarily mean downloading every video; the metadata alone is useful. But in this case non-enumerable IDs are deliberate: enumerable IDs would defeat the "unlisted video" system.


.git/hooks is underrated. I have a pre-push hook that runs my test suite — annoying to set up the first time but I've probably avoided a dozen broken CI runs by now.

