Yea. LLM inference requires batch processing to have a shred of hope at being co...

miki123211 · 2026-05-23T17:10:08

I think people overstate the importance of batch processing and understate the importance of consistent workloads.

A GPU is much like a hotel or an airplane. To make hotel and airplane economics work out, you need to keep your seats or rooms filled at all times. Because capacity is limited and can't be stored for later, an empty seat, room, or request slot today is an opportunity wasted forever.
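The hotel analogy comes down to a fixed hourly cost divided by the requests actually served. The sketch below is a toy model, not anyone's real pricing. It assumes a GPU with a fixed rental cost and a fixed maximum request rate, and it treats unfilled slots as lost. The function name and the $3.60/hour and 3,600 requests/hour figures are hypothetical.

```python
def cost_per_request(hourly_cost: float,
                     requests_per_hour_at_full_load: float,
                     utilization: float) -> float:
    """Spread a fixed hourly GPU cost over the requests actually served.

    `utilization` is the fraction of available request slots that were filled.
    Unfilled slots still cost money; like an empty hotel room, they cannot be
    sold later. (Toy model: ignores power savings when idle, etc.)
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    served_per_hour = requests_per_hour_at_full_load * utilization
    return hourly_cost / served_per_hour


# Hypothetical numbers: $3.60/hour for a GPU that can serve 3,600 requests/hour.
for u in (1.0, 0.5, 0.25):
    print(f"utilization={u:.0%}  cost/request=${cost_per_request(3.60, 3600, u):.4f}")
```

Because the cost is fixed, per-request cost scales as 1/utilization: running at 25% utilization quadruples what each served request effectively costs.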

Unlike hotel stays and flights, though, GPU requests are short-lived: each one occupies a slot for seconds rather than a night or a whole trip. This means you need somebody to provide you with a continuous stream of useful work, 24/7. If you're an individual who needs to sleep or a single-country business that works 9 to 5, most of your request-processing opportunities will go to waste. You're much better off renting GPU time from somebody who can keep the hardware busy around the clock and can therefore offer much better per-request prices.
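The 24/7 point can be made concrete with a toy model. It assumes each business needs exactly one GPU's worth of capacity during its local 9-to-5 hours and nothing otherwise, and it compares one such business against three businesses placed 8 hours apart (UTC offsets 0, +8 and -8, chosen only for illustration). All function names are hypothetical.

```python
HOURS_PER_DAY = 24


def office_hours_demand(utc_offset: int, start: int = 9, end: int = 17,
                        load: float = 1.0) -> list[float]:
    """Hourly demand over one UTC day, as a fraction of one GPU's capacity.

    Hypothetical model: the business sends `load` worth of requests during
    local hours [start, end) and nothing otherwise.
    """
    demand = []
    for utc_hour in range(HOURS_PER_DAY):
        local_hour = (utc_hour + utc_offset) % HOURS_PER_DAY
        demand.append(load if start <= local_hour < end else 0.0)
    return demand


def pool(*demands: list[float]) -> list[float]:
    """Combine several customers' hourly demand on the same GPU."""
    return [sum(hour) for hour in zip(*demands)]


def utilization(demand: list[float], capacity: float = 1.0) -> float:
    """Fraction of the day's capacity actually used (excess demand is dropped)."""
    served = sum(min(d, capacity) for d in demand)
    return served / (capacity * len(demand))


single = office_hours_demand(utc_offset=0)
global_mix = pool(office_hours_demand(0), office_hours_demand(8), office_hours_demand(-8))
print(f"one 9-to-5 business: {utilization(single):.0%}")
print(f"three time zones:    {utilization(global_mix):.0%}")
```

Under these assumptions a single 9-to-5 customer leaves the GPU idle two-thirds of the day, while three customers eight hours apart fill every hour exactly once, so the per-request cost falls threefold. Real demand curves are smoother and overlap, so actual pooling gains would be smaller and messier.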

MichaelZuo · 2026-05-22T18:20:09

Wasn't this evident to pretty much anyone who knew even a bit about inference?

Idk what people were thinking. For example, I’ve never seen anyone offer a plausible way to sidestep batch processing.
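One common way to see why batching is hard to sidestep is a simplified roofline estimate of a single decode step. The sketch below assumes that every step must stream all model weights from memory once, however many sequences are in the batch, while compute grows linearly with batch size. The model size, bandwidth and FLOP figures are hypothetical, and the estimate ignores KV-cache reads, attention cost and kernel overheads. The roughly 2 × parameters FLOPs per token is the usual back-of-the-envelope approximation.

```python
# Hypothetical hardware/model numbers, chosen only for illustration.
PARAMS = 7e9                   # model parameters (assumed 7B model)
BYTES_PER_PARAM = 2            # 16-bit weights
MEM_BANDWIDTH = 2e12           # bytes/second (assumed)
PEAK_FLOPS = 1e15              # FLOP/second (assumed)
FLOPS_PER_TOKEN = 2 * PARAMS   # common approximation for one forward pass


def decode_step_seconds(batch_size: int) -> float:
    """Roofline-style estimate of one decode step for the whole batch.

    Weights are read from memory once per step no matter how many sequences
    share it; compute grows linearly with batch size. The step takes as long
    as whichever of the two is slower.
    """
    memory_time = PARAMS * BYTES_PER_PARAM / MEM_BANDWIDTH
    compute_time = batch_size * FLOPS_PER_TOKEN / PEAK_FLOPS
    return max(memory_time, compute_time)


def tokens_per_second(batch_size: int) -> float:
    # Each sequence in the batch produces one new token per step.
    return batch_size / decode_step_seconds(batch_size)


if __name__ == "__main__":
    for b in (1, 8, 64, 512, 1024):
        print(f"batch={b:5d}  tokens/s={tokens_per_second(b):10.0f}")
```

With these assumed numbers a batch of 64 takes exactly as long per step as a batch of 1, so it delivers 64 times the tokens for the same GPU time. Throughput keeps growing roughly linearly until the batch reaches about 500, where compute rather than memory bandwidth becomes the limit. That is also why the utilization argument above matters: large batches only form if enough concurrent requests are arriving.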