RandyOrion's comments

Check out Fig. 6 in this paper; it shows a comparison between the proposed method and PyTorch's native FSDP offload method.
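
For anyone unfamiliar with the baseline, here's a minimal sketch of what PyTorch's native FSDP CPU offload looks like (the model, sizes, and process-group setup are placeholders of mine, not anything from the paper):

    # Sketch of PyTorch's native FSDP CPU offload (the baseline);
    # model and sizes are placeholders.
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import CPUOffload

    dist.init_process_group("nccl")  # assumes the usual env vars are set
    model = torch.nn.Linear(4096, 4096).cuda()

    # offload_params=True keeps sharded parameters (and gradients)
    # in host RAM, copying them to the GPU only when needed
    fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))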

No open weights.

Besides, I'm old enough to recall that Meta trained a version of Llama 4 specifically for LM Arena Elo benchmaxxing and PR purposes, then proceeded to release a different version of Llama 4.


Thank you, Gemma team, for releasing small dense VLMs.

The Elo ranking [1] is too good to be true. I don't know why gemma-4-26b-a4b performs better than gemma-4-31b.

I'm also waiting for more bug fixes in llama.cpp, SGLang, and vLLM before doing proper evaluations.

[1] https://arena.ai/leaderboard/text/expert?license=open-source


Please no.

If you want to install APKs directly on Android phones sold in China, you'll face even more draconian restrictions, imposed by both Chinese OEMs and the Chinese government: you can't install Telegram [1], you can't install VPNs [2], you may get a call from the local police station after installing a VPN [3], and so on. And you don't even have the freedom to talk about these restrictions without getting sued or censored.

[1] https://xcancel.com/whyyoutouzhele/status/168915238841261670...

[2] https://xcancel.com/whyyoutouzhele/status/197843066556268971...

[3] https://xcancel.com/whyyoutouzhele/status/170299205759627676...


Yeah, let's hold Google accountable. Is there a way to enforce antitrust law here?


Thank you for standing against the Android Developer Verification enforced by Google. Now, in addition to quitting YouTube and replacing Chrome with ungoogled-chromium, I'm moving to de-Googled AOSP builds, e.g., LineageOS, instead of stock OEM ROMs.


Wow, just wow.

1.5M PR records affected. Did Microsoft Copilot ask users for permission before adding ads inside their PRs? Did users ever consent to this?

Now EVERYONE can see ads disguised as PRs on GitHub. Did Microsoft ask anyone for permission before showing these ads? Did users ever consent to this?

Good taste, Microslop.


This project shows the kind of interesting automated search over engineering problems that I'd like to see more of.

The experience of using tiered storage (GPU VRAM, RAM, and SSD) is generally poor across LLM inference engines, e.g., llama.cpp, SGLang, vLLM, etc.

In my own experience, offloading both weights and the KV cache to RAM is unavailable or unusable on SGLang and vLLM: copying the extra parameters from the docs into already-working commands just produces errors. llama.cpp does support weight offload, but the experience is not pleasant: low PCIe (GPU <-> RAM) utilization, low GPU utilization, and really low tokens per second.
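
For context, partial weight offload in llama.cpp comes down to choosing how many layers live in VRAM while the rest stay in RAM. A minimal sketch via llama-cpp-python (the model path and layer split are placeholders):

    # Sketch of llama.cpp partial weight offload via llama-cpp-python;
    # model path and layer split are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",  # hypothetical GGUF file
        n_gpu_layers=20,          # 20 layers in VRAM, the rest stay in RAM
    )
    out = llm("Hello", max_tokens=16)
    print(out["choices"][0]["text"])

As far as I know, the RAM-resident layers are then computed on the CPU by default, which is typically where the throughput drops off.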


First, thank you, Junyang and the Qwen team, for your incredible work. You deserve better.

This is sad for the local LLM community. First we lost WizardLM, Yi, and others; then we lost Llama and others; now we've lost Qwen...


Cool. However, one still needs a CPU to send commands to the GPU in order to let the GPU do CPU things.
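
To illustrate: even the simplest GPU kernel launch is a host-side call, i.e., the CPU compiles and enqueues the work and the GPU only executes it. A toy sketch with CuPy (the kernel and sizes are made up):

    # Toy sketch: the CPU (host) compiles and enqueues this kernel;
    # the GPU only executes it. Kernel and sizes are made up.
    import cupy as cp

    add_one = cp.RawKernel(r'''
    extern "C" __global__
    void add_one(float* x, int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < n) x[i] += 1.0f;
    }
    ''', 'add_one')

    x = cp.zeros(1024, dtype=cp.float32)
    add_one((4,), (256,), (x, cp.int32(1024)))  # the host issues the launch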


> Cool. However, one still needs a CPU to send commands to the GPU in order to let the GPU do CPU things.

Doesn't the Raspberry Pi's GPU boot up first, and then the GPU initializes the CPU?

With this technology, we've eliminated the need for that superfluous second step.


Well, I don't know enough about the RPi boot process. However, I do expect that most modern hardware, e.g., x86, does not work like the RPi, so your claim doesn't hold in most realistic scenarios, at least for now. Besides, do current GPUs (not just the one on the RPi) have the ability to feed themselves commands in order to achieve what you described?

