Hacker News

I am running fp16 LLaMA 30B (via vanilla-llama) on six AMD MI25s. The machine has 384 GB of RAM, but the model fits entirely in VRAM, taking about 87 GB of the 96 GB available across the six cards. Performance is about 1.6 words per second on an IRC chat-log continuation task, and the box draws roughly 400 W extra when "thinking."
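For anyone wondering why it needs multiple cards: a rough back-of-the-envelope sketch of the VRAM math (the parameter count and 16 GB-per-MI25 figure are my assumptions; the ~87 GB observed above also includes KV cache and runtime overhead, which this ignores):

```python
# Rough fp16 VRAM estimate for LLaMA "30B" (actually ~32.5B parameters).
# Weights alone, before KV cache / activations / framework overhead.
params = 32.5e9
bytes_per_param = 2  # fp16

weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")       # ~65 GB

# Spread across six MI25s (16 GB VRAM each, 96 GB total):
observed_total_gb = 87
per_card_gb = observed_total_gb / 6
print(f"per card: ~{per_card_gb:.1f} GB of 16 GB")  # ~14.5 GB
```

So the weights alone are well past any single card, but six 16 GB cards leave headroom for the KV cache.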

