Hacker News

It’s not reasonably possible (currently?) to get the same performance from a 7-billion-parameter model as from a 175-billion-parameter model with just an additional 6,000 lines of fine-tuning data.


It gets very similar performance, so maybe it is possible. But also, when I fine-tune GPT-3.5, its response quality is indistinguishable from the base model's.

All of this is to say: this shit is way harder than it needs to be. I'm not an ML engineer but I do know my data and how to get it. Why is it still so hard to specialize a model?
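For what it's worth, the data-preparation side at least is mechanical. A minimal sketch, assuming the chat-style JSONL fine-tuning format (one {"messages": [...]} object per line); the question/answer pairs and the system prompt here are invented placeholders, not anyone's real data:

```python
import json

# Hypothetical domain Q&A pairs standing in for "my data".
examples = [
    ("What does error E-17 mean?",
     "E-17 indicates a sensor timeout; power-cycle the unit."),
    ("How do I export a report?",
     "Open Reports, choose a date range, and click Export as CSV."),
]

def to_jsonl(pairs, system_prompt="You are a support assistant."):
    """Serialize (question, answer) pairs as chat-format JSONL,
    one training record per line."""
    lines = []
    for question, answer in pairs:
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl = to_jsonl(examples)
```

The hard part the comment is pointing at isn't this packaging step, though; it's that nobody can tell you in advance whether 6,000 such lines will move the model at all.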


Because we don’t have a clear understanding of how the things work yet.



