| 1. | | 10Gb/s Ethernet: what I did to get it working in my home (gilesthomas.com) |
| 232 points by gpjt 8 days ago | past | 173 comments |
|
| 2. | | 10Gb Ethernet: what I had to (re)learn (gilesthomas.com) |
| 1 point by gpjt 8 days ago | past | 1 comment |
|
| 3. | | LLM from scratch, part 33 – what I learned from the appendices (gilesthomas.com) |
| 5 points by gpjt 14 days ago | past |
|
| 4. | | LLM from scratch (32l) – Interventions: updated instruction fine-tuning results (gilesthomas.com) |
| 1 point by gpjt 16 days ago | past |
|
| 5. | | How an LLM becomes more coherent as we train it (gilesthomas.com) |
| 3 points by gpjt 19 days ago | past |
|
| 6. | | LLM from scratch, part 32k – Interventions: gradient accumulation (gilesthomas.com) |
| 2 points by gpjt 21 days ago | past |
|
| 7. | | Provision: LLM-powered server setup from Markdown (provision.sh) |
| 2 points by gpjt 26 days ago | past |
|
| 8. | | LLM from scratch, part 32j – trying to train a better model in the cloud (gilesthomas.com) |
| 2 points by gpjt 27 days ago | past |
|
| 9. | | Writing an LLM from scratch, part 32i – Interventions: what is in the noise? (gilesthomas.com) |
| 1 point by gpjt 29 days ago | past |
|
| 10. | | Writing an LLM from scratch, part 32h – Interventions: full fat float32 (gilesthomas.com) |
| 7 points by gpjt 33 days ago | past |
|
| 11. | | Writing an LLM from scratch, part 32g – Interventions: weight tying (gilesthomas.com) |
| 2 points by gpjt 43 days ago | past |
|
| 12. | | Writing an LLM from scratch, part 32f – Interventions: weight decay (gilesthomas.com) |
| 6 points by gpjt 44 days ago | past |
|
| 13. | | Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com) |
| 3 points by gpjt 57 days ago | past |
|
| 14. | | Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com) |
| 6 points by gpjt 89 days ago | past |
|
| 15. | | Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com) |
| 1 point by gpjt 3 months ago | past |
|
| 16. | | Writing an LLM from scratch, part 32B – Interventions: gradient clipping (gilesthomas.com) |
| 2 points by gpjt 3 months ago | past |
|
| 17. | | Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com) |
| 1 point by gpjt 3 months ago | past |
|
| 18. | | Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com) |
| 1 point by gpjt 3 months ago | past |
|
| 19. | | Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com) |
| 2 points by gpjt 3 months ago | past |
|
| 20. | | Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com) |
| 1 point by gpjt 3 months ago | past |
|
| 21. | | LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com) |
| 2 points by gpjt 3 months ago | past |
|
| 22. | | LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com) |
| 540 points by gpjt 5 months ago | past | 121 comments |
|
| 23. | | Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com) |
| 1 point by gpjt 6 months ago | past |
|
| 24. | | Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com) |
| 4 points by gpjt 6 months ago | past |
|
| 25. | | Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com) |
| 2 points by gpjt 6 months ago | past |
|
| 26. | | Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com) |
| 1 point by gpjt 6 months ago | past |
|
| 27. | | Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com) |
| 3 points by gpjt 6 months ago | past |
|
| 28. | | Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com) |
| 1 point by gpjt 6 months ago | past |
|
| 29. | | Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com) |
| 254 points by gpjt 6 months ago | past | 10 comments |
|
| 30. | | Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com) |
| 2 points by gpjt 6 months ago | past |
|