2 Comments
The AI Architect

Brilliant walkthrough on the decoder architecture. The way you broke down pre-LayerNorm vs. post-LayerNorm makes much more sense now; I always wondered why modern LLMs switched to Pre-LN. When I was implementing transformers last year, I kept getting gradient issues and didn't realize the LayerNorm placement was key. Really appreciate the dimension tracking alongside the code too.
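For anyone else who hit the same gradient problems, here's a minimal sketch of the two placements (assuming PyTorch and an attention-only sublayer for brevity; not the article's exact code). The key difference is whether LayerNorm sits on the residual path (Post-LN) or off it (Pre-LN):

```python
import torch.nn as nn


class PostLNBlock(nn.Module):
    """Original Transformer style: LayerNorm AFTER the residual add."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        # The residual stream itself gets normalized, so gradients must
        # pass through every LayerNorm on the way back -- harder to train deep.
        return self.norm(x + attn_out)


class PreLNBlock(nn.Module):
    """Modern LLM style: LayerNorm BEFORE the sublayer."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)
        # The residual add stays an unnormalized identity path, so gradients
        # flow straight through it -- much more stable in deep stacks.
        return x + attn_out
```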

Dr. Ashish Bamania

Thank you! I’m glad that you found it helpful!