Cross-Entropy Loss in LLMs, Explained Visually
A visual guide to understanding how LLMs are trained using the cross-entropy loss, step by step.
LLMs are trained to predict the next token well.
Given all the classes (tokens) in the vocabulary, an LLM is trained to pick the right one at each training step. This is essentially a multi-class classification problem in machine learning.
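The categorical cross-entropy loss (or simply the cross-entropy loss) is used to train an LLM to solve this multi-class classification problem. For a single prediction, it is the negative log of the probability the model assigns to the correct token.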
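To make the idea concrete, here is a minimal sketch of the loss for a single next-token prediction. The four-word vocabulary, the logit values, and the function names are all made up for illustration; a real LLM has a vocabulary of tens of thousands of tokens and would typically rely on a framework's built-in loss rather than hand-written code like this.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct token."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical toy vocabulary and made-up model scores for the next token.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 0.1, -1.0]

# The loss is small when the correct token is the one the model favors...
loss_when_favored = cross_entropy(logits, target_index=vocab.index("the"))
# ...and large when the correct token is one the model considers unlikely.
loss_when_unlikely = cross_entropy(logits, target_index=vocab.index("mat"))

print(f"correct token 'the': loss = {loss_when_favored:.3f}")
print(f"correct token 'mat': loss = {loss_when_unlikely:.3f}")
```

Training pushes the model to raise the probability of the correct token, which lowers this value. If the model spreads its probability evenly over all tokens, the loss equals the log of the vocabulary size.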




