Once a model's architecture is defined, it goes through two very different phases: training and inference.
Training is when the model learns: it's fed massive amounts of data and repeatedly adjusts its internal weights to shrink its prediction error (gradient descent, in practice). This is extremely compute-heavy, but it typically happens once up front, with occasional fine-tuning or updates afterward.
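To make that concrete, here is a minimal PyTorch-style sketch of a single training step. The `TinyLM` model, the vocabulary size, and the random batch are all hypothetical stand-ins for illustration, not a real LLM setup.

```python
import torch
import torch.nn as nn

vocab_size = 100  # hypothetical toy vocabulary

class TinyLM(nn.Module):
    """A toy next-token predictor: embed tokens, map back to vocab logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 32)
        self.head = nn.Linear(32, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq)
        return self.head(self.embed(tokens))   # logits: (batch, seq, vocab)

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One step of learning: predict, measure the error, adjust the weights.
tokens = torch.randint(0, vocab_size, (8, 16))   # fake training batch
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each position predicts the next token
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()     # compute how each weight contributed to the error
optimizer.step()    # nudge the weights to reduce that error
optimizer.zero_grad()
```

Real training repeats this loop billions of times over huge datasets, which is where the compute cost comes from.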
Inference is when the trained model is actually used. You type a prompt; the model predicts the next token, over and over, until it's done. Each inference call is far faster and cheaper than training, but serving a large LLM is still resource-intensive.
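Here is a minimal sketch of that generation loop, using greedy decoding with a toy untrained model. The model, prompt token ids, and fixed 10-token output length are illustrative assumptions; real systems use a trained model, sampling strategies, and a stop condition.

```python
import torch
import torch.nn as nn

vocab_size = 100  # hypothetical toy vocabulary, as in the training sketch
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))
model.eval()                           # inference mode: no weight updates

prompt = torch.tensor([[1, 5, 7]])     # fake tokenized prompt (batch of 1)
with torch.no_grad():                  # no gradients needed at inference
    for _ in range(10):                # "over and over, until it's done"
        logits = model(prompt)                                   # (1, seq, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)  # most likely next token
        prompt = torch.cat([prompt, next_token], dim=1)          # append and repeat

print(prompt)  # original prompt plus 10 generated token ids
```

Note the asymmetry with training: no loss, no backward pass, no optimizer. The weights are frozen; the model just runs forward, one token at a time.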
Engineering lens: Training is like building the engine. Inference is like turning the key and driving.
Memory trick: "Train once, infer many times."