{
"$type": "site.standard.document",
"canonicalUrl": "https://jacob.blog/notes/next-token-prediction-llms",
"path": "/notes/next-token-prediction-llms",
"publishedAt": "2025-06-03T00:00:00.000Z",
"site": "at://did:plc:ckthoyuvsmkp254fyuinyzb2/site.standard.publication/3mndm6tiamb26",
"tags": [
"ai",
"llm"
],
"textContent": "Large language models are fundamentally next-token (roughly, next-word) predictors: a sequence goes in, the model assigns a probability to every token in its vocabulary, and generation proceeds one token at a time. At each step, generation can either pick the most likely token (the argmax) or sample from the top few candidates for variety.\n\n- When input–output relationships are highly complex or high-dimensional, linear models fall short; neural networks can approximate highly non-linear relationships.\n- Each prediction is a classification over the whole vocabulary, and a natural-language vocabulary is huge (tens of thousands of “classes”). Code draws on a smaller vocabulary, so code LLMs can feel disproportionately capable at similar parameter scales.",
"title": "Next-token prediction (LLMs)"
}