{
  "$type": "site.standard.document",
  "canonicalUrl": "https://jacob.blog/notes/next-token-prediction-llms",
  "path": "/notes/next-token-prediction-llms",
  "publishedAt": "2025-06-03T00:00:00.000Z",
  "site": "at://did:plc:ckthoyuvsmkp254fyuinyzb2/site.standard.publication/3mndm6tiamb26",
  "tags": [
    "ai",
    "llm"
  ],
  "textContent": "Large language models are fundamentally next-token (next-word) predictors: a sequence goes in, the model assigns probabilities over the vocabulary, and generation proceeds one token at a time—optionally sampling from the top few candidates for variety rather than always picking the argmax.\n\n- When input–output relationships are highly complex or high-dimensional, linear models fail; neural networks scale to arbitrarily non-linear relationships.\n- A natural-language lexicon is huge (~tens of thousands of “classes”); code vocabularies are smaller, so code LLMs can feel disproportionately capable at similar parameter scales.",
  "title": "Next-token prediction (LLMs)"
}