Jacob Bennett

Next-token prediction (LLMs)

Jacob Bennett June 3, 2025

Large language models are fundamentally next-token predictors (a token is roughly a word or a piece of one): a sequence goes in, the model assigns a probability to every token in its vocabulary, and generation proceeds one token at a time. At each step, decoding can optionally sample from the top few candidates for variety rather than always picking the argmax.

- When input–output relationships are highly complex or high-dimensional, linear models fail; neural networks, given enough capacity, can approximate arbitrarily non-linear relationships.
- A natural-language lexicon is huge (~tens of thousands of “classes”); code vocabularies are smaller, so code LLMs can feel disproportionately capable at similar parameter scales.
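
The generation loop described above (scores over the vocabulary, then either argmax or sampling from the top few candidates) can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular LLM is implemented: `toy_model`, `VOCAB`, `next_token`, and `generate` are hypothetical names, and the "model" is a hand-written rule standing in for a real neural network.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, k=1, rng=None):
    """Pick the next token id. k=1 is greedy argmax; k>1 samples among the top k."""
    probs = softmax(logits)
    # Indices of the k most probable tokens, most probable first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    if k == 1:
        return top[0]
    rng = rng or random
    # Sample one of the top-k ids, weighted by its probability.
    return rng.choices(top, weights=[probs[i] for i in top], k=1)[0]

def generate(model, prompt_ids, steps, k=1, rng=None):
    """Append one token at a time, feeding the growing sequence back in."""
    ids = list(prompt_ids)
    for _ in range(steps):
        ids.append(next_token(model(ids), k=k, rng=rng))
    return ids

# Hypothetical five-token vocabulary and stand-in model (assumptions for the demo).
VOCAB = ["the", "cat", "sat", "down", "."]

def toy_model(ids):
    """Score the token that follows the last one (cyclically) highest."""
    last = ids[-1]
    return [3.0 if i == (last + 1) % len(VOCAB) else 0.0 for i in range(len(VOCAB))]

if __name__ == "__main__":
    greedy = generate(toy_model, [0], steps=4, k=1)
    print(" ".join(VOCAB[i] for i in greedy))
    sampled = generate(toy_model, [0], steps=4, k=3, rng=random.Random(0))
    print(" ".join(VOCAB[i] for i in sampled))
```

With `k=1` the output is deterministic; raising `k` trades some predictability for variety, which is the choice the paragraph above refers to. A real model would replace `toy_model` with a network that returns one score per vocabulary entry.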
