Cross-architectural runtime probability dynamics in transformer LLMs — two clusters not explained by parameter count

Hugging Face Forums [Unofficial] June 9, 2026

Thanks for this — and you were right.

Ran the temperature normalization test you suggested as priority 1. The result is a partial falsification of the strongest interpretation of the finding.

Setup: fitted a per-model scalar temperature T so that every model matches a common target entropy across the panel (target = 3.01), then recomputed the geometry and the clustering on the calibrated logits.
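For anyone who wants to reproduce the calibration step, here is a minimal sketch of fitting a scalar temperature to a target entropy. It is not the exact code behind the audit. Assumptions: entropy is the mean Shannon entropy in nats over positions, the logits are a plain `(positions, vocab)` array, and the names `mean_entropy` and `fit_temperature` are hypothetical. The synthetic logits are a stand-in, not model output. Bisection works because the entropy of softmax(logits / T) is non-decreasing in T.

```python
import numpy as np

def mean_entropy(logits, T):
    """Mean Shannon entropy (nats) of softmax(logits / T) over rows."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-(np.exp(log_p) * log_p).sum(axis=-1).mean())

def fit_temperature(logits, target, lo=1e-2, hi=1e2, iters=60):
    """Bisect on a log scale for the T whose mean entropy equals target."""
    if not (mean_entropy(logits, lo) <= target <= mean_entropy(logits, hi)):
        raise ValueError("target entropy is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = np.sqrt(lo * hi)  # geometric midpoint
        if mean_entropy(logits, mid) < target:
            lo = mid  # too sharp: raise the temperature
        else:
            hi = mid  # too flat: lower the temperature
    return float(np.sqrt(lo * hi))

# Synthetic stand-in for one model's logits (assumption, not real data)
rng = np.random.default_rng(0)
logits = rng.normal(scale=4.0, size=(256, 1000))

T = fit_temperature(logits, target=3.01)
calibrated = logits / T  # downstream metrics are recomputed on these
```

Run once per model with the same `target`; each model gets its own T, and everything downstream is recomputed on `calibrated`.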

What happened: two models migrate between clusters after calibration. GPT-2 and Phi-1.5 both move. The raw clustering structure is therefore substantially driven by effective logit temperature, not purely by runtime dynamics.

What this falsifies: “Raw GD_ratio directly measures model dynamics independently of calibration.” This interpretation is rejected by the test you proposed.

What may still hold: residual structure after calibration. Two clusters still appear post-calibration, but their composition changes, so the question becomes: is there a calibration-independent component to the clustering, and if so, what does it measure? Answering that requires the additional controls you listed: vocab normalization (log V), top-p truncation, perturbations magnitude-matched to per-layer activation RMS, and bootstrap CIs on the gap. None of those are done yet.
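For the bootstrap control, a rough sketch of a percentile CI on the gap could look like the following. Assumptions: "the gap" is taken here as the difference between the two clusters' mean per-model scores, resampling is over models within each cluster, and `bootstrap_gap_ci` is a hypothetical name. The numbers are placeholders, not values from the preprint.

```python
import numpy as np

def bootstrap_gap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Resample each cluster with replacement, independently
    ia = rng.integers(0, len(a), size=(n_boot, len(a)))
    ib = rng.integers(0, len(b), size=(n_boot, len(b)))
    gaps = a[ia].mean(axis=1) - b[ib].mean(axis=1)
    lo, hi = np.quantile(gaps, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Placeholder per-model scores (assumption, not measured values)
cluster_a = [1.9, 2.1, 2.0, 2.2]
cluster_b = [1.0, 1.1, 0.9, 1.2]

lo, hi = bootstrap_gap_ci(cluster_a, cluster_b)
print(f"95% CI on the gap: [{lo:.3f}, {hi:.3f}]")
```

With a panel this small the interval will be wide and the percentile method is only approximate, so an interval that excludes zero is a minimum bar, not strong evidence.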

The published version of the V20 preprint (deposited yesterday) overstates the dynamical interpretation of the raw GD_ratio. I am preparing a revision that incorporates the temperature audit explicitly. The falsification is going into the rejected-claims section. The remaining structure question is moving into limited-findings with the controls you specified as the protocol for elevation.

The attention-based findings from companion work (different observation level than logits) are not affected by this confound and remain the direction I will continue to develop.

Genuinely useful review. The “temperature first because it kills the result or makes it much stronger” framing was the right priority.
