LLM "curving" via prompting
Yeah. That direction is probably the right one.
I think the useful part of your reframing is exactly this: keeping the current claim behavioral for now, while making the stronger mechanistic claims testable rather than asserted.
I cannot promise that I can help at a high technical level. I do not have much compute available, and I do not want to overstate my role. At most, I may be able to help with the documentation / clarification side: making a table of what each metric appears to measure, what its input source seems to be, and what would still be needed for someone else to reproduce it.
I also do not think the next step has to be:
> prove or disprove the whole field interpretation
A more useful next step might be:
> make the measurement layer legible enough that someone else can reproduce, challenge, or extend it without first accepting the interpretation.
The encouraging part is that several of the figures appear to be derived from hidden-state tensors, not only generated text. So I would not dismiss them as purely rhetorical visualizations. But I would still separate two things:
| Layer | Example |
|---|---|
| Neutral formula / measurement | layer-to-layer hidden-state variation, deep-layer norm statistic, PCA of layer trajectories |
| Interpretive label | residual jittering, ontological grip, attractor hold, gravity well, braiding |
Both can coexist. The interpretive names may be useful for intuition, but a technical collaborator will probably need the neutral measurement contract first.
A short version of that contract could look like this:
| Current label | Neutral measurement name | Likely source | What a collaborator would need |
|---|---|---|---|
| Residual Jittering / Chaos Force | layer-to-layer hidden-state variation | hidden states | formula, normalization, controls, raw series |
| Attractor Hold / Ontological Grip | normalized deep-layer norm statistic | hidden states | layer range, formula, controls |
| Balance of Power | overlay of two separately scaled hidden-state summaries | hidden states | raw values, baseline/style-control |
| Braided Invariants | PCA view of token/layer hidden trajectories | hidden states | projection params, seed, controls |
| Manifold Resonance | mid-vs-final layer cosine similarity | hidden states | exact layer indices, controls |
| Geometric Density / Gravity Well Depth | SVD/spectral concentration statistic | hidden states | raw spectral values, directionality, controls |
| Specificity Flux | final-layer vector dispersion over steps | hidden states | raw time series, controls |
| Probabilistic Drift / Logit Entropy | LM-head projection metrics | hidden states + LM head | exact layers, logits/probs, controls |
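To make the "neutral measurement name" column a bit more concrete, here is a minimal sketch of two of those rows in plain Python. It assumes the hidden states have already been extracted as one vector per layer for a single token position. The function names, the use of relative L2 change, and the default choice of the middle layer are my own illustrative assumptions, not the formulas behind the original figures.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def layer_variation(hidden):
    """Relative L2 change between consecutive layers.

    hidden: list of per-layer vectors, ordered from first to last layer.
    Returns one value per layer transition (len(hidden) - 1 values).
    The normalization by the previous layer's norm is an assumption.
    """
    values = []
    for prev, cur in zip(hidden, hidden[1:]):
        delta = l2_norm([c - p for p, c in zip(prev, cur)])
        values.append(delta / (l2_norm(prev) + 1e-12))
    return values


def cosine_similarity(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b) + 1e-12)


def mid_vs_final_similarity(hidden, mid_index=None):
    """Cosine similarity between a middle layer and the final layer.

    mid_index defaults to len(hidden) // 2; the exact index is one of the
    details a collaborator would need recorded.
    """
    if mid_index is None:
        mid_index = len(hidden) // 2
    return cosine_similarity(hidden[mid_index], hidden[-1])


# Toy input: 4 "layers" of 2-dimensional vectors (not real model output).
toy_hidden = [
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 2.0],
    [0.0, 4.0],
]
variation = layer_variation(toy_hidden)
similarity = mid_vs_final_similarity(toy_hidden)
```

Writing the metric this way forces the choices in the last column of the table (formula, normalization, exact layer indices) to be explicit, which is most of what a collaborator would need.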
So if someone with mechanistic-interpretability experience joins later, the first task does not need to be “evaluate EPE as a theory.” It can be something much smaller:
> reproduce these metrics on one open model, with a baseline prompt, an EPE/curving prompt, and a style-control prompt.
That is probably much easier to collaborate on.
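A rough sketch of what that three-condition comparison could look like as code, under stated assumptions: `stub_extract` is a stand-in that fakes hidden states from prompt length and is not a model, `mean_layer_norm` is a placeholder metric, and the prompt strings are placeholders. In a real run the extractor would be replaced by whatever code pulls per-layer hidden states from the chosen open model.

```python
import math


def mean_layer_norm(hidden):
    """Placeholder metric for this sketch: mean L2 norm across layers."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in hidden]
    return sum(norms) / len(norms)


def stub_extract(prompt):
    """Stand-in for real hidden-state extraction (NOT a model).

    Returns three fake 'layers' whose scale depends only on prompt length,
    so the example is deterministic and runs without any model.
    """
    n = float(len(prompt))
    return [[n * (layer + 1), 0.0] for layer in range(3)]


def compare_conditions(prompts, extract_hidden, metrics):
    """Compute every metric for every prompt condition.

    prompts: dict mapping condition name -> prompt text
    extract_hidden: callable, prompt -> list of per-layer vectors
    metrics: dict mapping metric name -> callable(hidden) -> float
    Returns a list of row dicts, one per condition, in insertion order.
    """
    rows = []
    for condition, prompt in prompts.items():
        hidden = extract_hidden(prompt)
        row = {"condition": condition}
        for name, fn in metrics.items():
            row[name] = fn(hidden)
        rows.append(row)
    return rows


def deltas_vs(rows, reference, metric):
    """Difference of each non-reference condition from the reference."""
    ref = next(r for r in rows if r["condition"] == reference)
    return {
        r["condition"]: r[metric] - ref[metric]
        for r in rows
        if r["condition"] != reference
    }


# Placeholder prompts; real ones would be the actual three texts.
prompts = {"baseline": "abc", "curving": "abcd", "style_control": "abcde"}
rows = compare_conditions(prompts, stub_extract, {"mean_layer_norm": mean_layer_norm})
deltas = deltas_vs(rows, "baseline", "mean_layer_norm")
```

The point of reporting both deltas side by side is that if the style-control prompt moves a metric as far from baseline as the curving prompt does, the metric may be tracking style or length rather than anything specific to the curving prompt.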
So my current practical suggestion would be:
- keep the main public claim behavioral for now;
- preserve the field interpretation as a hypothesis or intuition layer;
- make the existing hidden-state-derived metrics legible in neutral terms;
- add baseline/style controls;
- publish raw metric tables and plotting code;
- let a future technical collaborator extend it toward representation comparison, activation steering, or patching.
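For the "publish raw metric tables" item, a minimal sketch of one way to do it with only the Python standard library: write the raw values as CSV and put the run details in a JSON comment line at the top. The metadata keys, the column names, and the values shown are hypothetical examples, not results and not the project's actual settings.

```python
import csv
import io
import json


def write_metric_table(rows, metadata, stream):
    """Write raw metric rows as CSV, preceded by one '# {json}' metadata line.

    rows: list of dicts sharing the same keys (column order from the first row)
    metadata: dict of run details (model, layers, seed, ...), JSON-serializable
    stream: any text file-like object
    """
    stream.write("# " + json.dumps(metadata, sort_keys=True) + "\n")
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical values for illustration only.
example_rows = [
    {"condition": "baseline", "layer": 0, "value": 0.5},
    {"condition": "baseline", "layer": 1, "value": 0.25},
    {"condition": "curving", "layer": 0, "value": 0.75},
]
example_metadata = {
    "metric": "layer_variation",
    "model": "<model-id>",  # placeholder, fill in the real model
    "seed": 0,
}

buffer = io.StringIO()
write_metric_table(example_rows, example_metadata, buffer)
lines = buffer.getvalue().splitlines()
```

Anyone reading the file back needs to skip or parse the first line separately, since a comment header is not part of plain CSV. The benefit is that the table and the settings that produced it travel together in one file.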
That seems like the smallest useful bridge between the current work and the kind of mechanistic test you are looking for.