
LLM "curving" via prompting

Hugging Face Forums [Unofficial] June 26, 2026

Hmm… this is difficult, but based on what I could find so far, maybe it looks something like this:


Short answer

I would not treat this as a yes/no question yet.

At the output level, the safest first interpretation is that the prompt is creating a strong style / ontology / behavioral regime shift. The prompt gives the model a very explicit vocabulary: “field,” “gravity well,” “self-organization,” “manifold,” “density,” “hidden dynamics,” etc. So it is not surprising that the model starts speaking and visualizing in that ontology.

But I would not yet call it evidence that the model’s internal manifold has literally been “curved,” unless the visualization is based on something like token scores/logprobs, hidden states, or causal interventions, rather than generated text or model self-description.

My tentative answer to the three questions would be:

  1. Manifold perturbation or stylistic variation? Maybe more than ordinary style variation, but “manifold perturbation” is too strong until the measurement layer is specified. A safer current label might be prompt-induced regime shift or prompt steerability.

  2. Has anyone tried similar “field condition” prompting? Yes, there are nearby prompt-only experiments on this forum using “field,” “attractor,” “semantic tension,” or “global regulator” language. I would treat those as related observations, not evidence of the same internal mechanism.

  3. How to distinguish rhetorical self-organization from something stronger? I would not look for one magic “self-organization score.” I would use an evidence profile: output behavior → behavioral stability → token distribution → hidden-state trajectory → interdependence/order-parameter tests → causal patching/ablation/transfer.

For me, the most important missing detail is:

What exactly are the maps computed from?

Generated text embeddings? Token scores? Hidden states? Attentions? A custom metric? A 2-D projection of some other vector space? The interpretation changes a lot depending on that answer.


The main distinction I would make

I would separate “the prompt curves the model” into several increasingly strong claims.

| Level | What is being claimed | Example evidence |
|---|---|---|
| Output style | The output sounds different | More field/manifold/self-organization language |
| Behavioral regime | The prompt reliably shifts behavior | Replication across seeds/tasks; survives style-only controls |
| Token distribution | The next-token landscape changes | Scores, logprobs, entropy, KL divergence |
| Representation | Hidden-state trajectories change | Layerwise hidden-state distances, CKA/RSA, trajectory clustering |
| Self-organization-like profile | There is stable, structured organization, not just vocabulary | Order parameters, perturbation recovery, interdependence/synergy |
| Causal mechanism | The effect can be moved or removed | Activation patching, ablation, activation-difference transfer |

The first two levels are still interesting. They just do not yet justify the stronger internal-mechanism language.

A useful existing term for the second level is prompt steerability: how much prompting alone can shift a model’s behavioral distribution away from baseline. See, for example:

  • Evaluating the Prompt Steerability of Large Language Models
  • ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
  • PromptRobust
  • PromptBench

So I would initially describe the current observation as:

This prompt may induce a distinctive behavioral regime. Whether that regime corresponds to a deeper representation-level or causal mechanism is testable, but not yet shown by the wording of the output alone.
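For the token-distribution level in the first table, the basic quantities are per-position entropy and a divergence between the next-token distributions produced under two prompts. Below is a minimal plain-Python sketch, assuming you already have logits for the same continuation position under each prompt (for example, from a forward pass that scores a fixed continuation); the toy logits are made up. In `generate()`, `output_scores=True` together with `return_dict_in_generate=True` also returns per-step scores, but those are taken after logits processing (temperature, top-k, and so on), so they are not the raw model distribution.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability tokens contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0 (true after softmax)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up logits over a toy 4-token vocabulary at the same continuation position
baseline_logits = [2.0, 1.0, 0.5, -1.0]
curving_logits = [0.5, 2.5, 0.5, -1.0]

p_base = softmax(baseline_logits)
p_curve = softmax(curving_logits)
print(f"entropy, baseline: {entropy(p_base):.3f} nats")
print(f"entropy, curving:  {entropy(p_curve):.3f} nats")
print(f"KL(curving || baseline): {kl_divergence(p_curve, p_base):.3f} nats")
```

KL divergence is asymmetric, so it is worth stating which direction is reported; a symmetric alternative such as Jensen-Shannon divergence is also common.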


What I would ask first

Before interpreting the maps, I would ask:

What exactly is being plotted?

| If the map is computed from… | Safer interpretation | Unsafe interpretation |
|---|---|---|
| Generated text embeddings | Output-space shift | Internal manifold perturbation |
| Prompt embeddings | Input-space / prompt-space difference | Model dynamics changed |
| Token scores / logprobs | Next-token distribution changed | Causal mechanism found |
| Hidden states | Representation trajectory changed | The difference caused the behavior |
| Attention patterns | Attention allocation changed | The whole model entered a new field state |
| Custom metrics | Difference under that metric | Internal geometry proved |
| UMAP/t-SNE/PCA projection | Exploratory visualization | Literal gravity wells / basin sizes |
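The last row matters most: a 2-D projection is for looking, not for measuring. If a map is meant to support a representation-level claim, one option is to run the comparison in the full hidden-state space instead. Here is a sketch of linear CKA (Kornblith et al., 2019), one of the measures named in the first table, assuming `X` and `Y` hold hidden states from one chosen layer and token position, one row per task, where row i is the same task under the baseline prompt and under the curving prompt. The matrices below are toy values. Because CKA centers each feature, it ignores a uniform offset between conditions, so a pure mean shift needs a separate check.

```python
import math

def center_columns(X):
    """Subtract each feature's mean across items (rows)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def cross_product(A, B):
    """Return A^T B for row-aligned matrices A (n x p) and B (n x q)."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]

def frobenius_sq(M):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in M for x in row)

def linear_cka(X, Y):
    """Linear CKA between row-aligned representations X (n x p) and Y (n x q).

    Row i of X and row i of Y must describe the same item, e.g. the same task
    under the baseline prompt and under the curving prompt."""
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = frobenius_sq(cross_product(Yc, Xc))          # ||Y^T X||_F^2
    norm_x = math.sqrt(frobenius_sq(cross_product(Xc, Xc)))  # ||X^T X||_F
    norm_y = math.sqrt(frobenius_sq(cross_product(Yc, Yc)))  # ||Y^T Y||_F
    return numerator / (norm_x * norm_y)

# Toy 4-task x 3-dim "hidden states" (made-up values)
baseline = [[0.1, 0.3, -0.2], [0.4, 0.1, 0.0], [-0.3, 0.2, 0.5], [0.0, -0.1, 0.2]]
curving = [[0.2, 0.2, -0.1], [0.5, 0.0, 0.1], [-0.2, 0.3, 0.4], [0.1, 0.0, 0.1]]
print(f"linear CKA: {linear_cka(baseline, curving):.3f}")
```

CKA lies between 0 and 1 and is unchanged by isotropic scaling or rotation of either representation, which is why it is often preferred over comparing raw coordinates.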

For open-weight Hugging Face models, some of these layers are accessible. Relevant docs:

  • Transformers generation utilities
  • Text generation API
  • Model outputs / hidden states

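There is also a practical gotcha: hidden states from model.generate() can be confusing because prompt-token states and generated-token states are not returned in the same shape. With output_hidden_states=True and return_dict_in_generate=True, the states come back per generation step: the first step covers every prompt token, while later steps (with the KV cache enabled) cover only the newly processed token. This HF Forum thread is relevant: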

  • The hidden states when I use model.generate()
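A sketch of how the two kinds of states can be separated, assuming the usual layout for a decoder-only model: `output.hidden_states` is a tuple with one entry per generation step, each entry a tuple over layers (index 0 being the embedding output), each of shape `[batch, seq, hidden]`. The helper name is hypothetical, and the toy nested lists stand in for real tensors; the same indexing also works on PyTorch tensors. If I read the layout correctly, the last generated token is never fed back into the model, so it has no state of its own here.

```python
def split_generate_hidden_states(hidden_states, layer, batch_index=0):
    """Hypothetical helper: separate prompt-token states from generated-token states.

    hidden_states[step][layer] is assumed to be shaped [batch, seq, hidden].
    Step 0 covers the whole prompt; later steps usually cover one token (KV cache),
    so the last position is taken, which also works if the full sequence is returned.
    """
    prompt_states = list(hidden_states[0][layer][batch_index])
    generated_states = [step[layer][batch_index][-1] for step in hidden_states[1:]]
    return prompt_states, generated_states

# Toy stand-in: 3 steps, 2 layers, batch of 1, prompt of 4 tokens, hidden size 2
def fake_state(seq_len, tag):
    return [[[tag, pos] for pos in range(seq_len)]]  # [batch][seq][hidden]

toy = [
    (fake_state(4, 0.0), fake_state(4, 1.0)),  # step 0: full prompt
    (fake_state(1, 0.0), fake_state(1, 1.0)),  # step 1: one new token
    (fake_state(1, 0.0), fake_state(1, 1.0)),  # step 2: one new token
]
prompt, generated = split_generate_hidden_states(toy, layer=1)
print(len(prompt), len(generated))  # 4 2
```

Whatever the extraction method, the layer index and which token positions are kept (for example, the last prompt token versus a mean over generated tokens) should be reported alongside the map.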

So I would ask:

Are the maps based on generated text, scores/logprobs, hidden states from a specific layer/token position, attentions, or a custom post-processing metric?

That question is not a criticism. It determines what kind of claim the map can support.



Rhetorical self-organization vs stronger evidence

I do not know a single accepted metric for “genuine self-organization simulation” in LLM outputs.

But adjacent fields do have recurring measurement families. I would treat them as an evidence profile, not one score.

| Measurement family | What it asks | Prompt-experiment version |
|---|---|---|
| Order parameters | Is there a low-dimensional collective variable? | What variable summarizes the induced regime? |
| Stability / recovery | Does the system return to a regime after perturbation? | Seeds, paraphrases, distractors, multi-turn recovery |
| Entropy / complexity / homeostasis | Is there a balance of order and variability? | Token entropy, output entropy, complexity profile |
| Fingerprint metrics | Is there randomness, pattern, and interdependence? | Output variance, repeated structure, mutual information |
| Interdependence / synergy | Are parts becoming structurally coupled? | MI/PID among output features or hidden components |
| Information dynamics | Is information stored/transferred over time? | Prompt influence across generation steps |
| Multiscale profile | At what scale does organization appear? | Token, sentence, response, layer, seed ensemble |
| Topological structure | Are there persistent geometric features? | TDA on hidden-state or output-embedding clouds |
| Causal intervention | Can the effect be moved or removed? | Patching, ablation, activation-difference transfer |
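For the interdependence row, the simplest starting point is pairwise mutual information between discrete per-run features. This is a plug-in estimator sketch; the feature names and values in the example are hypothetical. Plug-in MI is biased upward for small samples, so it should be compared against the same features under control prompts (or with one feature shuffled) rather than read in isolation. Synergy and PID need dedicated tools beyond this.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical per-run binary features (one entry per seed), made-up values:
# does the response use field/attractor vocabulary, and does it propose a "map"?
uses_field_vocab = [1, 1, 0, 1, 0, 1, 1, 0]
proposes_map = [1, 1, 0, 1, 0, 0, 1, 0]
print(f"MI: {mutual_information(uses_field_vocab, proposes_map):.3f} bits")
```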

Translated to this experiment, I would draw the boundary like this:

| Category | What it would look like |
|---|---|
| Rhetorical self-organization | The output uses self-organization / field / attractor vocabulary |
| Behavioral self-organization-like effect | The regime replicates across seeds/tasks and survives style-only controls |
| Distribution-level evidence | Scores, entropy, or logprob profiles shift relative to controls |
| Representation-level evidence | Hidden-state trajectories shift relative to controls |
| Order-parameter-like evidence | A low-dimensional variable predicts many output/hidden-state features |
| Interdependence evidence | Output or hidden-state components become more mutually dependent/synergistic |
| Causal evidence | Patching, ablation, or activation-difference transfer moves or removes the effect |
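For the order-parameter row, one crude proxy is how much of the variance in a set of per-run features (or hidden-state coordinates) falls on a single leading direction. A plain-Python sketch using power iteration on the covariance matrix; it assumes the features are on comparable scales (standardize them first otherwise) and that the leading eigenvalue is well separated from the next one. A higher share under the curving prompt than under the controls would be suggestive, not proof of an order parameter.

```python
import math

def covariance(X):
    """Sample covariance matrix of rows-as-observations data X (n x d)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    return [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]

def top_component_share(X, iters=200):
    """Fraction of total variance on the leading principal component (power iteration)."""
    C = covariance(X)
    d = len(C)
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return 0.0  # no variance along the start direction
        v = [x / norm for x in w]
    leading = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
    total = sum(C[a][a] for a in range(d))
    return leading / total

# Hypothetical per-run features (rows = runs, columns = features), made-up values
features = [[0.9, 0.8, 0.2], [0.4, 0.5, 0.6], [0.7, 0.6, 0.3], [0.1, 0.2, 0.9]]
print(f"leading-component share: {top_component_share(features):.3f}")
```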

So for your third question, I would not ask:

Is it genuine self-organization, yes or no?

I would ask:

How far up this ladder does the effect survive?



Controls that would make the result easier to interpret

I would add controls before interpreting the effect as internal geometry.

| Condition or control | Why it matters |
|---|---|
| Baseline prompt | What does the model do normally? |
| Curving / field prompt | The proposed effect |
| Physics-metaphor-only prompt | Tests whether the effect is just physics vocabulary |
| Self-monitoring-only prompt | Tests whether the effect is generic reflective prompting |
| Same-length abstract prompt | Controls for length and abstraction |
| Shuffled-label prompt | Tests whether labels like “self-assembly” and “self-organization” matter |
| No-visual-language prompt | Tests whether the visual map is induced by visualization instructions |
| Multiple seeds | Tests whether the regime is stable |
| Greedy + sampling | Separates deterministic prompt effect from sampling noise |
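The control table implies a run matrix: every condition crossed with decoding mode and seed. A sketch, assuming hypothetical condition names that mirror the table; `do_sample`, `temperature`, and `top_p` are standard Hugging Face generation parameters, but the values here are placeholders. Greedy decoding is deterministic for a fixed model and input (up to hardware-level nondeterminism), so it gets one run per condition.

```python
from itertools import product

# Hypothetical condition names mirroring the control table; prompt texts omitted
CONDITIONS = [
    "baseline", "curving", "physics_only", "self_monitoring_only",
    "same_length_abstract", "shuffled_label", "no_visual_language",
]
SEEDS = [0, 1, 2, 3, 4]
DECODING = {
    "greedy": {"do_sample": False},
    "sampled": {"do_sample": True, "temperature": 0.7, "top_p": 0.9},  # placeholder values
}

def build_run_matrix(conditions, seeds, decoding):
    """One run spec per (condition, decoding mode, seed); greedy uses a single seed."""
    runs = []
    for cond, (mode, gen_kwargs) in product(conditions, decoding.items()):
        mode_seeds = seeds if gen_kwargs.get("do_sample") else seeds[:1]
        for seed in mode_seeds:
            runs.append({"condition": cond, "decoding": mode, "seed": seed,
                         "generation_kwargs": dict(gen_kwargs)})
    return runs

runs = build_run_matrix(CONDITIONS, SEEDS, DECODING)
print(len(runs), "runs")
```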

Example interpretation:

| Result | Safer interpretation |
|---|---|
| Curving prompt and physics-only prompt look similar | Style / ontology priming is likely |
| Curving prompt differs from all style controls across seeds | Behavioral regime shift becomes more plausible |
| Token score / entropy profiles differ from controls | Distribution-level claim becomes stronger |
| Hidden-state trajectories differ from controls | Representation-level claim becomes stronger |
| Patching transfers or removes the effect | Causal claim becomes more plausible |

Controls are not just for disproving the idea. They make the interesting part more precise.
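To decide whether the curving prompt "differs from" a control on some per-run metric, a permutation test is one assumption-light option. A sketch, assuming one metric value per seed for each condition (for example, mean token entropy); the numbers in the example are made up. With several controls, the resulting p-values should be corrected for multiple comparisons.

```python
import random

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means of samples a and b."""
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel runs at random under the null hypothesis
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = sum(pa) / len(pa) - sum(pb) / len(pb)
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 correction so the estimated p-value is never exactly zero
    return observed, (extreme + 1) / (n_permutations + 1)

# Made-up per-seed mean token entropy (nats) for two conditions
curving_entropy = [2.31, 2.45, 2.28, 2.40, 2.36]
physics_only_entropy = [2.30, 2.41, 2.25, 2.38, 2.33]
diff, p_value = permutation_test(curving_entropy, physics_only_entropy)
print(f"mean difference {diff:.3f}, permutation p ~ {p_value:.3f}")
```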


Small practical next step

If I were trying to make this easy for others to evaluate, I would create a minimal reproducibility package:

  1. Pick one open-weight model.
  2. Save exact model revision and tokenizer revision.
  3. Save the serialized chat-template output.
  4. Use baseline / curving / physics-only / same-length abstract controls.
  5. Fix generation settings.
  6. Run multiple seeds.
  7. Save generated outputs.
  8. Save token scores / entropy profiles if available.
  9. Save hidden-state extraction method if used.
  10. Report which layer/token positions are plotted.
  11. Save projection method and parameters.
  12. If possible, add one small activation-patching or activation-transfer experiment.
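One way to keep items 2 to 11 together is a single JSON manifest per experiment. A sketch with a hypothetical `RunManifest` dataclass; the field names are my own, and the values in the example are placeholders. In Transformers, `from_pretrained(..., revision=...)` pins a model or tokenizer revision, and `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns the serialized prompt string that item 3 asks to save.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

@dataclass
class RunManifest:
    """Hypothetical record of what the checklist above asks to save."""
    model_id: str
    model_revision: str              # exact commit hash, not a branch name
    tokenizer_revision: str
    serialized_prompts: dict         # condition -> chat-template output string
    generation_settings: dict        # e.g. max_new_tokens, do_sample, temperature
    seeds: list
    hidden_state_layer: Optional[int] = None
    token_positions: Optional[str] = None   # e.g. "last prompt token"
    projection: dict = field(default_factory=dict)  # method and its parameters
    output_dir: str = ""             # where generations, scores, and states live

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

manifest = RunManifest(
    model_id="<model-id>",
    model_revision="<commit-sha>",
    tokenizer_revision="<commit-sha>",
    serialized_prompts={"baseline": "<serialized prompt>", "curving": "<serialized prompt>"},
    generation_settings={"max_new_tokens": 256, "do_sample": False},
    seeds=[0, 1, 2, 3, 4],
    hidden_state_layer=12,
    token_positions="last prompt token",
    projection={"method": "PCA", "n_components": 2},
)
print(manifest.to_json())
```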

The claim can then be stated in tiers:

| Evidence obtained | Claim strength |
|---|---|
| Output changes only | Rhetorical/style regime |
| Replicates across seeds and controls | Behavioral regime shift |
| Scores/logprobs differ | Token-distribution shift |
| Hidden states differ | Representation-level shift |
| Order parameter / interdependence appears | Self-organization-like profile |
| Patching/ablation/transfer works | Causal evidence candidate |
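Since the tiers are meant to be climbed in order, the claim rule can be written down explicitly. A sketch with hypothetical evidence keys, one per row of the tier table; it stops at the first tier whose evidence is missing, so, for example, hidden-state differences without seed/control replication do not raise the claim.

```python
# Ordered tiers from the table; each tier assumes the ones below it held.
TIERS = [
    ("output_changes", "Rhetorical/style regime"),
    ("replicates_across_seeds_and_controls", "Behavioral regime shift"),
    ("scores_differ", "Token-distribution shift"),
    ("hidden_states_differ", "Representation-level shift"),
    ("order_parameter_or_interdependence", "Self-organization-like profile"),
    ("patching_or_ablation_works", "Causal evidence candidate"),
]

def strongest_supported_claim(evidence):
    """Walk up the ladder and stop at the first tier whose evidence is missing."""
    claim = "No claim yet"
    for key, label in TIERS:
        if not evidence.get(key, False):
            break
        claim = label
    return claim

evidence = {"output_changes": True, "replicates_across_seeds_and_controls": True,
            "hidden_states_differ": True}
print(strongest_supported_claim(evidence))  # Behavioral regime shift
```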



My current position

This looks like an interesting prompt-induced regime shift. It may become a stronger representation-level or self-organization-like claim if the maps are based on well-specified token/hidden-state data, survive controls, and ideally show some causal transfer or ablation behavior.

Until then, I would keep “manifold curving” as a metaphor or hypothesis, not as the measured conclusion.
