External Publication

LLM "curving" via prompting

Hugging Face Forums [Unofficial] June 26, 2026

Hmm… this is difficult, but based on what I could find so far, maybe it looks something like this:

Short answer

I would not treat this as a yes/no question yet.

At the output level , the safest first interpretation is that the prompt is creating a strong style / ontology / behavioral regime shift. The prompt gives the model a very explicit vocabulary: “field,” “gravity well,” “self-organization,” “manifold,” “density,” “hidden dynamics,” etc. So it is not surprising that the model starts speaking and visualizing in that ontology.

But I would not yet call it evidence that the model’s internal manifold has literally been “curved,” unless the visualization is based on something like token scores/logprobs, hidden states, or causal interventions , rather than generated text or model self-description.

My tentative answer to the three questions would be:

Manifold perturbation or stylistic variation? Maybe more than ordinary style variation, but “manifold perturbation” is too strong until the measurement layer is specified. A safer current label might be prompt-induced regime shift or prompt steerability.
Has anyone tried similar “field condition” prompting? Yes, there are nearby prompt-only experiments on this forum using “field,” “attractor,” “semantic tension,” or “global regulator” language. I would treat those as related observations, not evidence of the same internal mechanism.
How to distinguish rhetorical self-organization from something stronger? I would not look for one magic “self-organization score.” I would use an evidence profile : output behavior → behavioral stability → token distribution → hidden-state trajectory → interdependence/order-parameter tests → causal patching/ablation/transfer.

For me, the most important missing detail is:

What exactly are the maps computed from?

Generated text embeddings? Token scores? Hidden states? Attentions? A custom metric? A 2-D projection of some other vector space? The interpretation changes a lot depending on that answer.

The main distinction I would make

I would separate “the prompt curves the model” into several increasingly strong claims.

Level	What is being claimed	Example evidence
Output style	The output sounds different	More field/manifold/self-organization language
Behavioral regime	The prompt reliably shifts behavior	Replication across seeds/tasks; survives style-only controls
Token distribution	The next-token landscape changes	Scores, logprobs, entropy, KL divergence
Representation	Hidden-state trajectories change	Layerwise hidden-state distances, CKA/RSA, trajectory clustering
Self-organization-like profile	There is stable, structured organization, not just vocabulary	Order parameters, perturbation recovery, interdependence/synergy
Causal mechanism	The effect can be moved or removed	Activation patching, ablation, activation-difference transfer

The first two levels are still interesting. They just do not yet justify the stronger internal-mechanism language.

A useful existing term for the second level is prompt steerability : how much prompting alone can shift a model’s behavioral distribution away from baseline. See, for example:

Evaluating the Prompt Steerability of Large Language Models
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
PromptRobust
PromptBench

So I would initially describe the current observation as:

This prompt may induce a distinctive behavioral regime. Whether that regime corresponds to a deeper representation-level or causal mechanism is testable, but not yet shown by the wording of the output alone.

What I would ask first

Before interpreting the maps, I would ask:

What exactly is being plotted?

If the map is computed from…	Safer interpretation	Unsafe interpretation
Generated text embeddings	Output-space shift	Internal manifold perturbation
Prompt embeddings	Input-space / prompt-space difference	Model dynamics changed
Token scores / logprobs	Next-token distribution changed	Causal mechanism found
Hidden states	Representation trajectory changed	The difference caused the behavior
Attention patterns	Attention allocation changed	The whole model entered a new field state
Custom metrics	Difference under that metric	Internal geometry proved
UMAP/t-SNE/PCA projection	Exploratory visualization	Literal gravity wells / basin sizes

For open-weight Hugging Face models, some of these layers are accessible. Relevant docs:

Transformers generation utilities
Text generation API
Model outputs / hidden states

There is also a practical gotcha: hidden states from model.generate() can be confusing because prompt-token states and generated-token states are not always obtained in the same way. This HF Forum thread is relevant:

The hidden states when I use model.generate()

So I would ask:

Are the maps based on generated text, scores/logprobs, hidden states from a specific layer/token position, attentions, or a custom post-processing metric?

That question is not a criticism. It determines what kind of claim the map can support.

Related HF Forum context (click for more details)

Rhetorical self-organization vs stronger evidence

I do not know a single accepted metric for “genuine self-organization simulation” in LLM outputs.

But adjacent fields do have recurring measurement families. I would treat them as an evidence profile , not one score.

Measurement family	What it asks	Prompt-experiment version
Order parameters	Is there a low-dimensional collective variable?	What variable summarizes the induced regime?
Stability / recovery	Does the system return to a regime after perturbation?	Seeds, paraphrases, distractors, multi-turn recovery
Entropy / complexity / homeostasis	Is there a balance of order and variability?	Token entropy, output entropy, complexity profile
Fingerprint metrics	Is there randomness, pattern, and interdependence?	Output variance, repeated structure, mutual information
Interdependence / synergy	Are parts becoming structurally coupled?	MI/PID among output features or hidden components
Information dynamics	Is information stored/transferred over time?	Prompt influence across generation steps
Multiscale profile	At what scale does organization appear?	Token, sentence, response, layer, seed ensemble
Topological structure	Are there persistent geometric features?	TDA on hidden-state or output-embedding clouds
Causal intervention	Can the effect be moved or removed?	Patching, ablation, activation-difference transfer

Translated to this experiment, I would draw the boundary like this:

Category	What it would look like
Rhetorical self-organization	The output uses self-organization / field / attractor vocabulary
Behavioral self-organization-like effect	The regime replicates across seeds/tasks and survives style-only controls
Distribution-level evidence	Scores, entropy, or logprob profiles shift relative to controls
Representation-level evidence	Hidden-state trajectories shift relative to controls
Order-parameter-like evidence	A low-dimensional variable predicts many output/hidden-state features
Interdependence evidence	Output or hidden-state components become more mutually dependent/synergistic
Causal evidence	Patching, ablation, or activation-difference transfer moves or removes the effect

So for your third question, I would not ask:

Is it genuine self-organization, yes or no?

I would ask:

How far up this ladder does the effect survive?

Background links for self-organization measurement families (click for more details)

Controls that would make the result easier to interpret

I would add controls before interpreting the effect as internal geometry.

Control	Why it matters
Baseline prompt	What does the model do normally?
Curving / field prompt	The proposed effect
Physics-metaphor-only prompt	Tests whether the effect is just physics vocabulary
Self-monitoring-only prompt	Tests whether the effect is generic reflective prompting
Same-length abstract prompt	Controls for length and abstraction
Shuffled-label prompt	Tests whether labels like “self-assembly” and “self-organization” matter
No-visual-language prompt	Tests whether the visual map is induced by visualization instructions
Multiple seeds	Tests whether the regime is stable
Greedy + sampling	Separates deterministic prompt effect from sampling noise

Example interpretation:

Result	Safer interpretation
Curving prompt and physics-only prompt look similar	Style / ontology priming is likely
Curving prompt differs from all style controls across seeds	Behavioral regime shift becomes more plausible
Token score / entropy profiles differ from controls	Distribution-level claim becomes stronger
Hidden-state trajectories differ from controls	Representation-level claim becomes stronger
Patching transfers or removes the effect	Causal claim becomes more plausible

Controls are not just for disproving the idea. They make the interesting part more precise.

Small practical next step

If I were trying to make this easy for others to evaluate, I would create a minimal reproducibility package:

Pick one open-weight model.
Save exact model revision and tokenizer revision.
Save the serialized chat-template output.
Use baseline / curving / physics-only / same-length abstract controls.
Fix generation settings.
Run multiple seeds.
Save generated outputs.
Save token scores / entropy profiles if available.
Save hidden-state extraction method if used.
Report which layer/token positions are plotted.
Save projection method and parameters.
If possible, add one small activation-patching or activation-transfer experiment.

The claim can then be stated in tiers:

Evidence obtained	Claim strength
Output changes only	Rhetorical/style regime
Replicates across seeds and controls	Behavioral regime shift
Scores/logprobs differ	Token-distribution shift
Hidden states differ	Representation-level shift
Order parameter / interdependence appears	Self-organization-like profile
Patching/ablation/transfer works	Causal evidence candidate

Implementation details that can easily confound the result (click for more details) A staged open-model test (click for more details) Relation to activation steering (click for more details) Visualization caveat (click for more details) Model self-description is output data, not mechanism evidence (click for more details) Terms I would use / avoid (click for more details)

My current position

This looks like an interesting prompt-induced regime shift. It may become a stronger representation-level or self-organization-like claim if the maps are based on well-specified token/hidden-state data, survive controls, and ideally show some causal transfer or ablation behavior.

Until then, I would keep “manifold curving” as a metaphor or hypothesis, not as the measured conclusion.