Prompting vs Structure - A Boundary Test

OpenAI Developer Community April 16, 2026

I ran a simple test with an image model to see how far prompting can go when we ask for physical realism instead of aesthetics.

Setup (4 steps):

1. Start with a standard prompt → result: visually pleasing image.
   Aesthetic prompt (baseline)

2. Add physical requirements (airflow, light behavior) → result: still smooth, slightly improved.
   Physical prompt (first correction)

3. Increase structural demands (irregularity, no symmetry, partial rainbow) → result: looks more complex, but not more accurate.
   Structural prompt (a deeper, systemic approach)

4. Add conflicting real-world constraints (turbulence over time, particle behavior, observer-dependent optics) → result: visual chaos, not physical consistency.
   Stress test prompt (out-of-bounds)
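
For anyone who wants to repeat the escalation, the four steps can be sketched as a small prompt builder. This is only a rough sketch under assumptions: the exact prompt wording is not reproduced here, so `BASE_PROMPT` is a placeholder, the names `TIERS` and `build_prompts` are made up for illustration, and it assumes each step keeps the constraints of the earlier ones. The constraint names themselves are taken from the steps above.

```python
# Placeholder: the real baseline prompt text is not reproduced in this post.
BASE_PROMPT = "<baseline aesthetic prompt>"

# One entry per step; each lists only the constraints that step adds.
TIERS = [
    ("aesthetic", []),
    ("physical", ["airflow", "light behavior"]),
    ("structural", ["irregularity", "no symmetry", "partial rainbow"]),
    ("stress", ["turbulence over time", "particle behavior",
                "observer-dependent optics"]),
]


def build_prompts(base, tiers):
    """Return (label, prompt) pairs; each tier keeps all earlier constraints.

    Cumulative stacking is an assumption about how the steps were run.
    """
    prompts = []
    accumulated = []
    for label, constraints in tiers:
        accumulated = accumulated + constraints
        if accumulated:
            prompt = base + " Requirements: " + "; ".join(accumulated) + "."
        else:
            prompt = base
        prompts.append((label, prompt))
    return prompts


if __name__ == "__main__":
    for label, prompt in build_prompts(BASE_PROMPT, TIERS):
        print(f"[{label}] {prompt}")
```

Each generated prompt would then be sent to whichever image model is being tested, and the outputs compared side by side.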

Observation: The model consistently produces images that look realistic, but it does not reliably maintain physical relationships once the constraints become more complex.

Turbulence appears as texture, not as structured flow
The bird resembles a red kite, but lacks precise morphology
The rainbow is present, but not truly dependent on viewing geometry
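
To make "dependent on viewing geometry" concrete: in standard geometric optics, a primary rainbow sits at a fixed angle of roughly 42 degrees from the antisolar direction (the point directly opposite the sun as seen by the observer), so its position in the frame follows from where the sun and the viewer are. The sketch below computes that angle for a spherical water drop. The function name is made up for illustration, and the refractive indices are approximate textbook values for water, not measurements from this test.

```python
import math


def primary_rainbow_angle(n):
    """Angle in degrees between the antisolar direction and the primary bow.

    Assumes a spherical drop with refractive index n and one internal
    reflection (geometric optics only, no wave effects).
    """
    # Incidence angle of minimum deviation: cos^2(i) = (n^2 - 1) / 3
    cos_i = math.sqrt((n * n - 1.0) / 3.0)
    i = math.acos(cos_i)
    # Snell's law gives the refraction angle inside the drop.
    r = math.asin(math.sin(i) / n)
    # Total deviation of the ray: 180 deg + 2i - 4r
    deviation = math.pi + 2.0 * i - 4.0 * r
    # The bow angle is measured from the antisolar point.
    return math.degrees(math.pi - deviation)


if __name__ == "__main__":
    # Approximate refractive indices of water for red and violet light.
    for name, n in [("red", 1.331), ("mid", 1.333), ("violet", 1.343)]:
        print(f"{name}: {primary_rainbow_angle(n):.1f} degrees")
```

A generated image could be checked against this kind of relationship: if the sun direction implied by the shadows puts the antisolar point somewhere the bow cannot be centered on, the rainbow is decoration rather than optics.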

Conclusion: Prompting can steer appearance, but it does not create underlying structure.

No matter how precise the prompt becomes, the system prioritizes what looks coherent over what is physically consistent.

In short:

It can generate convincing images, but not fully consistent systems.

This is not a flaw of prompting; it is a limitation of the model itself.

Curious how others see this: At what point do prompts stop improving results and start exposing system limits?