Collection of GPT-image-generator 2.0 issues, bugs, and work-around tips (check first post)
OpenAI Developer Community
May 9, 2026
I think one difficulty with this test is that some of the “fantasy words” may not actually be neutral fantasy words.
I googled a few of them, and at least some already exist in some form. That means they may carry associations from training data, search-derived knowledge, brand names, personal names, or similar existing terms, so they are not completely neutral test inputs.
Even if we use truly invented words, the model can still create associations based on:
* phonetic similarity to known words
* typical fantasy-sounding syllables
* the current chat context
* user memory or long-term project context
* previous image generation behavior
* the selected ChatGPT mode/model
* possible automatic prompt expansion
So I think this kind of test can still be interesting, but it is very hard to isolate what exactly is being tested.
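As a rough illustration of the first point in the list, here is a minimal sketch of how candidate words could be pre-screened for closeness to existing words. It only compares spelling (via Python's standard `difflib`), which is a weak stand-in for real phonetic similarity. The `KNOWN_WORDS` list, the `0.6` threshold, and the function names are all my own assumptions for the example; a serious check would need a large dictionary plus brand and name lists.

```python
from difflib import SequenceMatcher

# Hypothetical reference list, for illustration only. A real screen would
# use a large dictionary plus brand-name and personal-name lists.
KNOWN_WORDS = ["dragon", "crystal", "shadow", "ember", "nova", "lumen"]


def closest_known(word, known=KNOWN_WORDS):
    """Return (best_match, ratio) using spelling similarity only."""
    scored = [
        (k, SequenceMatcher(None, word.lower(), k).ratio())
        for k in known
    ]
    return max(scored, key=lambda pair: pair[1])


def screen(candidates, threshold=0.6):
    """Split candidates into (kept, flagged).

    A word is flagged when its best spelling match reaches the threshold.
    The threshold is an arbitrary assumption, not a calibrated value.
    """
    kept, flagged = [], []
    for word in candidates:
        match, ratio = closest_known(word)
        entry = (word, match, round(ratio, 2))
        if ratio >= threshold:
            flagged.append(entry)
        else:
            kept.append(entry)
    return kept, flagged


if __name__ == "__main__":
    kept, flagged = screen(["dragun", "qxzvtk"])
    print("kept:", kept)
    print("flagged:", flagged)
```

Passing this screen does not make a word neutral. It only removes the most obvious near-matches, and the other factors in the list (chat context, memory, prompt expansion) are untouched by it.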
To really reduce those effects, the test would probably need a very “cold” setup: no relevant memory, no long chat history, no project context, and ideally a fresh API setup. The same prompts would then be tested repeatedly across the direct Image API, the Responses API, and ChatGPT.
Otherwise we may not be testing only the image model. We may also be testing the accumulated context and interpretation layer around it.
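To make the "same prompts, repeated, across all three surfaces" idea concrete, here is a small sketch of a trial plan. It deliberately makes no API calls and only produces a shuffled list of what to run and where. The surface labels, the `build_trials` name, and the default of three repeats are assumptions for illustration; each trial would still have to be run by hand or with your own client code in a fresh session.

```python
import itertools
import random

# Labels only: this sketch does not call any API.
SURFACES = ["image_api", "responses_api", "chatgpt"]


def build_trials(prompts, repeats=3, seed=0):
    """Build a shuffled, reproducible list of trials.

    Every prompt is paired with every surface `repeats` times, so results
    can be compared per surface. Shuffling avoids always running one
    surface or one prompt first.
    """
    rng = random.Random(seed)
    trials = [
        {"surface": surface, "prompt": prompt, "repeat": repeat}
        for surface, prompt, repeat in itertools.product(
            SURFACES, prompts, range(repeats)
        )
    ]
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    plan = build_trials(["a glass qxzvtk on a table", "a small vorlim in fog"])
    for trial in plan:
        print(trial)
```

Keeping the prompt text byte-identical across surfaces matters here, since any rewording adds one more variable on top of the context and interpretation layers mentioned above.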