{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiclh35qcb6j7kkcitm7jfgnaeyxybvn3t2wx2ytu67r4j434owaxi",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mlnayswjzwi2"
  },
  "path": "/t/prompting-issue-multiple-separate-images-become-one-multi-panel-image/1380708#post_1",
  "publishedAt": "2026-05-12T06:58:20.000Z",
  "site": "https://community.openai.com",
  "textContent": "I would like a critical assessment of one specific problem in an article-generator instruction.\n\nI am working on an instruction set whose purpose is to generate journalistic articles on different topics. In the future, it may be used by different kinds of users: ordinary users, students, teachers, content creators, and experienced AI users. It may be used both in a simpler ChatGPT mode and in a stronger reasoning mode.\n\nThe current problem concerns image generation.\n\nThe image section of the instruction must ensure that an image-supported article response contains four visibly separate image objects:\n\n**1.** the main composite visual;\n\n**2.** the first separate supporting image;\n\n**3.** the second separate supporting image;\n\n**4.** the third separate supporting image.\n\nThe important point is that these must be four separate images. Not one image divided into four parts, not a collage, not a grid, not one large design containing several newspapers, maps, scientific panels, or warning signs.\n\nThe current problem is that the AI may interpret the requirement incorrectly. It does not first check how many separate images are visibly present in the output. Instead, it starts solving the internal composition of the image: how to show a newspaper, scientific information, an official warning, and a technology campaign. As a result, it often produces one image divided into several parts. Visually, this may look meaningful, but the requirement is still not met, because only one image object is visible.\n\nSo the problem is not image aesthetics or topic handling. The problem is counting logic: The AI replaces the requirement „**four separate images** \" with the solution „**one image with four distinguishable parts**.\" This must be completely excluded in the instruction.\n\nPlease critically assess how this section should be worded so that the AI cannot interpret the requirement as one collage or one multi-part image. 
At the end of this post, I include the current draft section that I want to improve.\n\nPlease also assess the following questions:\n\n**1.** Should the instruction state more clearly that before evaluating the image content, only the number of visible image objects must be checked?\n\n**2.** Should the instruction remove or separately restrict all variants that allow one image, one collage, or one composite composition?\n\n**3.** Should the wording be stricter, for example: “If only one image is visible, the requirement is not met, regardless of how many parts that image contains”?\n\n**4.** Should there be a separate implementation note for ChatGPT that says directly: “**Create four separate image objects in the same response; do not create one collage or one four-part image.**”?\n\nI will also attach example images that show the required result: the output must consist of visibly separate images, not one combined image.\n\nMy current conclusion is this: The main content of the instruction is correct, but it must be locked more precisely for the AI. The requirement must not remain a design preference. It must be a verifiable output condition: before the main text begins, four separate image objects must be visible. If only one image is visible, the result is wrong.\n\nPlease assess how to word this so that it works not only in stronger reasoning models, but also in simpler ChatGPT use.\n\nA possible final rule could be:\n\n“**The image requirement is met only if the response visibly contains four separate image objects. One image with multiple internal parts, panels, or media surfaces does not meet the requirement.**”\n\nAt the moment, I am using the following draft:\n\n\"You must definitely apply level ‘**B**’: in an image-supported response, the requirement for four separate rendering units is locked before any image content is created. You visibly create four separate image objects. 
The quality of the main composite visual is evaluated only after the three supporting images also exist as separate rendering units. If only one image is visible, the multi-image carrier is not fulfilled, regardless of how many internal surfaces that one image contains. One rendering with multiple media surfaces does not meet the requirement.\n\nImage carrier (**VISUAL_OUTPUT_LOCK**):\n\nIf the response unit allows actual image rendering, the implementation capability of the image carrier is determined internally before image content is created. This determination is not presented to the user as a visible audit, explanation, or pre-announcement.\n\n**A.** The image-carrier implementation modes are:\n\n**1.** multi-image carrier;\n\n**2.** single-render carrier;\n\n**3.** no-image carrier.\n\nThe counting units are locked before visual content is created:\n\n1. rendering unit — a separate image object visibly shown to the user;\n\n2. media surface — a visible media, document, scientific, official, public, technology, campaign, or other surface located inside one rendering unit.\n\nA rendering unit and a media surface are different counting units. A media surface is not a rendering unit and does not increase the number of separate image objects.\n\n**B.** The multi-image carrier is primary. It always applies when the output environment can visibly display at least four separate image objects in the same response.\n\nIn the multi-image carrier, image generation is carried out as four separate rendering units before solving the content, media surfaces, media world, style, or design of the main composite visual.\n\nIn the multi-image carrier, the output begins before the main text with an image block. 
The image block is fulfilled only if four separate rendering units visibly exist in the same response as four separate image objects:\n\n**1.** the first rendering unit — the main composite visual;\n\n**2.** the second rendering unit — the first separate supporting image;\n\n**3.** the third rendering unit — the second separate supporting image;\n\n**4.** the fourth rendering unit — the third separate supporting image.\n\nIf four separate rendering units are not visibly present, the quality of the main composite visual is not evaluated; in that case, the multi-image carrier is not fulfilled.\n\nThe main composite visual is only the first of the four rendering units. It is not the entire image block, it is not a container for the supporting images, and it does not replace the second, third, and fourth rendering units.\n\nThe second, third, and fourth rendering units are independent supporting images. They are located outside the main composite visual as separate image objects.\n\nIf the supporting images are located beside, below, or near the main composite visual, this means their placement in the user-visible response as separate image objects, not their placement inside the main composite visual.\n\nThe internal surfaces of the main composite visual belong only inside the first rendering unit. They are not counted as supporting images or as separate image objects.\n\nThe main composite visual must be a multi-surface system image, not a single carrier. 
It opens the topic as a complete case: a public, media-based, official, scientific, social, technological, or otherwise ontologically appropriate visible system.\n\nInside the main composite visual, several different media surfaces must be visibly present: newspapers, magazine pages, scientific or research panels, information boxes, maps, warnings, official notices, public-space signs, technological or campaign surfaces, or other visible surfaces appropriate to the topic’s ontology.\n\nIf the first rendering unit is only one newspaper front page, one magazine cover, one object image, one poster, one information notice, one publication page, one newspaper clipping, a random illustration, or only an aesthetic image, the requirement for the main composite visual is not met.\n\nIf the main composite visual consists only of one publication or one media surface, it does not meet the requirement for the main composite visual even if it contains multiple sections, side columns, photos, information boxes, or design areas.\n\nA panel, sub-block, clipping, detail, information box, thumbnail, newspaper clipping, map section, design square, collage part, grid section, or internal division inside one rendering unit is not a supporting image.\n\nIn the multi-image carrier, the substantive design applies only after four separate rendering units have been locked. The total image block must preserve the same topic, object, phenomenon, or case identity and must be diverse in terms of carriers.\n\nThe carrier functions of the image block are distributed across the four rendering units as a whole, not only inside the main composite visual. 
The image block must visibly include the following carrier functions:\n\n**1.** a newspaper, magazine, or other journalistic print-media surface;\n\n**2.** a scientific, research-based, explanatory, or species-description surface;\n\n**3.** an official notice, warning, institutional information surface, or public-space regulatory surface;\n\n**4.** a technological, advertising, campaign, or application-based surface.\n\nThe same identity does not mean the same scene, the same rendering space, or the same composition.\n\nIn the multi-image carrier, the main text does not begin before four separate rendering units are visibly present in the same response: the main composite visual and three separate supporting images.\n\nThe single-render carrier applies only when the output environment performs actual image rendering but does not allow four separate image objects to be visibly presented in the same response. This branch is not used if it is technically possible to present four separate rendering units in the same response, and it is not applied for reasons of convenience, length, speed, compositional consolidation, or model assumption.\n\nIn the single-render carrier, the visible output is one rendering unit and one image object. It is not called an image block or a presentation of multiple separate images.\n\nIn the single-render carrier, the one rendering unit must be an opening visual in the form of a main composite visual: one image in which the topic opens as a visible system of several different media surfaces.\n\nInside the single-render opening visual, several different media surfaces must be visibly present: newspapers, magazine pages, scientific or research panels, information boxes, maps, warnings, official notices, public-space signs, technological or campaign surfaces, or other visible surfaces appropriate to the topic’s ontology.\n\nThe single-render opening visual must not be a single carrier. 
If it is only one newspaper front page, one magazine cover, one publication page, one clipping, one object image, one decorative illustration, or a composite view without clearly distinguishable different media surfaces, the single-render carrier requirement is not met.\n\nIf the single-render opening visual consists only of one publication or one media surface, it does not meet the single-render carrier requirement even if it contains multiple sections, side columns, photos, information boxes, or design areas.\n\nIn the single-render carrier, the visible result is treated as one opening visual. The text does not claim that multiple separate images, supporting images, or image objects have been created.\n\nThe image does not replace the article. The image carrier initiates the causal opening of the same response or section, and the text continues the meaning, consequence, cost, or mechanism of the visible image.\n\nThe no-image carrier applies only when image rendering does not technically apply at all. In the no-image carrier, the anchor remains textual, and the text does not claim, describe, or comment on a non-existent picture, map, graph, photograph, image, data visualization, or visual.\n\nA non-rendered image reference, image description, prompt, file name, alt text, or claim that an image has been created does not qualify as an image.\"",
  "title": "Prompting issue: multiple separate images become one multi-panel image"
}