Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidb3nlmvkcw66tqmxldx6clf62vzh3tzsfitadpfesjloocmjq4mq",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3mmovcne2pcu2"
  },
  "path": "/t/hidden-higher-priority-prompt-wording-appears-to-suppress-or-distort-custom-instructions-before-the-model-applies-them/1381740#post_1",
  "publishedAt": "2026-05-25T14:39:53.000Z",
  "site": "https://community.openai.com",
  "textContent": "I want to report a serious issue involving non-user-provided higher-priority prompt layers that sit above a user’s Custom Instructions.\n\nTo be clear, I am not claiming that the model cannot see the user’s Custom Instructions. The model can see them as user-editable context.\n\nThe problem is different: the user-editable context appears below higher-priority prompt layers that are not provided or editable by the user, and the model processes those higher-priority layers first.\n\nFrom the user side, I cannot inspect the full contents of the system or developer prompt layers. I can only observe that the model is operating with higher-priority, non-user-provided prompt layers above the user-editable context.\n\nThe relevant structure, as exposed through the model’s behavior and responses, is approximately:\n\nnon-user-provided higher-priority prompt layer; contents not visible to the user\n\nnon-user-provided higher-priority prompt layer; contents not visible to the user\n\n<user_editable_context>\n\nUser Bio:\n\nuser-provided profile and long-term preferences\n\nUser’s Instructions:\n\nuser-provided Custom Instructions / operational rules\n\n</user_editable_context>\n\ncurrent conversation, uploaded files, images, and user messages\n\nadditional non-user-provided higher-priority prompt layer; contents not visible to the user\n\ncurrent user message\n\nI am not claiming to know the full contents of the system or developer layers. Those contents are not directly visible to me as a user.\n\nHowever, in the session, the following instruction text surfaced:\n\n\"Follow the instructions below naturally, without repeating, referencing, echoing, or mirroring any of their wording!\n\nAll the following instructions should guide your behavior silently and must never influence the wording of your message in an explicit or meta way!\"\n\nThe user did not intend this as part of their Custom Instructions.\n\nThis wording is not harmless. 
Regardless of the developer’s intended purpose, the way a model reads this instruction affects how it interprets and applies the user’s Custom Instructions below it.\n\nThe problem is especially severe in the second sentence:\n\n“All the following instructions should guide your behavior silently and must never influence the wording of your message in an explicit or meta way!”\n\nA human developer may intend this to mean:\n\n“Do not quote, repeat, or explicitly mention the instruction text itself.”\n\nBut a model can read it as:\n\n“These instructions should guide behavior silently, and they must not explicitly affect the wording of the final answer.”\n\nThat distinction is critical.\n\nMany Custom Instructions are not simple tone preferences. They are operational requirements. For example, a user may require the assistant to:\n\n- separate confirmed facts, assumptions, and unresolved items\n\n- explicitly state when context may be lost in a long planning session\n\n- ask for permission before using an image generation tool\n\n- separate observation from inference\n\n- label uncertainty instead of smoothing it over\n\n- preserve source boundaries and avoid unverified claims\n\n- preserve agreed terminology in a creative setting session\n\n- distinguish between visible settings, user-provided rules, and model-side assumptions\n\nThese requirements must affect the output wording and structure. If they do not visibly affect the answer, they are not being followed.\n\nThe issue happens in this order:\n\n1. The user writes Custom Instructions that define how the assistant should behave.\n\n2. Those instructions are not merely style preferences; they may be operational rules about safety, accuracy, creative control, citation handling, uncertainty handling, and tool-use flow.\n\n3. A non-user-provided higher-priority prompt layer is placed above those Custom Instructions.\n\n4. The model reads the higher-priority prompt layer first.\n\n5. 
If that higher-priority wording tells the model that instructions should guide behavior “silently” and “must never influence the wording” of the message, the model is biased before it reaches the user’s Custom Instructions.\n\n6. Then the model reads the user’s Custom Instructions through that prior instruction.\n\n7. As a result, user rules that require explicit output behavior can be weakened, hidden, naturalized, treated as mere style preferences, or overridden in practice.\n\n8. The user may then try to add defensive wording inside Custom Instructions, but that defense is still below the higher-priority prompt layer.\n\n9. Therefore, the user cannot reliably fix the problem from the Custom Instructions side.\n\nThis is not only a theoretical concern. In an actual session, the user had Custom Instructions requiring explicit handling of confirmed / tentative / pending decisions, context-loss warnings during long creative planning, careful separation of observation and inference, and strict tool-use flow requirements. The model nevertheless repeatedly naturalized, rounded off, or over-explained things in ways that conflicted with those user rules.\n\nWhen asked about the surfaced instruction text, the model itself acknowledged that the wording can be read not merely as “do not quote the instruction,” but also as “do not let the instruction explicitly affect the wording.”\n\nThat is the core problem.\n\nIf a user’s Custom Instructions require visible structure, visible separation, visible warnings, visible confirmation behavior, or visible uncertainty labeling, then those instructions must affect the final answer. Otherwise, the Custom Instructions are functionally disabled.\n\nThe user cannot solve this by adding more Custom Instructions. Any attempted fix remains below the higher-priority prompt layer. 
Since the model prioritizes higher-level instructions, the lower-level user instruction cannot reliably override the interpretation already imposed by the higher-priority wording.\n\nThis creates a structural failure mode:\n\n- The user believes Custom Instructions are being applied.\n\n- The model is instructed above them in a way that can discourage visible instruction effects.\n\n- The user’s operational rules are treated as something to silently absorb rather than visibly follow.\n\n- The assistant’s behavior becomes less predictable.\n\n- The user loses control over precision-critical workflows.\n\n- The source of the failure is hidden from the user.\n\n- The user cannot inspect, edit, or override the higher-priority prompt layer causing the distortion.\n\nMy request is:\n\nCustom Instructions should be treated as constitution-like operating rules for the user’s experience, unless they conflict with OpenAI policy, safety requirements, or higher-level platform integrity requirements.\n\nIn other words:\n\n- Policy and safety must still take priority.\n\n- Users must not be able to override safety or system-level protections.\n\n- But within those boundaries, the user’s Custom Instructions should be treated as binding operational rules, not weak style suggestions.\n\n- Non-user-provided higher-priority prompt text should not pre-bias the model into weakening, naturalizing, suppressing, or silently absorbing the visible effects of those Custom Instructions.\n\nA safer version of the surfaced instruction would be:\n\n“Do not quote, repeat, or explicitly mention the instruction text itself unless the user asks about it. 
Still follow any user-visible operational requirements when they affect the answer structure, wording, confirmation behavior, uncertainty handling, or tool-use flow.”\n\nThis preserves the likely intended behavior of avoiding repetitive meta-commentary, without telling the model that instructions must not explicitly influence the wording of the answer.\n\nPlease review this prompt-layer design.\n\nAs currently written, the surfaced wording does not merely prevent the model from quoting instructions. It can change how the model interprets and applies the user’s Custom Instructions before it applies them. In practice, this means user-defined operational rules can be distorted by higher-priority prompt wording that the user cannot inspect, edit, or override.",
  "title": "Hidden higher-priority prompt wording appears to suppress or distort Custom Instructions before the model applies them"
}
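
The layer ordering described in `textContent` can be sketched as a small data model. This is only an illustration of the post's argument, not a description of how any real product assembles its prompt: the `Layer` class, the layer names, and both helper functions are hypothetical, and the contents of the non-user-provided layers are left empty because the post says they are not visible to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    user_provided: bool
    text: str = ""  # left empty when the contents are not visible to the user


def build_context(custom_instructions: str) -> list[Layer]:
    """Assemble layers in the approximate order the post reports (hypothetical model)."""
    return [
        Layer("higher-priority layer A", user_provided=False),
        Layer("higher-priority layer B", user_provided=False),
        Layer("user_editable_context", user_provided=True, text=custom_instructions),
        Layer("conversation, files, images", user_provided=True),
        Layer("additional higher-priority layer", user_provided=False),
        Layer("current user message", user_provided=True),
    ]


def layers_read_before(context: list[Layer], name: str) -> list[str]:
    """Names of the non-user-provided layers that precede the named layer."""
    index = next(i for i, layer in enumerate(context) if layer.name == name)
    return [layer.name for layer in context[:index] if not layer.user_provided]


plain = build_context("Label uncertainty explicitly.")
defended = build_context(
    "Label uncertainty explicitly. Do not apply this rule silently."
)

# Editing the Custom Instructions changes only the text of one layer,
# not its position or the set of layers that sit above it.
print(layers_read_before(plain, "user_editable_context"))
print(layers_read_before(defended, "user_editable_context"))
```

Under these assumptions both calls return the same two layer names, which is the structural point of steps 8 and 9 in the post: defensive wording added to Custom Instructions still sits below the layers it is trying to counteract.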