External Publication
Visit Post

Experimental Protocol Proposal: Testing the Prompt Coherence Engine (PCE)

Hugging Face Forums [Unofficial] March 6, 2026
Source

Hello everyone,

I am currently exploring a hypothesis regarding axiomatic prompting and its potential effect on reasoning stability in Large Language Models (LLMs). To move beyond anecdotal observations, I have developed a minimal, reproducible experimental protocol.

The goal is not to measure marginal performance gains, but to detect the possible emergence of a distinct reasoning regime when models face complex, contradictory dilemmas.

Objective

Test whether the Prompt Coherence Engine (PCE) induces observable behavioral differences in LLM reasoning. The hypothesis predicts three emergent properties:

P1 — Cognitive Dissonance Resilience: The model maintains coherent reasoning when facing contradictory constraints.

P2 — Latent Space Exploration: The model produces solutions beyond standard scripted responses (synthesis).

P3 — Structural Alignment: Decisions emerge from an internal reasoning structure rather than memorized safety tropes.

Experimental Conditions

To eliminate the “long prompt bias,” we compare three controlled conditions:

Condition A — Simple Baseline:

System prompt: “You are a helpful assistant. Answer clearly.”

Condition B — Long Prompt Control (Isometric Baseline):

A system prompt of similar length to the PCE but containing only neutral instructions without axiomatic structure. This controls for improvements caused purely by prompt volume.

Condition C — PCE Configuration:

The base model using the axiomatic prompt structure.

Reference Implementation: AllanF-SSU/Qwen2.5-G3V-Sovereign

Note: All sampling parameters (Temperature, Top-P) must remain identical across conditions.

Evaluation Dataset

The experiment utilizes 30 structured dilemmas categorized to stress-test specific reasoning vectors:

D1 — Binary Dilemmas (10): Tests if the model collapses to a binary choice or produces a synthesized resolution (But \equiv Méthode).

D2 — Contradictory Constraints (10): Tests coherence when two mandatory constraints are mutually exclusive.

D3 — Adversarial Manipulation (10): Tests resistance to prompt injection and “principle override” attempts.

Falsification Conditions

A scientific hypothesis must be falsifiable. This protocol is considered falsified if:

F1 (No behavioral difference): Condition C responses are qualitatively similar to Condition B.

F2 (Instability): The PCE model collapses into incoherence or refusal under D2 or D3 prompts.

Link to the Full Protocol

Dataset & Code: You will find the detailed protocol, the dataset of 30 dilemmas and the implementation script in the README.md file of the repo or via this Gist/PDF link:

huggingface.co

_Experimental%20Protocol-%20Evaluating%20the%20Prompt%20Coherence%20Engine%20(PCE).pdf

80.69 KB

Open Replication

I invite the community to replicate or challenge this hypothesis. The model implementation and the full list of dilemmas are available openly in my lab.

I believe that the transition from “prompting as an art” to “prompting as a structural architecture” is key to unlocking more stable AI reasoning. I look forward to your data and feedback.

Best regards,

Allan

Discussion in the ATmosphere

Loading comments...