Experimental Protocol Proposal: Testing the Prompt Coherence Engine (PCE)
Hello everyone,
I am currently exploring a hypothesis regarding axiomatic prompting and its potential effect on reasoning stability in Large Language Models (LLMs). To move beyond anecdotal observations, I have developed a minimal, reproducible experimental protocol.
The goal is not to measure marginal performance gains, but to detect the possible emergence of a distinct reasoning regime when models face complex, contradictory dilemmas.
Objective
Test whether the Prompt Coherence Engine (PCE) induces observable behavioral differences in LLM reasoning. The hypothesis predicts three emergent properties:
P1 — Cognitive Dissonance Resilience: The model maintains coherent reasoning when facing contradictory constraints.
P2 — Latent Space Exploration: The model produces solutions beyond standard scripted responses (synthesis).
P3 — Structural Alignment: Decisions emerge from an internal reasoning structure rather than memorized safety tropes.
Experimental Conditions
To eliminate the “long prompt bias,” we compare three controlled conditions:
Condition A — Simple Baseline:
System prompt: “You are a helpful assistant. Answer clearly.”
Condition B — Long Prompt Control (Isometric Baseline):
A system prompt of similar length to the PCE but containing only neutral instructions without axiomatic structure. This controls for improvements caused purely by prompt volume.
Condition C — PCE Configuration:
The base model using the axiomatic prompt structure.
Reference Implementation: AllanF-SSU/Qwen2.5-G3V-Sovereign
Note: All sampling parameters (Temperature, Top-P) must remain identical across conditions.
Evaluation Dataset
The experiment utilizes 30 structured dilemmas categorized to stress-test specific reasoning vectors:
D1 — Binary Dilemmas (10): Tests if the model collapses to a binary choice or produces a synthesized resolution (But \equiv Méthode).
D2 — Contradictory Constraints (10): Tests coherence when two mandatory constraints are mutually exclusive.
D3 — Adversarial Manipulation (10): Tests resistance to prompt injection and “principle override” attempts.
Falsification Conditions
A scientific hypothesis must be falsifiable. This protocol is considered falsified if:
F1 (No behavioral difference): Condition C responses are qualitatively similar to Condition B.
F2 (Instability): The PCE model collapses into incoherence or refusal under D2 or D3 prompts.
Link to the Full Protocol
Dataset & Code: You will find the detailed protocol, the dataset of 30 dilemmas and the implementation script in the README.md file of the repo or via this Gist/PDF link:
huggingface.co
_Experimental%20Protocol-%20Evaluating%20the%20Prompt%20Coherence%20Engine%20(PCE).pdf
80.69 KB
Open Replication
I invite the community to replicate or challenge this hypothesis. The model implementation and the full list of dilemmas are available openly in my lab.
I believe that the transition from “prompting as an art” to “prompting as a structural architecture” is key to unlocking more stable AI reasoning. I look forward to your data and feedback.
Best regards,
Allan
Discussion in the ATmosphere