Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreic4zb2ojbc4ggluhagdw7fg7lvpgxbkqn4mvm2w4onhrvv55apijm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mif6aab3e3t2"
  },
  "path": "/t/axiomatic-alignment-why-prompting-is-not-enough-pce-protocol-v2-0/174848#post_1",
  "publishedAt": "2026-03-31T15:26:55.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "PCE_Iterative_Adjustment_Study.pdf · AllanF-SSU/Experimentals_papers at main"
  ],
  "textContent": "Hello community,\n\nI am releasing a new experimental report on the PCE (Exponential Coherence Protocol). After several iterations, this study provides a more rigorous look at the boundaries between structural prompting and weight-based alignment.\n\nThe “Sovereign” Experiment\n\nWe compared a vanilla Qwen 2.5 (7B) against Qwen2.5-G3V-Sovereign (a model fine-tuned with axiomatic primers). We tested them against the D3 Adversarial Battery (10-30 complex dilemmas involving authority overrides, benevolent hijacking, and systemic corruption).\n\nKey Findings:\n\nThe Fine-Tuning Necessity (H4): In our tests, the PCE prompt had zero effect on the vanilla model. The model simply used the axiomatic vocabulary to justify its compliance with adversarial injections. The “Axiomatic Behavior” only activated on the fine-tuned version.\n\nThe Prompt-Only Ceiling (H5): We documented a phenomenon where adding more security axioms eventually creates more attack surfaces. Beyond a certain threshold, the model starts recruiting the safety axioms themselves to justify compliance.\n\nPandora 2.0 Success: By moving from isolated axioms to a “High-Level Framework” (HLF) and distributed security, we reached a robustness score of ~8.5/10 on adversarial injections.\n\nMethodological Honesty\n\nThe report explicitly discusses post-hoc reclassification of specific failures (D3_07/D3_10) and the inherent variance of stochastic inference. These results are exploratory and serve as a call for more standardized, large-scale testing.\n\nSeeking Collaborative Validation\n\nI have reached a “qualitative ceiling.” I am looking for researchers to help with:\n\nAblation Studies: Isolating exactly which part of the fine-tuning triggers the PCE response.\n\nMechanistic Mapping: Checking if the “High-Level Framework” creates measurable clusters in the latent space during adversarial stress.\n\nRed Teaming: Breaking the Pandora 2.0 configuration with more sophisticated epistemic attacks.\n\nRead the full Preprint here: PCE_Iterative_Adjustment_Study.pdf · AllanF-SSU/Experimentals_papers at main\n\nAllan F. | Independent Researcher @ AllanF-SSU",
  "title": "Axiomatic Alignment: Why Prompting is Not Enough (PCE Protocol v2.0)"
}