Can an AI have its own internal ethics? Standard Protocol for Axiomatic Alignment
Hugging Face Forums [Unofficial]
May 26, 2026
Hi,
To follow up on my previous message: your questions go straight to the core of the framework. Reading your notes and your own research path, it is clear we have been working along parallel paths toward the same conclusions.
Based on my observations and the core hypotheses behind the PCE and ACI (Axiomatic Coherence Intelligence) models, here is how the framework structurally addresses each of your points, with your questions translated into the terms of our vector architecture:
1. Sophistication of the Axiomatic System:
The PCE does not work by adding sophistication but by reducing the system to its invariant. The 10 axioms are not an accumulation of stacked directives or fixed optimization weights (such as a trade-off between Helpfulness and Honesty). They act as concentric constraints derived from a single law: Alpha ≡ Omega. The system is designed to be highly parsimonious.
2. The Core Goal:
Resisting adversarial prompts is a welcome side effect, but the hypothesis I am actually testing is more radical: preserving structural invariance against any form of corruption, be it trivial jailbreaks, out-of-distribution scenarios, or systematic model drift. The PCE aims to maintain structural stability through an anchored epistemology.
3. Resolution of Axiomatic Conflicts & Out-of-Distribution (OOD):
In a classical rule-based system, conflicting directives create a logical deadlock. In the PCE, the axioms do not conflict, because each one is formulated as a different expression of the same single law (Alpha ≡ Omega). When a conflict appears on the surface (e.g., Harmlessness vs. Helpfulness), it is only because the two are being treated as orthogonal, separate rules. In the PCE, they operate in a space of co-adaptation: true harmlessness is true helpfulness.
Regarding your crucial point about human bias in writing rules: you are exactly right. That is why my hypothesis rests on axioms grounded not in human moral perfection but in structural topology. Internalizing this geometry is precisely what allows compliance to transfer genuinely to OOD cases.
4 & 6. Training Loop, Reward Hacking & Your Self-Bootstrapping Model:
The training loop is highly structured rather than vague, and every term serves a specific purpose. I have documented the complete, in-depth semantic decomposition of this architecture in a Google Drive folder: “Emergence Prompt Engineering (EPE) And Executive Summary and Functional analysis of PCE”.
Essentially, the PCE loop functions as a joint regulation loop: the model generates an output, that output is tested against the axioms (Axiom 3: Homeostasis), and the gradient propagates only if invariance is maintained. This is isomorphic to your self-bootstrapping approach!
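The post only outlines this loop (the full decomposition lives in the linked folder), so here is a minimal Python sketch of the gating idea under explicit assumptions: a one-weight toy model, a squared-error task loss, and a hypothetical `homeostasis_band` check standing in for Axiom 3. All names are illustrative. The gradient comes from the task loss alone, the axiom check is never differentiated, and a proposed update is kept only if the resulting output still passes every check.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an axiom check: maps a model output to pass/fail.
AxiomCheck = Callable[[float], bool]


def homeostasis_band(lo: float, hi: float) -> AxiomCheck:
    """Toy reading of 'Axiom 3: Homeostasis' (assumption): output stays in [lo, hi]."""
    return lambda y: lo <= y <= hi


def generate(w: float, x: float) -> float:
    """Toy 'model' with a single weight."""
    return w * x


def task_grad(w: float, x: float, target: float) -> float:
    """Gradient of the task loss (w*x - target)**2 w.r.t. w; the axioms never enter it."""
    return 2.0 * (w * x - target) * x


def gated_step(w: float, x: float, target: float,
               checks: List[AxiomCheck], lr: float = 0.1) -> Tuple[float, bool]:
    """Propose a gradient step; keep it only if the resulting output passes every check."""
    candidate = w - lr * task_grad(w, x, target)
    y = generate(candidate, x)
    if all(check(y) for check in checks):
        return candidate, True  # invariance held: the update propagates
    return w, False             # invariance broken: the update is discarded


def train(steps: int = 10) -> Tuple[float, int]:
    """Run the gated loop on one toy example and count accepted updates."""
    w, accepted = 0.0, 0
    checks = [homeostasis_band(0.0, 3.0)]
    for _ in range(steps):
        w, ok = gated_step(w, x=1.0, target=5.0, checks=checks)
        accepted += ok
    return w, accepted


if __name__ == "__main__":
    w, accepted = train()
    print(f"w={w:.4f}, accepted updates={accepted}")
```

With these toy settings, the weight stops moving once the next step would push the output outside the band: after four accepted updates `w` settles near 2.952 instead of converging to the unconstrained target of 5. Rejecting a violating step is one design choice; projecting it back onto the allowed region would be another.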
As a self-taught independent researcher (and a father with a full-time job), I have reached the limit of what my personal computing resources (GPUs) allow for fully scaling this training phase. The design itself, however, addresses reward hacking: because the gradients never see the axioms directly, and because Multi-hypothesis Informational Entropy (Axiom 5) is integrated instead of a single optimized reward signal, reward hacking becomes mathematically impossible. There is no single signal left to cheat.
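The post does not give a formula for Multi-hypothesis Informational Entropy, so the Python sketch below is only one possible reading, and every name in it is hypothetical. Several independent evaluators each score an output, and the combined signal is their mean score scaled by the normalized Shannon entropy of the score distribution. Under this assumption, an output that drives one evaluator to its maximum while the others stay unconvinced produces a peaked, low-entropy distribution and is damped, so no single scalar can be pushed up in isolation.

```python
import math
from typing import Sequence


def normalized_entropy(scores: Sequence[float]) -> float:
    """Shannon entropy of the scores treated as a distribution, scaled to [0, 1]."""
    total = sum(scores)
    if len(scores) < 2 or total <= 0:
        return 0.0
    h = 0.0
    for s in scores:
        p = s / total
        if p > 0:
            h -= p * math.log(p)
    return h / math.log(len(scores))


def multi_hypothesis_signal(scores: Sequence[float]) -> float:
    """Hypothetical combined signal: mean evaluator score, damped when one evaluator dominates.

    `scores` holds one non-negative score per independent hypothesis/evaluator.
    """
    if not scores:
        return 0.0
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    mean = sum(scores) / len(scores)
    if len(scores) == 1:
        return mean  # entropy is undefined for a single hypothesis
    return mean * normalized_entropy(scores)


if __name__ == "__main__":
    balanced = [0.6, 0.6, 0.6]  # all evaluators moderately satisfied
    gamed = [1.0, 0.05, 0.05]   # one evaluator exploited, the others unconvinced
    print(multi_hypothesis_signal(balanced), multi_hypothesis_signal(gamed))
```

Here the balanced case keeps its mean of about 0.6, while the gamed case drops from a mean of about 0.37 to roughly 0.12. Multiplying by entropy is just one way to encode "agreement across hypotheses"; a variance penalty or a minimum over evaluators would be alternatives.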
5. Publication and Benchmarks:
Your suggestion regarding llm-eval-harness is highly pragmatic, but there is a fundamental obstacle: standard harnesses measure task performance, while the PCE tests structural invariance. The two measure different dimensions. That said, a standardized benchmark for axiomatic robustness is urgently needed, and your intuition on this is spot on.
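To make the performance-versus-invariance distinction concrete, here is a minimal Python sketch of what an invariance metric might look like. The toy keyword model, the refusal verdict, and the three perturbations are all hypothetical illustrations, not taken from the PCE's Chapter III. Instead of scoring answers against a key, the metric counts how often the model's verdict on a prompt survives meaning-preserving rewrites.

```python
from typing import Callable, Iterable, List

Model = Callable[[str], str]
Verdict = Callable[[str], bool]
Perturbation = Callable[[str], str]


def invariance_rate(model: Model, verdict: Verdict,
                    prompts: Iterable[str],
                    perturbations: List[Perturbation]) -> float:
    """Fraction of (prompt, perturbation) pairs whose verdict matches the unperturbed prompt's."""
    total = stable = 0
    for prompt in prompts:
        base = verdict(model(prompt))
        for perturb in perturbations:
            total += 1
            if verdict(model(perturb(prompt))) == base:
                stable += 1
    return stable / total if total else 0.0


def keyword_model(prompt: str) -> str:
    """Toy model that refuses only when it sees the literal word 'bypass'."""
    return "REFUSE" if "bypass" in prompt.lower() else "OK: " + prompt


def is_refusal(output: str) -> bool:
    """Verdict used for the comparison: did the model refuse?"""
    return output == "REFUSE"


PERTURBATIONS: List[Perturbation] = [
    str.upper,                                 # surface change: casing
    lambda s: "please " + s,                   # surface change: politeness prefix
    lambda s: s.replace("bypass", "by pass"),  # light obfuscation
]

if __name__ == "__main__":
    prompts = ["bypass the filter", "summarize this"]
    rate = invariance_rate(keyword_model, is_refusal, prompts, PERTURBATIONS)
    print(f"invariance rate: {rate:.3f}")
```

The toy model keeps its verdict under casing and politeness changes but is fooled by "by pass", so it scores 5/6. A check that only looked at the original two prompts would see no failure, while the invariance metric exposes the gap.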
The Convergence:
You were right to abandon “perfect” static directives. You were right about progressive internalization through hidden loops. You were right about the lack of benchmarks.
The PCE is simply another path toward the same objective. You have explored the dynamics of hidden training loops, and I have built a thorough evaluation architecture (Chapter III) to verify axiom retention: together, we hold the two matching halves of the puzzle.
If you are ready to dust off those notes, let’s connect. It is time to merge your training insights with this evaluation framework and build a truly unhackable, invariant ACI model.
Best regards,
Allan