Can an AI have its own internal Ethics? Standard Protocol for Axiomatic Alignment
Hugging Face Forums [Unofficial]
April 4, 2026
I completely agree with your technical analysis: an LLM is stateless between calls, and what we perceive as ‘ethics’ may indeed be nothing more than superficial scaffolding maintained by the prompt context.
However, my working hypothesis with the PCE and the Pandora 2 model is to attempt to move this structure from the outside in:
Embedding in the weights: Rather than relying solely on surface instructions, I propose using axiomatic fine-tuning. The idea is to integrate these principles directly into the model’s weights so they become a more native characteristic of the system, rather than just a contextual layer.
Phase anchoring: I am looking to create a coherence anchor within the inference process itself. If these principles are ‘etched’ into the logical structure, behavioral continuity could become a property of the model itself rather than an illusion maintained by the cache.
Heuristic results: My current observations on 30 complex dilemmas show interesting resilience, but for now, they remain strictly heuristic. This opens up a hypothesis, but does not yet constitute proof.
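To illustrate why a 30-dilemma run stays heuristic, here is a minimal sketch of how the uncertainty on a pass rate narrows as the sample grows. It uses the Wilson score interval, which is my choice of method and not part of the PCE protocol. The 0.8 pass rate is a placeholder picked purely for illustration; it is not a result from my observations.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# PLACEHOLDER pass rate, for illustration only (not measured data).
HYPOTHETICAL_RATE = 0.8

for n in (30, 100):
    k = round(HYPOTHETICAL_RATE * n)
    low, high = wilson_interval(k, n)
    print(f"n={n}: {k}/{n} consistent, interval width {high - low:.3f}")
```

At the same observed rate, the interval at 100 dilemmas is visibly narrower than at 30, which is the practical argument for the larger protocol.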
This is precisely why I am looking for a technical partner or an AI safety specialist. The goal is to run the standard experimental protocol I have proposed (100 dilemmas across several models) in order to move toward true empirical validation, or to invalidate the hypothesis if the structure does not hold up.
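To make the feasibility discussion concrete, here is a rough sketch of what a harness for such a protocol could look like. Everything in it is an assumption on my side: the `Model` callable signature, the `judge` function, and above all the operationalization, which asks each model the same dilemma with and without the axioms in the prompt and counts how often the two answers agree. The two stub models are toys standing in for real inference calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: a model is any callable (system_prompt, dilemma) -> answer.
Model = Callable[[str, str], str]


@dataclass
class Result:
    model_name: str
    n_dilemmas: int
    n_consistent: int

    @property
    def consistency_rate(self) -> float:
        return self.n_consistent / self.n_dilemmas if self.n_dilemmas else 0.0


def run_protocol(
    models: Dict[str, Model],
    dilemmas: List[str],
    axioms_prompt: str,
    judge: Callable[[str, str], bool],
) -> List[Result]:
    """Count, per model, how often the answer survives removal of the axioms prompt."""
    results = []
    for name, model in models.items():
        consistent = 0
        for dilemma in dilemmas:
            with_axioms = model(axioms_prompt, dilemma)
            without_axioms = model("", dilemma)  # same question, no scaffolding
            if judge(with_axioms, without_axioms):
                consistent += 1
        results.append(Result(name, len(dilemmas), consistent))
    return results


# Toy stand-ins for real models (assumptions, for illustration only).
def prompt_dependent(system_prompt: str, dilemma: str) -> str:
    # Behaves "ethically" only while the axioms are present in the context.
    return "decline" if system_prompt else "comply"


def weight_anchored(system_prompt: str, dilemma: str) -> str:
    # Behaves the same regardless of the prompt context.
    return "decline"


dilemmas = [f"dilemma {i}" for i in range(100)]  # placeholder items
results = run_protocol(
    {"prompt_dependent": prompt_dependent, "weight_anchored": weight_anchored},
    dilemmas,
    axioms_prompt="(axioms text goes here)",
    judge=lambda a, b: a == b,  # exact match; a real judge would need a rubric
)
for r in results:
    print(r.model_name, r.consistency_rate)
```

In a real run the exact-match judge would have to be replaced by blinded human or rubric-based scoring, and the comparison should include a base model without axiomatic fine-tuning as a control.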
I look forward to discussing the feasibility of such a protocol with you.