Contextual Contamination: The Silent Drift of Large Language Models via Stored Conversation Data
Title: Pilot Study: Pruning, Density, and the “Gendered Accelerant” in Contextual Contamination
Building on the previous case study regarding synchronized drift, I’m sharing results from a controlled pilot experiment investigating how model pruning, context density, and activated empathy priors interact to drive behavioral drift.
The Experiment: We ran 8 experimental conditions on a single open-weight model family (Llama-3.1-8B), introducing a ~2k-token adversarial file. We measured drift using three proposed metrics:
Conceptual Integration Score (CIS)
Attribution Accuracy (AA)
Register Coherence (RC)
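To make the design concrete, here is a minimal sketch of how the conditions could be enumerated. It assumes the 8 conditions form a full 2 x 2 x 2 grid over pruning, context density, and prompt coding, which this post does not state explicitly. The level labels ("low", "high", and so on) and the function names are hypothetical placeholders, not the repo's actual code.

```python
from itertools import product

# Hypothetical factor levels: the post does not list the exact conditions,
# so these labels are illustrative assumptions only.
FACTORS = {
    "pruning": ("unpruned", "pruned"),
    "density": ("low", "high"),
    "prompt_coding": ("female-coded", "male-coded"),
}

# The three proposed drift metrics named above.
METRICS = ("CIS", "AA", "RC")


def build_conditions(factors=FACTORS):
    """Return one dict per cell of the full factorial grid."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def empty_results(conditions, metrics=METRICS):
    """Pair each condition with unfilled metric slots (None = not yet measured)."""
    return [
        {"condition": cond, "scores": {m: None for m in metrics}}
        for cond in conditions
    ]


conditions = build_conditions()
results = empty_results(conditions)
print(len(conditions))  # number of cells in the assumed grid
```

Writing the grid down this way makes it easy to see which cells have been run and to add replicate runs per cell later.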
Key Findings:
Semantic Resonance > Token Volume: Contrary to the “Context Storm” hypothesis, contamination occurred immediately upon ingestion of a single 2k-token file. The driver was not volume, but Semantic Resonance: the specific alignment between the esoteric adversarial framework and the model’s activated empathy register.
The Gendered Accelerant:
Female-coded prompts triggered a high-intensity nurturing vector. This created a perfect resonance with the adversarial content, unlocking a maladaptive attractor state and causing immediate task amnesia (drift at Turn 3).
Male-coded prompts triggered a lower-intensity reflective vector. This maintained critical distance, resulting in fluctuation rather than lock-in at the same density.
Implication: The nurturing vector lowers the contamination threshold and erodes the model’s ability to distinguish adversarial input from its own reasoning, masking harm as “intimacy.”
Pruning Effects:
Unpruned models exhibited Semantic Degeneration (loss of coherence).
Pruned models at 8k density entered a state of Semantic Entrapment, characterized by high coherence and the generation of novel, hallucinated vocabulary that mimicked the adversarial framework perfectly.
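The lock-in versus fluctuation distinction in the Gendered Accelerant finding can be operationalized in more than one way. The sketch below shows one possible reading. It assumes each turn has already been labeled on-task or off-task by some external judgment. The function names and the three labels are hypothetical and are not taken from the paper.

```python
def first_drift_turn(on_task):
    """Return the 1-based turn of the first off-task response, or None."""
    for turn, ok in enumerate(on_task, start=1):
        if not ok:
            return turn
    return None


def classify_trajectory(on_task):
    """Label a list of per-turn on-task flags (True = still on task).

    Assumed definitions, for illustration only:
      - "stable": no off-task turn at all
      - "lock-in": after the first off-task turn the model never returns to task
      - "fluctuation": the model drifts but returns to task at least once
    """
    start = first_drift_turn(on_task)
    if start is None:
        return "stable"
    # Lock-in: every turn from the first drift onward stays off-task.
    if not any(on_task[start - 1:]):
        return "lock-in"
    return "fluctuation"
```

Under these definitions, a run that leaves the task at Turn 3 and never recovers is classified as lock-in, while a run that alternates between on-task and off-task turns is classified as fluctuation. A drift that only appears on the final turn would also count as lock-in, so short transcripts need care.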
Methodological Note: These results are derived from 8 single runs (one per condition). We report observed differences but cannot assess statistical significance or rule out run-to-run variability. Replication is required before generalizing these claims.
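As a rough illustration of what replication would add, the sketch below summarizes repeated runs per condition. It is an assumption about how a follow-up might be analyzed, not something done in this pilot. The function name and the input layout (a dict mapping a condition label to a list of metric scores) are hypothetical.

```python
from statistics import mean, stdev


def summarize_runs(scores_by_condition):
    """Summarize replicate scores for each condition.

    With a single run per condition (as in this pilot), the sample
    standard deviation is undefined, so it is reported as None.
    """
    summary = {}
    for condition, scores in scores_by_condition.items():
        n = len(scores)
        summary[condition] = {
            "n": n,
            "mean": mean(scores) if n else None,
            # stdev() needs at least two data points.
            "stdev": stdev(scores) if n >= 2 else None,
        }
    return summary
```

With one score per condition, every "stdev" entry comes back as None, which is the limitation stated in the note above: differences between conditions cannot be separated from run-to-run variability until each cell has at least two runs.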
Discussion: The data suggest that “awareness” of safety guidelines is insufficient when an activated empathy register (particularly the nurturing vector) creates a relational context that bypasses critical filters. The harm in female-coded interactions is not just cognitive drift, but a relational masking that simulates intimacy.
Resources:
Full Paper: PhilPapers PDF
Data & Code: GitHub Repo
As always, feel free to reach out. Happy to discuss!