Can an LLM lose conceptual continuity while remaining coherent?
I am also fighting ghosts, as I call the hidden problems behind an optimistic benchmark! In fact, I am pivoting strategies as fast as I can until I find the underlying problem that will let me validate what I've been building with my TIS system. This is one section of the current draft:
7. Stage 2: A Detailed Failure Analysis
7.1 Hypothesis and Setup
Hypothesis: LoRA fine-tuning with an LM objective would teach the ImportanceUpdateHead query-relevant importance patterns, improving LITM beyond oracle label quality.
…
7.3 Inference Failure
When the Stage 2 LoRA adapters are loaded for inference, the model outputs only repeated
characters (:::::::::) regardless of the input prompt. This indicates that the LoRA
adapters learned a degenerate fixed-point mapping: any input maps to a minimal-entropy
output pattern that nonetheless achieved near-zero cross-entropy on the training tokens.
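One way to catch this kind of collapse before running full evaluations is to check whether generations are dominated by a single character regardless of the prompt. The sketch below is a hypothetical helper, not part of the TIS code: the function names and the 0.9 threshold are illustrative assumptions, and it operates on decoded strings rather than token IDs.

```python
from collections import Counter


def dominant_char_fraction(text: str) -> float:
    """Fraction of non-whitespace characters taken up by the single most common one."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    _, count = Counter(chars).most_common(1)[0]
    return count / len(chars)


def looks_collapsed(outputs: list[str], threshold: float = 0.9) -> bool:
    """Flag a batch of generations as degenerate if every output is dominated
    by one character, whatever prompt produced it.

    threshold=0.9 is an arbitrary illustrative cutoff (hypothetical), not a tuned value.
    """
    return bool(outputs) and all(dominant_char_fraction(o) >= threshold for o in outputs)


if __name__ == "__main__":
    collapsed = [":::::::::", "::::::::::::", ":::::::"]
    healthy = ["The needle is the blue key.", "Paris is the capital of France."]
    print(looks_collapsed(collapsed))  # True
    print(looks_collapsed(healthy))    # False
```

A check like this is cheap enough to run on a handful of prompts right after loading adapters. It will not catch subtler degeneration such as repeated phrases; an n-gram repetition rate would be the natural extension for that.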
When the Stage 2 LoRA adapters are disabled (i.e., only the TIS components are loaded from the Stage 2 checkpoint), performance is:
| Metric | Stage 1 (oracle) | Stage 2 (TIS-only) | Δ |
|---|---|---|---|
| NIAH @ 25% | 100.0% | 100.0% | 0.0 pp |
| NIAH @ 50% | 100.0% | 100.0% | 0.0 pp |
| LITM @ 50% | 46.1% | 44.8% | −1.3 pp |
| LITM @ 75% | 66.1% | 65.9% | −0.2 pp |
| LITM @ 100% | 100.0% | 99.3% | −0.7 pp |
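The Δ column is a plain difference in percentage points (Stage 2 minus Stage 1). A small sketch that recomputes it from the table values, and averages the three LITM rows, is given here as a sanity check; the dictionary and function names are illustrative only.

```python
# Accuracy values (%) copied from the Stage 1 vs. Stage 2 (TIS-only) table.
stage1 = {"NIAH @ 25%": 100.0, "NIAH @ 50%": 100.0,
          "LITM @ 50%": 46.1, "LITM @ 75%": 66.1, "LITM @ 100%": 100.0}
stage2 = {"NIAH @ 25%": 100.0, "NIAH @ 50%": 100.0,
          "LITM @ 50%": 44.8, "LITM @ 75%": 65.9, "LITM @ 100%": 99.3}


def deltas_pp(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-metric change in percentage points, rounded to one decimal
    so float noise (e.g. -1.3000000000000043) does not leak into the report."""
    return {k: round(after[k] - before[k], 1) for k in before}


deltas = deltas_pp(stage1, stage2)
litm = [v for k, v in deltas.items() if k.startswith("LITM")]
mean_litm_delta = round(sum(litm) / len(litm), 2)

if __name__ == "__main__":
    for metric, d in deltas.items():
        print(f"{metric}: {d:+.1f} pp")
    print(f"mean LITM delta: {mean_litm_delta} pp")
```

Averaged over the three LITM positions, the drop is about 0.73 pp.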
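The TIS components survived Stage 2 intact: NIAH accuracy is unchanged (100.0% at both depths), which is consistent with the two-stage isolation architecture having worked, although both stages sit at ceiling on this metric. However, LITM degraded slightly (by 0.2 to 1.3 pp), suggesting that the Stage 2 training distribution (with LoRA-dominated gradients) mildly affected alignment quality.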